Internal refactor: create a generic function for applying ufunc-like functions to xarray objects

It would be awesome to have a generic function that makes functions acting like NumPy's generalized universal functions (gufuncs) "xarray aware".

What would `xarray.apply_ufunc(func, objs, join='inner', agg_dims=None, drop_dims=None, kwargs=None)` do?
1. If one or more of the provided objects are `Dataset` or `GroupBy` instances, dispatch to specialized loops that call the remainder of `apply_ufunc` repeatedly.
2. `align` all objects along shared labels using the indicated `join` (for some operations, e.g., `where`, a left join is appropriate rather than an inner join).
3. `broadcast` all objects against each other to expand dimensionality along all dimensions except (optionally) those listed in `agg_dims`/`drop_dims`. Dimensions listed in `drop_dims` should be moved to the end, for consistency with gufunc signatures.
4. Transform `agg_dims` (if provided) into an `axis` argument using `get_axis_num` and insert it into `kwargs`.
5. Apply `func` to the `data` attribute of each array, passing the provided `kwargs`, to calculate the result. The result is expected to have all the same dimensions as the provided arrays, except any listed in the `agg_dims` and `drop_dims` arguments.
6. `merge` all coordinate data together (i.e., with an n-ary version of the `Coordinate.merge` method) and add these to the result array.
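
The steps above can be sketched roughly as follows for the `DataArray`-only case. This is a minimal illustration, not a proposed implementation: the name `apply_ufunc_sketch` is hypothetical, `Dataset`/`GroupBy` dispatch and `drop_dims` are left out, and the coordinate handling in step 6 is simplified to reusing the coordinates of the first broadcast array instead of a true n-ary merge.

```python
import numpy as np
import xarray as xr


def apply_ufunc_sketch(func, objs, join="inner", agg_dims=None, kwargs=None):
    """Hypothetical, simplified version of the proposed function (DataArray only)."""
    kwargs = dict(kwargs or {})
    if isinstance(agg_dims, str):
        agg_dims = [agg_dims]
    agg_dims = list(agg_dims or [])

    # Step 2: align all objects along shared labels.
    aligned = xr.align(*objs, join=join)

    # Step 3: broadcast against each other so every array has the same dims.
    broadcasted = xr.broadcast(*aligned)
    first = broadcasted[0]

    # Step 4: translate dimension names into an ``axis`` keyword argument.
    if agg_dims:
        kwargs["axis"] = tuple(first.get_axis_num(d) for d in agg_dims)

    # Step 5: apply ``func`` to the underlying arrays.
    data = func(*(obj.data for obj in broadcasted), **kwargs)
    out_dims = [d for d in first.dims if d not in agg_dims]

    # Step 6 (simplified): keep coordinates that still fit the output dims.
    coords = {
        name: coord
        for name, coord in first.coords.items()
        if set(coord.dims) <= set(out_dims)
    }
    return xr.DataArray(data, dims=out_dims, coords=coords)


a = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([10.0, 20.0, 30.0], dims="x", coords={"x": [1, 2, 3]})

# Only the labels x=1 and x=2 survive the inner join.
total = apply_ufunc_sketch(np.add, [a, b])


def weighted_sum(values, weights, axis=None):
    return (values * weights).sum(axis=axis)


# Aggregating over "x" removes that dimension from the result.
reduced = apply_ufunc_sketch(weighted_sum, [a, b], agg_dims="x")
```

Resolving `axis` after broadcasting means every input shares one dimension order, so a single `get_axis_num` lookup on the first array is enough.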

If any of `objs` are not xarray objects (e.g., they're NumPy or dask arrays), they should be skipped in operations that don't apply to them. `xarray.Variable` objects, for example, don't align or have coordinates.
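
One way to skip inputs during alignment is to align only the labeled objects and put everything back in its original position. The helper name `align_xarray_only` below is hypothetical, and the sketch assumes that only `DataArray` and `Dataset` inputs take part in alignment.

```python
import numpy as np
import xarray as xr


def align_xarray_only(objs, join="inner"):
    """Align labeled inputs; pass everything else through untouched."""
    positions = [
        i for i, obj in enumerate(objs)
        if isinstance(obj, (xr.DataArray, xr.Dataset))
    ]
    if not positions:
        return list(objs)

    aligned = xr.align(*(objs[i] for i in positions), join=join)

    result = list(objs)
    for i, new in zip(positions, aligned):
        result[i] = new
    return result


a = xr.DataArray([1, 2, 3], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([4, 5, 6], dims="x", coords={"x": [1, 2, 3]})
raw = np.array([7, 8])

# ``raw`` has no labels, so it is returned as-is between the aligned arrays.
out = align_xarray_only([a, raw, b])
```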

A concrete example of similar functionality in dask.array is [`atop`](https://github.com/dask/dask/blob/0.8.0/dask/array/core.py#L1572). The most similar thing that we currently have in xarray is the pair of `_unary_op` and `_binary_op` staticmethods (e.g., [on DataArray](https://github.com/pydata/xarray/blob/v0.7.1/xarray/core/dataarray.py#L1180)), but these only handle one or two arguments, don't handle aggregated dimensions, and, most importantly, are difficult to apply to new operations.

Here are a few concrete examples of how this could work:

``` python
def average(array, weights, dim=None):
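    # still needs a bit of work to make a NaN- and dask.array-safe
    # version of np.average
    def _average(values, weights, axis=None):
        # np.average takes weights as a keyword, not as its second positional argument
        return np.average(values, axis=axis, weights=weights)
    return apply_ufunc(_average, [array, weights], agg_dims=dim)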

def where(cond, first, second=None):
    if second is None:
        # need to write where2, a function that looks at first.dtype
        # to infer the appropriate NA sentinel value
        return apply_ufunc(ops.where2, [cond, first])
    else:
        return apply_ufunc(ops.where, [cond, first, second])

def dot(self, other, dim=None):
    if dim is None:
        # contract over the dimensions shared by both arrays
        dim = set(self.dims) & set(other.dims)
    return apply_ufunc(ops.tensordot, [self, other], agg_dims=dim)
```
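
For the NaN-safe `average` mentioned in the first example, one possible NumPy-only kernel is sketched below. The name `nan_average` is hypothetical, dask support is not addressed, and the choice to drop the weight of every missing value is an assumption about the desired semantics.

```python
import numpy as np


def nan_average(values, weights, axis=None):
    """Weighted average that ignores NaN entries in ``values``."""
    values = np.asarray(values, dtype=float)
    weights = np.broadcast_to(np.asarray(weights, dtype=float), values.shape)

    valid = ~np.isnan(values)
    # Missing values contribute neither to the weighted sum nor to the norm.
    weighted_sum = np.where(valid, values * weights, 0.0).sum(axis=axis)
    norm = np.where(valid, weights, 0.0).sum(axis=axis)

    # If every value along ``axis`` is NaN, ``norm`` is 0 and the result is NaN.
    return weighted_sum / norm


flat = nan_average([1.0, np.nan, 3.0], [1.0, 1.0, 2.0])

by_column = nan_average(
    [[1.0, np.nan], [3.0, 4.0]],
    [[1.0], [3.0]],
    axis=0,
)
```

Because the kernel accepts `axis` as a keyword, it would fit the `agg_dims` to `axis` translation described in step 4.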

