Description
It would be awesome to have a generic function for making functions that act like NumPy's generalized universal functions "xarray aware".
What would `xarray.apply_ufunc(func, objs, join='inner', agg_dims=None, drop_dims=None, kwargs=None)` do?
- If one or more of the provided objects are `Dataset` or `GroupBy` instances, dispatch to specialized loops that call the remainder of `apply_ufunc` repeatedly.
- `align` all objects along shared labels using the indicated `join` (for some operations, e.g., `where`, a left join is appropriate rather than an inner join).
- `broadcast` all objects against each other to expand dimensionality along all dimensions except (optionally) those listed in `agg_dims`/`drop_dims`. `drop_dims` should be moved to the end, for consistency with gufunc signatures.
- Transform `agg_dims` (if provided) into an `axis` argument using `get_axis_num` and insert it into `kwargs`.
- Apply `func` to the `data` argument of each array to calculate the result using the provided `kwargs`. The result is expected to have all the same dimensions as the provided arrays, except any listed in the `agg_dims` and `drop_dims` arguments.
- `merge` all coordinate data together (i.e., with an n-ary version of the `Coordinate.merge` method) and add these to the result array.
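As a rough illustration of the broadcast, axis-translation, and apply steps above, here is a minimal NumPy-only sketch. It is not the proposed implementation: `apply_ufunc_sketch` and `expand_to` are hypothetical names, each input is a plain `(dims, data)` pair rather than an xarray object, alignment and coordinate merging are left out, and for simplicity every input is broadcast over all dimensions, including any listed in `agg_dims`.

```python
import numpy as np


def expand_to(dims, data, all_dims):
    """Reorder axes and add length-1 axes so `data` follows `all_dims`."""
    data = np.asarray(data)
    # Put the axes this array already has into the order used by all_dims.
    present = [d for d in all_dims if d in dims]
    data = data.transpose([dims.index(d) for d in present])
    # Add a length-1 axis for every dimension this array lacks.
    shape = [data.shape[present.index(d)] if d in present else 1
             for d in all_dims]
    return data.reshape(shape)


def apply_ufunc_sketch(func, objs, agg_dims=None, kwargs=None):
    """Toy version of the proposal for (dims, data) pairs (hypothetical)."""
    kwargs = dict(kwargs or {})
    agg_dims = list(agg_dims or [])
    # Union of all dimension names, in order of first appearance.
    all_dims = []
    for dims, _ in objs:
        for d in dims:
            if d not in all_dims:
                all_dims.append(d)
    # Broadcast every input against the others by dimension name.
    arrays = np.broadcast_arrays(
        *[expand_to(list(dims), data, all_dims) for dims, data in objs]
    )
    # Translate dimension names into an `axis` keyword argument.
    if agg_dims:
        kwargs["axis"] = tuple(all_dims.index(d) for d in agg_dims)
    result = func(*arrays, **kwargs)
    # Aggregated dimensions do not appear in the output.
    out_dims = [d for d in all_dims if d not in agg_dims]
    return out_dims, result
```

For instance, `apply_ufunc_sketch(np.add, [(["x"], [1, 2, 3]), (["y"], [10, 20])])` would broadcast the two inputs to a result with dimensions `["x", "y"]`, while passing `agg_dims=["y"]` together with `np.sum` would reduce over `y` by name rather than by axis number.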
If any of `objs` are not xarray objects (e.g., they're NumPy or dask arrays), they should be skipped in operations that don't apply to them. For example, `xarray.Variable` objects don't align or have coordinates.
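The "skip what doesn't apply" rule can be sketched for the alignment step in pure Python. Everything here is an assumption made for illustration: `Labeled` is a hypothetical stand-in for an object that carries an index, and `inner_align` is a hypothetical helper, not an xarray function. Labeled inputs are restricted to their shared labels (an inner join), and anything else is passed through untouched.

```python
from dataclasses import dataclass


@dataclass
class Labeled:
    """Hypothetical stand-in for an object that carries an index."""
    index: list
    values: list


def inner_align(objs):
    """Align Labeled inputs on shared labels; pass everything else through."""
    labeled = [o for o in objs if isinstance(o, Labeled)]
    if not labeled:
        return list(objs)
    # Inner join: keep labels found in every labeled input,
    # in the order of the first labeled input.
    shared = [k for k in labeled[0].index
              if all(k in o.index for o in labeled[1:])]
    out = []
    for o in objs:
        if isinstance(o, Labeled):
            lookup = dict(zip(o.index, o.values))
            out.append(Labeled(shared, [lookup[k] for k in shared]))
        else:
            # Raw arrays and scalars have no labels, so alignment skips them.
            out.append(o)
    return out
```

A left join would instead keep every label of the first labeled input, which is the behaviour the proposal suggests for operations such as `where`.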
A concrete example of similar functionality in dask.array is `atop`. The closest thing we currently have in xarray is the pair of `_unary_op` and `_binary_op` static methods (e.g., on `DataArray`), but these only handle one or two arguments, don't handle aggregated dimensions, and, most importantly, are difficult to apply to new operations.
Here are a few concrete examples of how this could work:

    def average(array, weights, dim=None):
        # still needs a bit of work to make a NaN- and dask.array-safe
        # version of np.average
        return apply_ufunc(np.average, [array, weights], agg_dims=dim)

    def where(cond, first, second=None):
        if second is None:
            # need to write where2, a function that looks at first.dtype
            # to infer the appropriate NA sentinel value
            return apply_ufunc(ops.where2, [cond, first])
        else:
            return apply_ufunc(ops.where, [cond, first, second])

    def dot(self, other, dim=None):
        if dim is None:
            # by default, sum over the dimensions both arrays share
            dim = set(self.dims) & set(other.dims)
        return apply_ufunc(ops.tensordot, [self, other], agg_dims=dim)
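To make the `dot` case concrete, here is a small NumPy-only sketch of a name-based contraction. It assumes the intent is to sum over the dimensions the two inputs share. `dot_sketch` is a hypothetical helper that takes dimension names as plain lists, and it assumes `dim` covers every shared dimension so that the output names stay unique.

```python
import numpy as np


def dot_sketch(a_dims, a, b_dims, b, dim=None):
    """Contract two arrays over named dimensions (hypothetical helper)."""
    if dim is None:
        # Default: contract over every dimension the inputs share.
        dim = [d for d in a_dims if d in b_dims]
    # Translate dimension names into per-array axis numbers.
    a_axes = [a_dims.index(d) for d in dim]
    b_axes = [b_dims.index(d) for d in dim]
    # np.tensordot keeps the remaining axes of `a` first, then those of `b`.
    out_dims = ([d for d in a_dims if d not in dim]
                + [d for d in b_dims if d not in dim])
    return out_dims, np.tensordot(a, b, axes=(a_axes, b_axes))
```

With inputs whose dimensions are `["x", "y"]` and `["y", "z"]`, this contracts over `y` and reduces to an ordinary matrix product with dimensions `["x", "z"]`.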