Skip to content

implement Gradient #2398

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Sep 21, 2018
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions doc/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ Top-level functions
zeros_like
ones_like
dot
gradient

Dataset
=======
Expand Down Expand Up @@ -150,6 +151,7 @@ Computation
Dataset.resample
Dataset.diff
Dataset.quantile
Dataset.gradient

**Aggregation**:
:py:attr:`~Dataset.all`
Expand Down Expand Up @@ -317,6 +319,7 @@ Computation
DataArray.diff
DataArray.dot
DataArray.quantile
DataArray.gradient

**Aggregation**:
:py:attr:`~DataArray.all`
Expand Down
7 changes: 6 additions & 1 deletion doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,11 @@ Documentation
Enhancements
~~~~~~~~~~~~

- :py:func:`~xarray.gradient`, :py:meth:`~xarray.DataArray.gradient`, and
:py:meth:`~xarray.Dataset.gradient` are newly added.
(:issue:`1332`)
By `Keisuke Fujii <https://github.com/fujiisoup>`_.

- min_count option is newly supported in :py:meth:`~xarray.DataArray.sum`,
:py:meth:`~xarray.DataArray.prod` and :py:meth:`~xarray.Dataset.sum`, and
:py:meth:`~xarray.Dataset.prod`.
Expand Down Expand Up @@ -84,7 +89,7 @@ Bug fixes
- Tests can be run in parallel with pytest-xdist
By `Tony Tung <https://github.com/ttung>`_.

- Follow up the renamings in dask; from dask.ghost to dask.overlap
- Follow up the renamings in dask; from dask.ghost to dask.overlap
By `Keisuke Fujii <https://github.com/fujiisoup>`_.

- Now raises a ValueError when there is a conflict between dimension names and
Expand Down
2 changes: 1 addition & 1 deletion xarray/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
from .core.alignment import align, broadcast, broadcast_arrays
from .core.common import full_like, zeros_like, ones_like
from .core.combine import concat, auto_combine
from .core.computation import apply_ufunc, where, dot
from .core.computation import apply_ufunc, dot, gradient, where
from .core.extensions import (register_dataarray_accessor,
register_dataset_accessor)
from .core.variable import as_variable, Variable, IndexVariable, Coordinate
Expand Down
106 changes: 106 additions & 0 deletions xarray/core/computation.py
Original file line number Diff line number Diff line change
Expand Up @@ -1137,3 +1137,109 @@ def where(cond, x, y):
join='exact',
dataset_join='exact',
dask='allowed')


def _gradient_once(variable, coord, edge_order):
""" Compute the gradient along 1 dimension for Variable.
variable, coord: Variable
"""
from .variable import Variable

result_array = duck_array_ops.gradient(
variable.data, coord.data, edge_order=edge_order,
axis=variable.get_axis_num(coord.dims[0]))
return Variable(variable.dims, result_array)


def gradient(dataarray, coords, edge_order=1):
""" Compute the gradient of the dataarray with the second order accurate
central differences.

Parameters
----------
dataarray: xr.DataArray
Target array
coords: str or sequence of strs
The coordinate to be used to compute the gradient.
edge_order: 1 or 2. Default 1
N-th order accurate differences at the boundaries.

Returns
-------
gradient: DataArray or a sequence of DataArrays

See also
--------
numpy.gradient: corresponding numpy function

Examples
--------

>>> da = xr.DataArray(np.arange(12).reshape(4, 3), dims=['x', 'y'],
... coords={'x': [0, 0.1, 1.1, 1.2]})
>>> da
<xarray.DataArray (x: 4, y: 3)>
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
Coordinates:
* x (x) float64 0.0 0.1 1.1 1.2
Dimensions without coordinates: y
>>>
>>> xr.gradient(da, 'x')
<xarray.DataArray (x: 4, y: 3)>
array([[30. , 30. , 30. ],
[27.545455, 27.545455, 27.545455],
[27.545455, 27.545455, 27.545455],
[30. , 30. , 30. ]])
Coordinates:
* x (x) float64 0.0 0.1 1.1 1.2
Dimensions without coordinates: y
>>>
>>> xr.gradient(da, ('x', 'y'))
Copy link
Member Author

@fujiisoup fujiisoup Sep 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Altough this API is similar to numpy's counterpart, I'm wondering whether we really need to support this. The return value is a tuple of DataArrays, which I feel inconsistent to other xarray functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I wrote my own wrapper for gradient of a DataArray, I returned a dataset with each variable being a gradient along one dimension, but even that isn't very elegant ¯_(ツ)_/¯

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be another option, but in order to store them to a dataset, we will need to define their names very heuristically...

Maybe we can drop this api as this does not speed up the computation and it's easy to do the same thing manually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I don't think it's particularly useful to compute to compute independent gradients with respect to multiple axes at the same time, e.g., du/dx, du/dy, etc. If anything, I would be more excited about computing higher order derivatives, e.g., \frac{d^2 u}{dx dy}

(<xarray.DataArray (x: 4, y: 3)>
array([[30. , 30. , 30. ],
[27.545455, 27.545455, 27.545455],
[27.545455, 27.545455, 27.545455],
[30. , 30. , 30. ]])
Coordinates:
* x (x) float64 0.0 0.1 1.1 1.2
Dimensions without coordinates: y, <xarray.DataArray (x: 4, y: 3)>
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
Coordinates:
* x (x) float64 0.0 0.1 1.1 1.2
Dimensions without coordinates: y)
"""
from .dataarray import DataArray

if not isinstance(dataarray, DataArray):
raise TypeError(
'Only DataArray is supported. Given {}.'.format(type(dataarray)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused here. You wrote an implementation for Dataset.differentiate, too, and here is a duplicate version of DataArray.differentiate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Maybe the top level function is not necessary here and DataArray.differentiate and Dataset.differentiate would be sufficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I don't think we need the separate function.


return_sequence = True
if not isinstance(coords, (tuple, list)):
coords = (coords, )
return_sequence = False

result = []
for coord in coords:
if coord not in dataarray.coords and coord not in dataarray.dims:
raise ValueError('Coordiante {} does not exist.'.format(coord))

coord_var = dataarray[coord].variable
if coord_var.ndim != 1:
raise ValueError(
'Only 1d-coordinate is supported. {} is {} '
'dimensional.'.format(coord, dataarray[coord].ndim))

result.append(DataArray(
_gradient_once(dataarray.variable, coord_var, edge_order),
coords=dataarray.coords))

if return_sequence:
return tuple(result)
return result[0]
35 changes: 35 additions & 0 deletions xarray/core/dask_array_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -104,3 +104,38 @@ def func(x, window, axis=-1):
index = (slice(None),) * axis + (slice(drop_size,
drop_size + orig_shape[axis]), )
return out[index]


def gradient(a, coord, axis, edge_order):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be implemented in the upstream...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean like this? Requires Dask 0.17.3+ though. So IDK if that works for you or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jakirkahm.
I need to compute the gradient with a non uniform coordinate, but dask Array.gradient only supports uniformly spacing coordinate.

I think this can be implemented using overlap as I do here. Is it in the scope of the dask development?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry. Should have read more of the code.

Yeah I think that is in scope. Just not currently supported yet.

Could you please open an issue or perhaps PR over on the Dask repo?

""" Apply gradient along one dimension """
if axis < 0:
axis = a.ndim + axis

for c in a.chunks[axis]:
if c < edge_order + 1:
raise ValueError('Chunk size must be larger than edge_order + 1. '
'Given {}. Rechunk to proceed.'.format(c))

depth = {d: 1 if d == axis else 0 for d in range(a.ndim)}
# temporary pad zero at the boundary
boundary = "none"
ag = overlap(a, depth=depth, boundary=boundary)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F841 local variable 'ag' is assigned to but never used


n_chunk = len(a.chunks[axis])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F841 local variable 'n_chunk' is assigned to but never used

# adjust the coordinate position with overlap
array_loc_stop = np.cumsum(np.array(a.chunks[axis])) + 1
array_loc_start = array_loc_stop - np.array(a.chunks[axis]) - 2
array_loc_stop[-1] -= 1
array_loc_start[0] = 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W293 blank line contains whitespace

def func(x, block_id):
block_loc = block_id[axis]
c = coord[array_loc_start[block_loc]: array_loc_stop[block_loc]]
grad = np.gradient(x, c, axis=axis, edge_order=edge_order)
return grad

return a.map_overlap(
func,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E126 continuation line over-indented for hanging indent

dtype=a.dtype,
depth={j: 1 if j == axis else 0 for j in range(a.ndim)},
boundary="none")
48 changes: 48 additions & 0 deletions xarray/core/dataarray.py
Original file line number Diff line number Diff line change
Expand Up @@ -2267,6 +2267,54 @@ def rank(self, dim, pct=False, keep_attrs=False):
ds = self._to_temp_dataset().rank(dim, pct=pct, keep_attrs=keep_attrs)
return self._from_temp_dataset(ds)

def gradient(self, coord, edge_order=1):
""" Compute the gradient with the second order accurate central
differences.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here


Parameters
----------
coords: str
The coordinate to be used to compute the gradient.
edge_order: 1 or 2. Default 1
N-th order accurate differences at the boundaries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs time_unit here

Returns
-------
gradient: DataArray

See also
--------
numpy.gradient: corresponding numpy function
xr.gradient: more numpy-like function for xarray object.

Examples
--------

>>> da = xr.DataArray(np.arange(12).reshape(4, 3), dims=['x', 'y'],
... coords={'x': [0, 0.1, 1.1, 1.2]})
>>> da
<xarray.DataArray (x: 4, y: 3)>
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
Coordinates:
* x (x) float64 0.0 0.1 1.1 1.2
Dimensions without coordinates: y
>>>
>>> da.gradient('x')
<xarray.DataArray (x: 4, y: 3)>
array([[30. , 30. , 30. ],
[27.545455, 27.545455, 27.545455],
[27.545455, 27.545455, 27.545455],
[30. , 30. , 30. ]])
Coordinates:
* x (x) float64 0.0 0.1 1.1 1.2
Dimensions without coordinates: y
"""
ds = self._to_temp_dataset().gradient(coord, edge_order)
return self._from_temp_dataset(ds)


# priority most be higher than Variable to properly work with binary ufuncs
ops.inject_all_ops_and_reduce_methods(DataArray, priority=60)
44 changes: 41 additions & 3 deletions xarray/core/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,8 @@
import xarray as xr

from . import (
alignment, duck_array_ops, formatting, groupby, indexing, ops, resample,
rolling, utils)
alignment, computation, duck_array_ops, formatting, groupby, indexing, ops,
resample, rolling, utils)
from .. import conventions
from .alignment import align
from .common import (
Expand Down Expand Up @@ -3645,6 +3645,44 @@ def rank(self, dim, pct=False, keep_attrs=False):
attrs = self.attrs if keep_attrs else None
return self._replace_vars_and_dims(variables, coord_names, attrs=attrs)

def gradient(self, coord, edge_order=1):
""" Compute the gradient with the second order accurate central
differences.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here


Parameters
----------
coords: str
The coordinate to be used to compute the gradient.
edge_order: 1 or 2. Default 1
N-th order accurate differences at the boundaries.

Returns
-------
gradient: Dataset

See also
--------
numpy.gradient: corresponding numpy function
xr.gradient: more numpy-like function for xarray object.
"""
if coord not in self.variables and coord not in self.dims:
raise ValueError('Coordinate {} does not exist.'.format(coord))

coord_var = self[coord].variable
if coord_var.ndim != 1:
raise ValueError('Coordinate {} must be 1 dimensional but {} is {}'
' dimensional'.format(coord, coord_var.ndims))

dim = coord_var.dims[0]
variables = OrderedDict()
for k, v in self.variables.items():
if k in self.data_vars and dim in v.dims:
variables[k] = computation._gradient_once(
v, coord_var, edge_order)
else:
variables[k] = v
return self._replace_vars_and_dims(variables)

@property
def real(self):
return self._unary_op(lambda x: x.real, keep_attrs=True)(self)
Expand All @@ -3658,7 +3696,7 @@ def filter_by_attrs(self, **kwargs):

Can pass in ``key=value`` or ``key=callable``. A Dataset is returned
containing only the variables for which all the filter tests pass.
These tests are either ``key=value`` for which the attribute ``key``
These tests are either ``key=value`` for which the attribute ``key``
has the exact value ``value`` or the callable passed into
``key=callable`` returns True. The callable will be passed a single
value, either the value of the attribute ``key`` or ``None`` if the
Expand Down
8 changes: 8 additions & 0 deletions xarray/core/duck_array_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,14 @@ def isnull(data):
einsum = _dask_or_eager_func('einsum', array_args=slice(1, None),
requires_dask='0.17.3')


def gradient(x, coord, axis, edge_order):
if isinstance(x, dask_array_type):
return dask_array_ops.gradient(
x, coord, axis=axis, edge_order=edge_order)
return npcompat.gradient(x, coord, axis=axis, edge_order=edge_order)


masked_invalid = _dask_or_eager_func(
'masked_invalid', eager_module=np.ma,
dask_module=getattr(dask_array, 'ma', None))
Expand Down
Loading