Duck array documentation improvements #7911
Changes from 20 commits
@@ -1,4 +1,6 @@

.. _internals.accessors:

Extending xarray using accessors
================================

@@ -1,30 +1,170 @@
.. currentmodule:: xarray

.. _userguide.duckarrays:

Working with numpy-like arrays
==============================
TomNicholas marked this conversation as resolved.
NumPy-like arrays (often known as :term:`duck array`\s) are drop-in replacements for the :py:class:`numpy.ndarray`
class but with different features, such as propagating physical units or a different layout in memory.
Xarray can often wrap these array types, allowing you to use labelled dimensions and indexes whilst benefiting from the
additional features of these array libraries.

Some numpy-like array types that xarray already has some support for:

* `CuPy <https://cupy.dev/>`_ - for GPU support,
* `Sparse <https://sparse.pydata.org/en/stable/>`_ - for performant arrays with many zero elements,
* `Pint <https://pint.readthedocs.io/en/latest/>`_ - for tracking the physical units of your data.

Review comments:

- Let's also link to cupy-xarray and pint-xarray.
- Let's also add dask (and cubed)? to the list!
- I was actually going to do a follow-up PR adding internals documentation on wrapping other chunked arrays, but I'll mention dask and cubed here too.

.. warning::

    This feature should be considered somewhat experimental. Please report any bugs you find on
    `xarray’s issue tracker <https://github.com/pydata/xarray/issues>`_.

.. note::

    For information on wrapping dask arrays see :ref:`dask`. Whilst xarray wraps dask arrays in a similar way to that
    described on this page, chunked array types like :py:class:`dask.array.Array` implement additional methods that require
    slightly different user code (e.g. calling ``.chunk`` or ``.compute``).

Why "duck"?
-----------

Why is it also called a "duck" array? The name comes from a common saying in object-oriented programming:
"If it walks like a duck, and quacks like a duck, treat it like a duck". In other words, a library like xarray that
is capable of using multiple different types of arrays does not have to explicitly check that each one it encounters is
permitted (e.g. ``if dask``, ``if numpy``, ``if sparse`` etc.). Instead, xarray can take the more permissive approach of simply
treating the wrapped array as valid, attempting to call the relevant methods (e.g. ``.mean()``) and only raising an
error if a problem occurs (e.g. the method is not found on the wrapped class). This is much more flexible, and allows
objects and classes from different libraries to work together more easily.

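A minimal sketch of this "try it and see" style in plain Python. The class ``ListArray`` and the helper ``describe`` are hypothetical, made up for illustration; xarray's own code paths are considerably more involved.

```python
import numpy as np


class ListArray:
    """Hypothetical, deliberately tiny "duck array" used only for illustration.

    It is not a numpy.ndarray, but it has the two things ``describe`` needs.
    """

    def __init__(self, values):
        self.values = list(values)
        self.shape = (len(self.values),)

    def mean(self):
        return sum(self.values) / len(self.values)


def describe(arr):
    """Duck-typed: no isinstance checks, just try the attributes we need."""
    try:
        return {"shape": arr.shape, "mean": arr.mean()}
    except AttributeError as err:
        # Only complain once the object has actually failed to behave like an array
        raise TypeError(f"{type(arr).__name__} does not behave like an array: {err}") from err


print(describe(np.array([1.0, 2.0, 3.0])))  # a real numpy array
print(describe(ListArray([1.0, 2.0, 3.0])))  # the hypothetical duck array
```

Neither branch checks for ``numpy.ndarray`` or ``ListArray`` by name: anything with a ``shape`` and a ``mean()`` method is accepted, and an error is raised only when one of them is missing.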
What is a numpy-like array?
---------------------------

A "numpy-like array" (also known as a "duck array") is a class that contains array-like data, and implements key
numpy-like functionality such as indexing, broadcasting, and computation methods.

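In code, a rough test for "numpy-like" can look for the attributes and protocols such arrays usually expose. The helper below is a hypothetical sketch, not xarray's own check: it assumes that ``shape``, ``dtype`` and ``ndim``, plus either NumPy's ``__array_function__`` / ``__array_ufunc__`` protocols or the array API's ``__array_namespace__``, are a reasonable signal.

```python
import numpy as np


def looks_numpy_like(obj) -> bool:
    """Rough, hypothetical heuristic for "is this a duck array?"."""
    # Basic array metadata that numpy-like classes usually expose
    has_metadata = all(hasattr(obj, attr) for attr in ("shape", "dtype", "ndim"))
    # NumPy's dispatch protocols (NEP 13 / NEP 18)...
    has_numpy_protocols = hasattr(obj, "__array_function__") and hasattr(obj, "__array_ufunc__")
    # ...or the entry point defined by the Python array API standard
    has_array_api = hasattr(obj, "__array_namespace__")
    return has_metadata and (has_numpy_protocols or has_array_api)


print(looks_numpy_like(np.zeros((2, 3))))  # True
print(looks_numpy_like([[0, 0, 0], [0, 0, 0]]))  # False: a list has no shape or dtype
```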
For example, the `sparse <https://sparse.pydata.org/en/stable/>`_ library provides a sparse array type that is useful for representing N-dimensional arrays, such as sparse matrices,
in a memory-efficient manner. We can create a sparse array object (of the ``sparse.COO`` type) from a numpy array like this:

.. ipython:: python

    import numpy as np
    from sparse import COO

    x = np.eye(4, dtype=np.uint8)  # create a 4x4 identity matrix
    s = COO.from_numpy(x)
    s

This sparse object does not store every element of the array explicitly, only the non-zero elements (together with their coordinates).
This approach is much more memory-efficient for large arrays with only a few non-zero elements (such as tri-diagonal matrices).
Sparse array objects can be converted back to a "dense" numpy array by calling ``.todense()``.

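To make the storage idea concrete, here is a numpy-only sketch of coordinate (COO) storage. It is a simplification for illustration (the helper ``todense`` here is hypothetical) and is not how the sparse library lays out its data internally.

```python
import numpy as np

# A dense 4x4 identity matrix: 16 stored elements, only 4 of them non-zero
dense = np.eye(4, dtype=np.uint8)

# Coordinate-format sketch: keep only the positions and values of the non-zero entries
coords = np.nonzero(dense)  # tuple of index arrays, one per dimension
data = dense[coords]  # the non-zero values themselves


def todense(coords, data, shape, dtype):
    """Rebuild the dense array from the coordinate representation."""
    out = np.zeros(shape, dtype=dtype)
    out[coords] = data
    return out


roundtrip = todense(coords, data, dense.shape, dense.dtype)
print(data.size, "stored values instead of", dense.size)
```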
Just like ``numpy.ndarray`` objects, ``sparse.COO`` arrays support indexing,

.. ipython:: python

    s[1, 1]  # diagonal elements should be ones
    s[2, 3]  # off-diagonal elements should be zero

broadcasting,

.. ipython:: python

    x2 = np.zeros((4, 1), dtype=np.uint8)  # create a second array with a different shape
    s2 = COO.from_numpy(x2)
    s * s2  # multiplication requires broadcasting

and various computation methods:

.. ipython:: python

    s.sum(axis=1)

This numpy-like array also supports calling NumPy functions such as :py:func:`numpy.sum`, as well as NumPy
`universal functions (ufuncs) <https://numpy.org/doc/stable/reference/ufuncs.html>`_, on it directly:

.. ipython:: python

    np.sum(s, axis=1)

Notice that in each case the API for calling the operation on the sparse array is identical to that of calling it on the
equivalent numpy array - this is the sense in which the sparse array is "numpy-like".

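Because the operations above only rely on behaviour that numpy arrays also provide, a function written in terms of them does not need to know which array type it receives. As a small sketch (the helper name ``exercise`` is hypothetical), it is run here on a plain numpy array; passing the sparse array ``s`` from above should work the same way, although that is not exercised here.

```python
import numpy as np


def exercise(arr):
    """Apply the operations shown above, using only numpy-style syntax."""
    return {
        "diagonal": arr[1, 1],  # indexing
        "off_diagonal": arr[2, 3],
        "broadcast_shape": (arr * arr[:, :1]).shape,  # (4, 4) * (4, 1) broadcasts
        "row_sums": arr.sum(axis=1),  # computation method
    }


result = exercise(np.eye(4, dtype=np.uint8))
print(result)
```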
.. note::

    For discussion on exactly which methods a class needs to implement to be considered "numpy-like", see :ref:`internals.duckarrays`.

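One of the mechanisms that lets a call like ``np.sum(s, axis=1)`` work on a non-numpy array is NumPy's ``__array_function__`` protocol (NEP 18). The sketch below is adapted from the ``DiagonalArray`` example in NumPy's guide to writing custom array containers; the names are illustrative, and real libraries implement far more functions than the single one registered here.

```python
import numpy as np

# Registry mapping numpy functions to our own implementations
HANDLED_FUNCTIONS = {}


def implements(numpy_function):
    """Register an implementation of ``numpy_function`` for DiagonalArray."""
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator


class DiagonalArray:
    """Hypothetical n x n array that stores a single value for its diagonal."""

    def __init__(self, n, value):
        self._n = n
        self._value = value
        self.shape = (n, n)
        self.ndim = 2

    def __array__(self, dtype=None, copy=None):
        # Explicit conversion to a dense numpy array
        return self._value * np.eye(self._n, dtype=dtype)

    def __array_function__(self, func, types, args, kwargs):
        # numpy calls this instead of its own implementation of ``func``
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented  # numpy then raises TypeError
        if not all(issubclass(t, DiagonalArray) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(np.sum)
def _sum(arr, axis=None):
    # Computed from the stored value alone, without building the dense matrix
    if axis is None:
        return arr._value * arr._n
    return np.full(arr._n, arr._value)


d = DiagonalArray(4, 2.0)
print(np.sum(d), np.sum(d, axis=1))
```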
Wrapping numpy-like arrays in xarray
------------------------------------

:py:class:`DataArray` and :py:class:`Dataset` (and :py:class:`Variable`) objects can wrap these numpy-like arrays.

Constructing xarray objects which wrap numpy-like arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The primary way to create an xarray object that wraps a numpy-like array is to pass that numpy-like array instance directly
to the constructor of the xarray class. The :ref:`page on xarray data structures <data structures>` shows how :py:class:`DataArray` and :py:class:`Dataset`
both accept data in various forms through their ``data`` argument; in fact, this data can also be any wrappable numpy-like array.

For example, we can wrap the sparse array we created earlier inside a new ``DataArray`` object:

.. ipython:: python

    import xarray as xr

    s_da = xr.DataArray(s, dims=["i", "j"])
    s_da

We can see what's inside: the printable representation (the repr) of our xarray object automatically uses the printable
representation of the underlying wrapped array.

Our sparse array object is still there underneath; it's stored in the ``.data`` attribute of the DataArray:

.. ipython:: python

    s_da.data

Array methods
~~~~~~~~~~~~~

We saw above that numpy-like arrays provide numpy methods. Xarray automatically uses these when you call the corresponding xarray method:

.. ipython:: python

    s_da.sum(dim="j")

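To see how a labelled wrapper can reuse the wrapped array's own methods, here is a toy wrapper. ``LabelledArray`` is hypothetical, much simpler than :py:class:`DataArray`, and not its implementation: it translates a dimension name into an axis number and delegates the reduction to whatever array it holds.

```python
import numpy as np


class LabelledArray:
    """Hypothetical toy wrapper, NOT xarray's implementation."""

    def __init__(self, data, dims):
        if len(dims) != data.ndim:
            raise ValueError("need exactly one dimension name per axis")
        self.data = data  # stored as-is, whatever its type
        self.dims = tuple(dims)

    def sum(self, dim):
        axis = self.dims.index(dim)  # dimension name -> axis number
        reduced = self.data.sum(axis=axis)  # call the wrapped array's own method
        remaining = tuple(d for d in self.dims if d != dim)
        return LabelledArray(reduced, remaining)


arr = np.eye(4, dtype=np.uint8)
wrapped = LabelledArray(arr, dims=["i", "j"])
total = wrapped.sum(dim="j")
print(total.dims, total.data)
```

Because ``sum`` calls ``self.data.sum``, the reduction is performed by the wrapped library itself; with a sparse array inside, the result would be whatever that library's ``sum`` returns.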
Converting wrapped types
~~~~~~~~~~~~~~~~~~~~~~~~

If you want to convert the array inside your xarray object to a numpy array, you can use :py:meth:`DataArray.as_numpy`:

.. ipython:: python

    s_da.as_numpy()

This returns a new :py:class:`DataArray` object, but now wrapping a normal numpy array.

If instead you want to convert to numpy and return that numpy array, you can use either :py:meth:`DataArray.to_numpy` or
:py:attr:`DataArray.values`, where the former is strongly preferred. The difference is in the way they coerce to numpy: ``.values``
always uses ``np.asarray``, which will fail for some array types (e.g. ``cupy``), whereas ``to_numpy`` uses the correct method
depending on the array type.

This illustrates the difference between ``.data`` and ``.values``, which is sometimes a point of confusion for new xarray users.
Explicitly: :py:attr:`DataArray.data` returns the underlying numpy-like array, regardless of type, whereas
:py:attr:`DataArray.values` converts the underlying array to a numpy array before returning it.
(This is another reason to use ``.to_numpy`` over ``.values``: the intention is clearer.)

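The difference between these access routes can be sketched without xarray. Everything below is hypothetical: ``FakeSparse`` stands in for an array type that refuses implicit conversion (as cupy does, and as sparse does by default), ``values_like`` mimics the always-``np.asarray`` behaviour described for ``.values``, and ``to_numpy_like`` is an assumed, simplified type-aware conversion, not xarray's actual code.

```python
import numpy as np


class FakeSparse:
    """Hypothetical stand-in for an array type that refuses implicit conversion."""

    def __init__(self, dense):
        self._dense = np.asarray(dense)

    def todense(self):
        # The explicit, library-specific way to get a numpy array back
        return self._dense.copy()

    def __array__(self, dtype=None, copy=None):
        # Implicit conversion via np.asarray is refused on purpose
        raise RuntimeError("implicit conversion to numpy is not allowed; call .todense()")


def values_like(data):
    """Mimics the behaviour described for ``.values``: always np.asarray."""
    return np.asarray(data)


def to_numpy_like(data):
    """Simplified, assumed type-aware conversion (not xarray's actual code)."""
    if hasattr(data, "todense"):  # sparse-style arrays
        data = data.todense()
    elif hasattr(data, "magnitude"):  # pint-style quantities: drop the units
        data = data.magnitude
    return np.asarray(data)


class Wrapper:
    """Hypothetical wrapper exposing ``.data``, ``.values`` and ``to_numpy()``."""

    def __init__(self, data):
        self.data = data  # returned as-is, whatever its type

    @property
    def values(self):
        return values_like(self.data)

    def to_numpy(self):
        return to_numpy_like(self.data)


w = Wrapper(FakeSparse(np.eye(3)))
print(type(w.data).__name__)  # FakeSparse: .data is untouched
print(w.to_numpy())  # a dense numpy array
```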
Conversion to numpy as a fallback
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a wrapped array does not implement the corresponding array method, xarray will often attempt to convert the
underlying array to a numpy array so that the operation can be performed. You may want to watch out for this behavior,
and report any instances in which it causes problems.

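The fallback pattern can be sketched as follows. The class ``OnlySum`` and the helper ``call_with_fallback`` are hypothetical and this is not xarray's internal code path; unlike xarray, the sketch emits a warning so that the conversion is visible.

```python
import warnings

import numpy as np


class OnlySum:
    """Hypothetical duck array that implements ``sum`` but not ``cumsum``."""

    def __init__(self, values):
        self._values = np.asarray(values)

    def sum(self, axis=None):
        return OnlySum(self._values.sum(axis=axis))

    def __array__(self, dtype=None, copy=None):
        return np.asarray(self._values, dtype=dtype)


def call_with_fallback(data, method_name, **kwargs):
    """Prefer the wrapped array's own method; otherwise convert to numpy (and say so)."""
    method = getattr(data, method_name, None)
    if method is not None:
        return method(**kwargs)
    warnings.warn(
        f"{type(data).__name__} has no {method_name!r}; falling back to numpy",
        RuntimeWarning,
    )
    return getattr(np.asarray(data), method_name)(**kwargs)


d = OnlySum([[1, 2], [3, 4]])
summed = call_with_fallback(d, "sum", axis=0)  # stays an OnlySum
```

The result of a fallback call is a plain :py:class:`numpy.ndarray`, so any extra behaviour of the original type (units, sparsity, GPU placement) is lost from that point on.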
Most of xarray's API does support using :term:`duck array` objects, but there are a few areas where
the code will still convert to ``numpy`` arrays:

- Dimension coordinates, and thus all indexing operations:

  * :py:meth:`Dataset.sel` and :py:meth:`DataArray.sel`
  * :py:meth:`Dataset.loc` and :py:meth:`DataArray.loc`

@@ -33,7 +173,7 @@ the code will still cast to ``numpy`` arrays:
    :py:meth:`DataArray.reindex` and :py:meth:`DataArray.reindex_like`: duck arrays in
    data variables and non-dimension coordinates won't be cast

- Functions and methods that depend on external libraries or features of ``numpy`` not
  covered by ``__array_function__`` / ``__array_ufunc__``:

  * :py:meth:`Dataset.ffill` and :py:meth:`DataArray.ffill` (uses ``bottleneck``)

@@ -49,17 +189,25 @@ the code will still cast to ``numpy`` arrays:
    :py:class:`numpy.vectorize`)
  * :py:func:`apply_ufunc` with ``vectorize=True`` (uses :py:class:`numpy.vectorize`)

- Incompatibilities between different :term:`duck array` libraries:

  * :py:meth:`Dataset.chunk` and :py:meth:`DataArray.chunk`: this fails if the data was
    not already chunked and the :term:`duck array` (e.g. a ``pint`` quantity) should
    wrap the new ``dask`` array; changing the chunk sizes works, however.

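As an illustration of why anything built on :py:class:`numpy.vectorize` ends up with numpy arrays, the sketch below passes a hypothetical duck array carrying extra metadata (a ``tag``) through ``np.vectorize``. NumPy converts the input through ``__array__`` and hands back a plain :py:class:`numpy.ndarray`, so the metadata is lost.

```python
import numpy as np


class TaggedArray:
    """Hypothetical duck array with extra metadata that numpy knows nothing about."""

    def __init__(self, values, tag):
        self._values = np.asarray(values)
        self.tag = tag
        self.shape = self._values.shape
        self.dtype = self._values.dtype
        self.ndim = self._values.ndim

    def __array__(self, dtype=None, copy=None):
        # Only the raw values survive conversion; ``tag`` does not
        return np.asarray(self._values, dtype=dtype)


double = np.vectorize(lambda v: v * 2)
result = double(TaggedArray([1.0, 2.0, 3.0], tag="metres"))
print(type(result).__name__, result)  # a plain ndarray: the tag is gone
```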
Extensions using duck arrays
----------------------------

Whilst the features above allow many numpy-like array libraries to be used pretty seamlessly with xarray, it often also
makes sense to use an interfacing package to make certain tasks easier.

For example, the `pint-xarray package <https://pint-xarray.readthedocs.io>`_ offers a custom ``.pint`` accessor (see :ref:`internals.accessors`) which provides
convenient access to information stored within the wrapped array (e.g. ``.units`` and ``.magnitude``), and makes
creating wrapped pint arrays (and especially xarray-wrapping-pint-wrapping-dask arrays) simpler for the user.

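Accessors of this kind are commonly implemented as a descriptor that builds an accessor object on first access and caches it on the instance; xarray's own registration helper (:py:func:`register_dataarray_accessor`, see :ref:`internals.accessors`) works along similar lines. The sketch below shows the pattern with entirely hypothetical classes: ``Container`` stands in for a DataArray, ``Quantity`` for a pint quantity, and the accessor name ``pintlike`` is made up.

```python
class CachedAccessor:
    """Descriptor that builds the accessor on first access and caches it."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, cls):
        if obj is None:
            # Accessed on the class itself (e.g. by documentation tools)
            return self._accessor_cls
        accessor = self._accessor_cls(obj)
        # Store on the instance so later lookups bypass this non-data descriptor
        obj.__dict__[self._name] = accessor
        return accessor


class Quantity:
    """Hypothetical unit-carrying duck array (stand-in for a pint quantity)."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units


class Container:
    """Hypothetical labelled wrapper (stand-in for a DataArray)."""

    def __init__(self, data):
        self.data = data


class UnitsAccessor:
    """Exposes information stored inside the wrapped array."""

    def __init__(self, container):
        self._container = container

    @property
    def units(self):
        return self._container.data.units

    @property
    def magnitude(self):
        return self._container.data.magnitude


Container.pintlike = CachedAccessor("pintlike", UnitsAccessor)

c = Container(Quantity([1.0, 2.0], "metre"))
print(c.pintlike.units, c.pintlike.magnitude)
```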
We maintain a list of libraries extending ``xarray`` to make working with particular wrapped duck arrays
easier. If you know of more that aren't on this list, please raise an issue to add them!

- `pint-xarray <https://pint-xarray.readthedocs.io>`_
- `cupy-xarray <https://cupy-xarray.readthedocs.io>`_
- `cubed-xarray <https://github.com/cubed-xarray>`_