Unnecessary copy when indexing to obtain a 0d array #2622

Closed
@danielwe

Code Sample

>>> import numpy as np
>>> import xarray as xr
>>> da = xr.DataArray(np.arange(3))
>>> da
<xarray.DataArray (dim_0: 3)>
array([0, 1, 2])
Dimensions without coordinates: dim_0
>>> da[0].values.fill(99)
>>> da
<xarray.DataArray (dim_0: 3)>
array([0, 1, 2])
Dimensions without coordinates: dim_0

Problem description

Indexing into xarray objects creates a view of the underlying data if possible. A surprising exception is when all dimensions are indexed out and the resulting object is 0d. Xarray insists on returning a 0d array rather than a scalar, which suggests (at least to me) that this is also a view whenever possible; however, it is always a copy, and modifying it will never affect the original array.

(The example above is a little contrived, since one could always call da[0] = 99. In my actual use case I am indexing into a Dataset in a way that creates views for all variables except the one that happens to collapse to 0d, and thus I'm unable to use the indexed Dataset to modify that variable in the original Dataset.)

The copy happens because, internally, the 0d array is created by retrieving a scalar from the underlying numpy array and then wrapping a new array around it. However, in numpy a 0d view can be created directly by indexing with Ellipsis/..., as follows:

>>> import numpy as np
>>> arr = np.arange(3)
>>> arr[0, ...]
array(0)
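
To make the difference concrete, here is a minimal numpy-only sketch comparing the two ways of obtaining a 0d array. The `np.asarray(arr[0])` line only mimics the "retrieve a scalar, then wrap it" behaviour described above; it is an assumption for illustration, not the actual xarray code path.

```python
import numpy as np

arr = np.arange(3)

# Retrieve a scalar and wrap a new 0d array around it: this is a copy.
wrapped = np.asarray(arr[0])
# Index with a trailing Ellipsis instead: this is a 0d view into arr.
view = arr[0, ...]

print(wrapped.shape, view.shape)        # both are 0d: () ()
print(np.shares_memory(arr, wrapped))   # False
print(np.shares_memory(arr, view))      # True

wrapped.fill(99)   # leaves arr untouched
print(arr)         # [0 1 2]
view.fill(99)      # writes through to arr
print(arr)         # [99  1  2]
```

`np.shares_memory` is a convenient way to check whether an indexing result is a view without mutating anything.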

Thus, a fix that solves my immediate issues and passes all current tests is to modify the following method:

xarray/xarray/core/indexing.py

Lines 1154 to 1163 in 778ffc4

    def _indexing_array_and_key(self, key):
        if isinstance(key, OuterIndexer):
            array = self.array
            key = _outer_to_numpy_indexer(key, self.array.shape)
        elif isinstance(key, VectorizedIndexer):
            array = nputils.NumpyVIndexAdapter(self.array)
            key = key.tuple
        elif isinstance(key, BasicIndexer):
            array = self.array
            key = key.tuple

to always append an ellipsis for basic and outer indexing:

    def _indexing_array_and_key(self, key):
        if isinstance(key, OuterIndexer):
            array = self.array
>           key = _outer_to_numpy_indexer(key, self.array.shape) + (Ellipsis,)
        elif isinstance(key, VectorizedIndexer):
            array = nputils.NumpyVIndexAdapter(self.array)
            key = key.tuple
        elif isinstance(key, BasicIndexer):
            array = self.array
>           key = key.tuple + (Ellipsis,)
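
As a rough sanity check on why always appending an ellipsis should be harmless for basic indexing, the sketch below applies the same idea to a plain numpy array. The helper `with_trailing_ellipsis` is hypothetical and exists only for this illustration; it is not part of xarray, and the sketch does not exercise xarray's outer or vectorized indexers.

```python
import numpy as np

def with_trailing_ellipsis(key):
    """Hypothetical helper: append Ellipsis to a tuple of basic index elements."""
    return tuple(key) + (Ellipsis,)

arr2d = np.arange(6).reshape(2, 3)

# Integer and slice keys give the same result with or without the Ellipsis.
for key in [(0,), (slice(None), 1), (slice(0, 1), slice(1, 3))]:
    assert np.array_equal(arr2d[key], arr2d[with_trailing_ellipsis(key)])

# A key that indexes out every dimension now yields a 0d view, not a scalar.
full = arr2d[with_trailing_ellipsis((1, 2))]
assert isinstance(full, np.ndarray) and full.ndim == 0

full.fill(-1)       # writes through to arr2d
print(arr2d[1, 2])  # -1
```

The trailing ellipsis expands to zero dimensions whenever the key already covers every axis, so the only visible change in this sketch is that the fully indexed case returns a 0d array instead of a numpy scalar.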

I'm not familiar enough with all the indexing variants in xarray to know if this covers all cases of 0d arrays that are currently copies but could be views. If someone wants to share some insight (e.g., some more advanced test cases), I could try and put together a pull request.

Expected Output

>>> da[0].values.fill(99)
>>> da
<xarray.DataArray (dim_0: 3)>
array([99, 1, 2])
Dimensions without coordinates: dim_0

Output of xr.show_versions()

/home/daniel/local/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-42-lowlatency
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.11.0
pandas: 0.23.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: 0.6.2
h5py: 2.7.1
Nio: None
zarr: None
cftime: 1.0.0b1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.2.2
cartopy: None
seaborn: 0.8.1
setuptools: 39.1.0
pip: 10.0.1
conda: 4.5.12
pytest: 3.5.1
IPython: 6.4.0
sphinx: 1.7.4
