as_numpy changes MultiIndex  #8001

Open
@cgahr

What happened?

I have a DataArray with a MultiIndex. When the array is backed by dask, I call .as_numpy() to compute it and hold the result in memory. I expect this call to leave the MultiIndex unchanged.

That holds for .persist(), but not for .as_numpy().

In the following MWE the original coordinates are:

Coordinates:
  * z        (z) int64 0 1 2 3 4
  * r        (r) object MultiIndex
  * x        (r) int64 0 0 0 1 1 1
  * y        (r) int64 0 1 2 0 1 2

After .persist() the coordinates are unchanged. After .as_numpy() we get:

Coordinates:
  * z        (z) int64 0 1 2 3 4
  * r        (r) object (0, 0) (0, 1) (0, 2) (1, 0) (1, 1) (1, 2)
  * x        (r) int64 0 0 0 1 1 1
  * y        (r) int64 0 1 2 0 1 2

which is not the same: r is now displayed as a plain object array of tuples instead of a MultiIndex, and this can lead to issues down the line.
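To illustrate the difference between the two representations, here is a small pandas-only sketch. It assumes (this is not confirmed anywhere in the report) that the values shown for r after .as_numpy() correspond to what you get when a pandas MultiIndex is converted to a plain NumPy array; the names `mi`, `flat` and `rebuilt` are made up for the example.

```python
import numpy as np
import pandas as pd

# The same six (x, y) pairs that da.stack(r=("x", "y")) would produce.
mi = pd.MultiIndex.from_product([[0, 1], [0, 1, 2]], names=["x", "y"])

# Converting to a plain NumPy array gives a 1-D object array of tuples.
# The level names and the level structure are not part of that array.
flat = np.asarray(mi)
print(type(mi).__name__, mi.names)
print(flat.dtype, flat.shape)

# The tuples still carry the values, so the index can be rebuilt by hand,
# but the names have to be supplied again.
rebuilt = pd.MultiIndex.from_tuples(list(flat), names=["x", "y"])
print(rebuilt.equals(mi))
```

The point of the sketch is only that an object array of tuples is a different kind of object from a MultiIndex, even when both hold the same values.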

What did you expect to happen?

No response

Minimal Complete Verifiable Example

import xarray as xr

da = xr.DataArray(1, dims=('x', 'y', 'z'), coords={'x': [0, 1], 'y': [0, 1, 2], 'z': [0, 1, 2, 3, 4]})
da = da.stack(r=('x', 'y'))


# Comparing only the stacked coordinate 'r':
xr.testing.assert_equal(da['r'], da.persist()['r'])   # passes
xr.testing.assert_equal(da['r'], da.as_numpy()['r'])  # fails

# Comparing the full DataArray:
xr.testing.assert_equal(da, da.persist())   # passes
xr.testing.assert_equal(da, da.as_numpy())  # passes, but should fail
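As a possible interim workaround, one could load the data with .compute() instead of .as_numpy(). The sketch below is an assumption on my part, not something established in this report: it relies on .compute() loading the data in memory while leaving index coordinates alone, and the name `loaded` is made up for the example. Note that .compute() only evaluates lazy (for example dask) arrays and does not coerce other array types to NumPy, so it is not a full replacement for .as_numpy().

```python
import pandas as pd
import xarray as xr

# Same construction as in the MWE above.
da = xr.DataArray(
    1,
    dims=("x", "y", "z"),
    coords={"x": [0, 1], "y": [0, 1, 2], "z": [0, 1, 2, 3, 4]},
).stack(r=("x", "y"))

# Load into memory without going through as_numpy().
loaded = da.compute()

# The stacked coordinate should compare equal and still be backed
# by a pandas MultiIndex with the original level names.
xr.testing.assert_equal(da["r"], loaded["r"])
print(isinstance(loaded.indexes["r"], pd.MultiIndex))
print(list(loaded.indexes["r"].names))
```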

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.3 | packaged by conda-forge | (main, Apr 6 2023, 08:57:19) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 4.12.14-122.162-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.0
libnetcdf: None

xarray: 2023.6.0
pandas: 1.5.3
numpy: 1.25.0
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: 1.2.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.4.1
distributed: 2023.4.1
matplotlib: 3.7.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.4.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 23.1.2
conda: None
pytest: 7.4.0
mypy: 1.4.1
IPython: 8.13.2
sphinx: None
