Fast-track unstack doesn't work with dask #5346

Closed
@aulemahal

Description

What happened:
Calling unstack on dask-backed data fails with a NotImplementedError raised by dask.

What you expected to happen:
No failure, as with xarray 0.18.0 and earlier.

Minimal Complete Verifiable Example:

import pandas as pd
import xarray as xr

da = xr.DataArray([1] * 4, dims=('x',), coords={'x': [1, 2, 3, 4]})
dac = da.chunk()

ind = pd.MultiIndex.from_arrays(([0, 0, 1, 1], [0, 1, 0, 1]), names=("y", "z"))
dac.assign_coords(x=ind).unstack("x")

Fails with:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-4-3c317738ec05> in <module>
      3 
      4 ind = pd.MultiIndex.from_arrays(([0, 0, 1, 1], [0, 1, 0, 1]), names=("y", "z"))
----> 5 dac.assign_coords(x=ind).unstack("x")

~/Python/myxarray/xarray/core/dataarray.py in unstack(self, dim, fill_value, sparse)
   2133         DataArray.stack
   2134         """
-> 2135         ds = self._to_temp_dataset().unstack(dim, fill_value, sparse)
   2136         return self._from_temp_dataset(ds)
   2137 

~/Python/myxarray/xarray/core/dataset.py in unstack(self, dim, fill_value, sparse)
   4038             ):
   4039                 # Fast unstacking path:
-> 4040                 result = result._unstack_once(dim, fill_value)
   4041             else:
   4042                 # Slower unstacking path, examples of array types that

~/Python/myxarray/xarray/core/dataset.py in _unstack_once(self, dim, fill_value)
   3914                         fill_value_ = fill_value
   3915 
-> 3916                     variables[name] = var._unstack_once(
   3917                         index=index, dim=dim, fill_value=fill_value_
   3918                     )

~/Python/myxarray/xarray/core/variable.py in _unstack_once(self, index, dim, fill_value)
   1605         # sparse doesn't support item assigment,
   1606         # https://github.com/pydata/sparse/issues/114
-> 1607         data[(..., *indexer)] = reordered
   1608 
   1609         return self._replace(dims=new_dims, data=data)

~/.conda/envs/xxx/lib/python3.8/site-packages/dask/array/core.py in __setitem__(self, key, value)
   1693 
   1694         out = "setitem-" + tokenize(self, key, value)
-> 1695         dsk = setitem_array(out, self, key, value)
   1696 
   1697         graph = HighLevelGraph.from_collections(out, dsk, dependencies=[self])

~/.conda/envs/xxx/lib/python3.8/site-packages/dask/array/slicing.py in setitem_array(out_name, array, indices, value)
   1787 
   1788     # Reformat input indices
-> 1789     indices, indices_shape, reverse = parse_assignment_indices(indices, array_shape)
   1790 
   1791     # Empty slices can only be assigned size 1 values

~/.conda/envs/xxx/lib/python3.8/site-packages/dask/array/slicing.py in parse_assignment_indices(indices, shape)
   1476             n_lists += 1
   1477             if n_lists > 1:
-> 1478                 raise NotImplementedError(
   1479                     "dask is currently limited to at most one "
   1480                     "dimension's assignment index being a "

NotImplementedError: dask is currently limited to at most one dimension's assignment index being a 1-d array of integers or booleans. Got: (Ellipsis, array([0, 0, 1, 1], dtype=int8), array([0, 1, 0, 1], dtype=int8))

The example works when I go back to xarray 0.18.0.
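
The error comes from dask's item assignment rather than from xarray itself: the fast unstack path in xarray/core/variable.py builds the unstacked output array and then assigns the reordered values via data[(..., *indexer)] = reordered, using one 1-d integer-array indexer per unstacked dimension. NumPy accepts that kind of assignment, but dask's __setitem__ (as of 2021.05.0, see environment below) supports at most one such indexer. A minimal sketch reproducing the limitation outside xarray:

import numpy as np
import dask.array as dsk

indexer = (np.array([0, 0, 1, 1]), np.array([0, 1, 0, 1]))

# NumPy accepts an assignment with two 1-d integer-array indexers
a = np.full((2, 2), np.nan)
a[(..., *indexer)] = np.ones(4)

# dask raises the same NotImplementedError as in the traceback above,
# because its __setitem__ allows at most one integer-array indexer
d = dsk.full((2, 2), np.nan, chunks=(2, 2))
d[(..., *indexer)] = np.ones(4)

In the meantime, loading the data before unstacking (e.g. dac.assign_coords(x=ind).compute().unstack("x")) should keep the assignment in NumPy and avoid the error.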

Anything else we need to know?:
I found no tests for unstack with dask in "test_dataarray.py" or "test_dataset.py", but maybe they live elsewhere?
If #5315 passed, maybe something specific to my example or configuration is causing the error? @max-sixty @Illviljan

Proposed test for "test_dataset.py", adapted from the existing test_unstack:

    @requires_dask
    def test_unstack_dask(self):
        index = pd.MultiIndex.from_product([[0, 1], ["a", "b"]], names=["x", "y"])
        ds = Dataset({"b": ("z", [0, 1, 2, 3]), "z": index}).chunk()
        expected = Dataset(
            {"b": (("x", "y"), [[0, 1], [2, 3]]), "x": [0, 1], "y": ["a", "b"]}
        )
        for dim in ["z", ["z"], None]:
            actual = ds.unstack(dim).load()
            assert_identical(actual, expected)
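
A matching test could go in "test_dataarray.py"; a sketch adapted from the Dataset version above (placement assumed, not yet run):

    @requires_dask
    def test_unstack_dask(self):
        index = pd.MultiIndex.from_product([[0, 1], ["a", "b"]], names=["x", "y"])
        da = DataArray([0, 1, 2, 3], dims="z", coords={"z": index}).chunk()
        expected = DataArray(
            [[0, 1], [2, 3]],
            dims=("x", "y"),
            coords={"x": [0, 1], "y": ["a", "b"]},
        )
        # load() forces the dask graph to run, which is where the failure occurs
        actual = da.unstack("z").load()
        assert_identical(actual, expected)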

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.11.16-arch1-1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_CA.utf8
LOCALE: ('fr_CA', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.18.2.dev2+g6d2a7301
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.6.3
netCDF4: 1.5.6
pydap: installed
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.8.1
cftime: 1.4.1
nc_time_axis: 1.2.0
PseudoNetCDF: installed
rasterio: 1.2.2
cfgrib: 0.9.9.0
iris: 2.4.0
bottleneck: 1.3.2
dask: 2021.05.0
distributed: 2021.05.0
matplotlib: 3.4.1
cartopy: 0.19.0
seaborn: 0.11.1
numbagg: installed
pint: 0.17
setuptools: 49.6.0.post20210108
pip: 21.1
conda: None
pytest: 6.2.3
IPython: 7.22.0
sphinx: 3.5.4
