Description
What happened:
Using unstack
on data with the dask backend fails with a dask error.
What you expected to happen:
No failure, as with xarray 0.18.0 and earlier.
Minimal Complete Verifiable Example:
import pandas as pd
import xarray as xr
da = xr.DataArray([1] * 4, dims=('x',), coords={'x': [1, 2, 3, 4]})
dac = da.chunk()
ind = pd.MultiIndex.from_arrays(([0, 0, 1, 1], [0, 1, 0, 1]), names=("y", "z"))
dac.assign_coords(x=ind).unstack("x")")
Fails with:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-4-3c317738ec05> in <module>
3
4 ind = pd.MultiIndex.from_arrays(([0, 0, 1, 1], [0, 1, 0, 1]), names=("y", "z"))
----> 5 dac.assign_coords(x=ind).unstack("x")
~/Python/myxarray/xarray/core/dataarray.py in unstack(self, dim, fill_value, sparse)
2133 DataArray.stack
2134 """
-> 2135 ds = self._to_temp_dataset().unstack(dim, fill_value, sparse)
2136 return self._from_temp_dataset(ds)
2137
~/Python/myxarray/xarray/core/dataset.py in unstack(self, dim, fill_value, sparse)
4038 ):
4039 # Fast unstacking path:
-> 4040 result = result._unstack_once(dim, fill_value)
4041 else:
4042 # Slower unstacking path, examples of array types that
~/Python/myxarray/xarray/core/dataset.py in _unstack_once(self, dim, fill_value)
3914 fill_value_ = fill_value
3915
-> 3916 variables[name] = var._unstack_once(
3917 index=index, dim=dim, fill_value=fill_value_
3918 )
~/Python/myxarray/xarray/core/variable.py in _unstack_once(self, index, dim, fill_value)
1605 # sparse doesn't support item assigment,
1606 # https://github.com/pydata/sparse/issues/114
-> 1607 data[(..., *indexer)] = reordered
1608
1609 return self._replace(dims=new_dims, data=data)
~/.conda/envs/xxx/lib/python3.8/site-packages/dask/array/core.py in __setitem__(self, key, value)
1693
1694 out = "setitem-" + tokenize(self, key, value)
-> 1695 dsk = setitem_array(out, self, key, value)
1696
1697 graph = HighLevelGraph.from_collections(out, dsk, dependencies=[self])
~/.conda/envs/xxx/lib/python3.8/site-packages/dask/array/slicing.py in setitem_array(out_name, array, indices, value)
1787
1788 # Reformat input indices
-> 1789 indices, indices_shape, reverse = parse_assignment_indices(indices, array_shape)
1790
1791 # Empty slices can only be assigned size 1 values
~/.conda/envs/xxx/lib/python3.8/site-packages/dask/array/slicing.py in parse_assignment_indices(indices, shape)
1476 n_lists += 1
1477 if n_lists > 1:
-> 1478 raise NotImplementedError(
1479 "dask is currently limited to at most one "
1480 "dimension's assignment index being a "
NotImplementedError: dask is currently limited to at most one dimension's assignment index being a 1-d array of integers or booleans. Got: (Ellipsis, array([0, 0, 1, 1], dtype=int8), array([0, 1, 0, 1], dtype=int8))
The example works when I go back to xarray 0.18.0.
Anything else we need to know?:
I saw no tests in "test_daraarray.py" and "test_dataset.py" for unstack+dask, but they might be elsewhere?
If #5315 was successful, maybe there is something specific in my example and config that is causing the error? @max-sixty @Illviljan
Proposed test, for "test_dataset.py", adapted copy of test_unstack
:
@requires_dask
def test_unstack_dask(self):
index = pd.MultiIndex.from_product([[0, 1], ["a", "b"]], names=["x", "y"])
ds = Dataset({"b": ("z", [0, 1, 2, 3]), "z": index}).chunk()
expected = Dataset(
{"b": (("x", "y"), [[0, 1], [2, 3]]), "x": [0, 1], "y": ["a", "b"]}
)
for dim in ["z", ["z"], None]:
actual = ds.unstack(dim).load()
assert_identical(actual, expected)
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.11.16-arch1-1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_CA.utf8
LOCALE: ('fr_CA', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.18.2.dev2+g6d2a7301
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.6.3
netCDF4: 1.5.6
pydap: installed
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.8.1
cftime: 1.4.1
nc_time_axis: 1.2.0
PseudoNetCDF: installed
rasterio: 1.2.2
cfgrib: 0.9.9.0
iris: 2.4.0
bottleneck: 1.3.2
dask: 2021.05.0
distributed: 2021.05.0
matplotlib: 3.4.1
cartopy: 0.19.0
seaborn: 0.11.1
numbagg: installed
pint: 0.17
setuptools: 49.6.0.post20210108
pip: 21.1
conda: None
pytest: 6.2.3
IPython: 7.22.0
sphinx: 3.5.4