Description
What happened?
I'm trying to compute masks from a DataArray's own data and assign them as coordinates. Depending on the combination of coords/dims of the computed masks, .assign_coords sometimes fails.
It seems that:
- it fails when all the mask DataArrays to be assigned as coordinates (each one is a computed mask, though that probably doesn't matter) share a dimension with the target DataArray, and that dimension holds only a single value across all the mask DataArrays;
- it doesn't fail when the shared dimension holds more than one value across the mask DataArrays.
It's a bit hard to describe because I don't know the xarray internals, but the self-contained minimal example below should demonstrate the issue much more clearly.
What did you expect to happen?
No response
Minimal Complete Verifiable Example
import xarray as xr
data = xr.DataArray(
    data=[
        [0, 1, 2],
        [0, 1, 2]
    ],
    coords={
        'd1': ['m', 'n'],
        'd2': ['a', 'b', 'c']
    }
)
# this will fail:
data.assign_coords({'mask_d1_m': data.sel(d1='m')==0})
# ValueError: dimension 'd1' already exists as a scalar variable
# this will fail too:
data.assign_coords({'mask_d1_n': data.sel(d1='n')==0})
# ValueError: dimension 'd1' already exists as a scalar variable
# but this will work:
data.assign_coords(
    {
        'mask_d1_m': data.sel(d1='m')==0,
        'mask_d1_n': data.sel(d1='n')==0
    }
)
# <xarray.DataArray (d1: 2, d2: 3)>
# array([[0, 1, 2],
#        [0, 1, 2]])
# Coordinates:
#   * d1         (d1) <U1 'm' 'n'
#   * d2         (d2) <U1 'a' 'b' 'c'
#     mask_d1_m  (d2) bool True False False
#     mask_d1_n  (d2) bool True False False
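A possible workaround, offered as a sketch rather than a confirmed fix: data.sel(d1='m') keeps d1 attached to the result as a scalar coordinate, and the error message points at exactly that scalar variable. Dropping it before assigning, either with the drop=True argument of .sel or with .drop_vars, should avoid the clash. The variable names (mask_m, mask_n, result_m, result_n) are only illustrative, and dims is passed explicitly here for clarity.

```python
import xarray as xr

data = xr.DataArray(
    data=[[0, 1, 2], [0, 1, 2]],
    dims=("d1", "d2"),
    coords={"d1": ["m", "n"], "d2": ["a", "b", "c"]},
)

# drop=True discards the scalar `d1` coordinate that a plain
# .sel(d1="m") would leave attached to the selection.
mask_m = data.sel(d1="m", drop=True) == 0

# Alternative: select first, then drop the scalar coordinate explicitly.
mask_n = (data.sel(d1="n") == 0).drop_vars("d1")

# Each mask now only carries the `d2` coordinate, so assigning them
# one at a time no longer involves a scalar `d1`.
result_m = data.assign_coords({"mask_d1_m": mask_m})
result_n = data.assign_coords({"mask_d1_n": mask_n})

print(result_m.coords["mask_d1_m"].dims)
print(result_n.coords["mask_d1_n"].dims)
```

With the scalar coordinate removed, each mask depends only on d2, which is already a dimension of the target array.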
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
Relevant log output
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[27], line 1
----> 1 data.assign_coords({'mask_d1_n': data.sel(d1='n')==0})
File ~/mambaforge/envs/quickquant/lib/python3.8/site-packages/xarray/core/common.py:615, in DataWithCoords.assign_coords(self, coords, **coords_kwargs)
613 data = self.copy(deep=False)
614 results: dict[Hashable, Any] = self._calc_assign_results(coords_combined)
--> 615 data.coords.update(results)
616 return data
File ~/mambaforge/envs/quickquant/lib/python3.8/site-packages/xarray/core/coordinates.py:177, in Coordinates.update(self, other)
173 self._maybe_drop_multiindex_coords(set(other_vars))
174 coords, indexes = merge_coords(
175 [self.variables, other_vars], priority_arg=1, indexes=self.xindexes
176 )
--> 177 self._update_coords(coords, indexes)
File ~/mambaforge/envs/quickquant/lib/python3.8/site-packages/xarray/core/coordinates.py:393, in DataArrayCoordinates._update_coords(self, coords, indexes)
391 coords_plus_data = coords.copy()
392 coords_plus_data[_THIS_ARRAY] = self._data.variable
--> 393 dims = calculate_dimensions(coords_plus_data)
394 if not set(dims) <= set(self.dims):
395 raise ValueError(
396 "cannot add coordinates with new dimensions to a DataArray"
397 )
File ~/mambaforge/envs/quickquant/lib/python3.8/site-packages/xarray/core/variable.py:3209, in calculate_dimensions(variables)
3207 for dim, size in zip(var.dims, var.shape):
3208 if dim in scalar_vars:
-> 3209 raise ValueError(
3210 f"dimension {dim!r} already exists as a scalar variable"
3211 )
3212 if dim not in dims:
3213 dims[dim] = size
ValueError: dimension 'd1' already exists as a scalar variable
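To make the last frame of the traceback easier to follow, here is a simplified, xarray-free sketch of the check it shows. It is not the library's implementation: calculate_dimensions_sketch and the (dims, shape) tuples are hypothetical stand-ins, and it assumes that scalar_vars holds the names of zero-dimensional variables and that the merged coordinates end up containing the mask's scalar d1 in the failing case.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for a variable: (dimension names, shape).
VarSpec = Tuple[Tuple[str, ...], Tuple[int, ...]]


def calculate_dimensions_sketch(variables: Dict[str, VarSpec]) -> Dict[str, int]:
    """Simplified illustration of the check seen in the traceback."""
    dims: Dict[str, int] = {}
    # Assumption: a "scalar variable" is any variable with no dimensions.
    scalar_vars = {name for name, (var_dims, _) in variables.items() if not var_dims}
    for var_dims, shape in variables.values():
        for dim, size in zip(var_dims, shape):
            if dim in scalar_vars:
                raise ValueError(
                    f"dimension {dim!r} already exists as a scalar variable"
                )
            if dim not in dims:
                dims[dim] = size
    return dims


# Assumed failing case: `d1` is present only as the mask's scalar coordinate,
# while the array itself still uses `d1` as a dimension.
failing = {
    "d1": ((), ()),
    "d2": (("d2",), (3,)),
    "mask_d1_m": (("d2",), (3,)),
    "this_array": (("d1", "d2"), (2, 3)),
}

# Assumed working case: `d1` stays a one-dimensional index coordinate.
working = {
    "d1": (("d1",), (2,)),
    "d2": (("d2",), (3,)),
    "mask_d1_m": (("d2",), (3,)),
    "this_array": (("d1", "d2"), (2, 3)),
}

working_dims = calculate_dimensions_sketch(working)

try:
    calculate_dimensions_sketch(failing)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(working_dims)
print(error_message)
```

Under these assumptions, the error is raised as soon as a name is both a zero-dimensional variable and a dimension of another variable.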
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:11:32)
[Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.1.0
pandas: 1.5.3
numpy: 1.24.0
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.15.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2023.5.0
distributed: 2023.5.0
matplotlib: 3.7.2
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.9.0
cupy: None
pint: 0.21
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.1.2
pip: 23.2.1
conda: 23.7.3
pytest: 7.4.1
mypy: None
IPython: 8.12.2
sphinx: 4.5.0