assign_coords' behavior depends on input DataArrays #8180

Open
@yichechang

What happened?

I'm trying to compute masks from a DataArray's own data and assign them as coordinates. Depending on the combination of coords/dims carried by the computed masks, .assign_coords sometimes fails.

It seems that:

  • it fails when all the mask DataArrays to be assigned as coordinates (each is a computed mask, but that probably doesn't matter) share a dimension with the target DataArray, and that dimension holds only a single value across all the mask DataArrays;
  • it doesn't fail when the shared dimension holds more than one value across the mask DataArrays.

It's a bit hard to describe, as I don't know the xarray internals, but the self-contained minimal example below should demonstrate the issue much more clearly.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

import xarray as xr

data = xr.DataArray(
    data=[
        [0, 1, 2], 
        [0, 1, 2]
    ], 
    coords={
        'd1': ['m', 'n'], 
        'd2': ['a', 'b', 'c']
    }
)


# this will fail:
data.assign_coords({'mask_d1_m': data.sel(d1='m')==0})
# ValueError: dimension 'd1' already exists as a scalar variable

# this will fail too:
data.assign_coords({'mask_d1_n': data.sel(d1='n')==0})
# ValueError: dimension 'd1' already exists as a scalar variable

# but this will work:
data.assign_coords(
    {
        'mask_d1_m': data.sel(d1='m')==0,
        'mask_d1_n': data.sel(d1='n')==0
    }
)
# <xarray.DataArray (d1: 2, d2: 3)>
# array([[0, 1, 2],
#        [0, 1, 2]])
# Coordinates:
#   * d1         (d1) <U1 'm' 'n'
#   * d2         (d2) <U1 'a' 'b' 'c'
#     mask_d1_m  (d2) bool True False False
#     mask_d1_n  (d2) bool True False False
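
A possible workaround, offered as a sketch rather than a confirmed fix: each mask built with data.sel(d1=...) seems to carry d1 along as a scalar coordinate, so dropping that coordinate before the assignment should avoid the clash. The example below assumes that passing drop=True to .sel is acceptable for the use case; dims is given explicitly here only to keep the snippet independent of how dimensions are inferred from coords.

```python
import xarray as xr

data = xr.DataArray(
    data=[[0, 1, 2], [0, 1, 2]],
    dims=("d1", "d2"),
    coords={"d1": ["m", "n"], "d2": ["a", "b", "c"]},
)

# drop=True discards the indexed coordinate 'd1' instead of keeping it
# on the result as a scalar coordinate.
mask = data.sel(d1="m", drop=True) == 0

# The mask now only carries 'd2', so a single mask can be assigned on its own.
result = data.assign_coords({"mask_d1_m": mask})
print(result.coords)
```

If the scalar coordinate has already been created, mask.drop_vars('d1') before assigning should have the same effect.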

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[27], line 1
----> 1 data.assign_coords({'mask_d1_n': data.sel(d1='n')==0})

File ~/mambaforge/envs/quickquant/lib/python3.8/site-packages/xarray/core/common.py:615, in DataWithCoords.assign_coords(self, coords, **coords_kwargs)
    613 data = self.copy(deep=False)
    614 results: dict[Hashable, Any] = self._calc_assign_results(coords_combined)
--> 615 data.coords.update(results)
    616 return data

File ~/mambaforge/envs/quickquant/lib/python3.8/site-packages/xarray/core/coordinates.py:177, in Coordinates.update(self, other)
    173 self._maybe_drop_multiindex_coords(set(other_vars))
    174 coords, indexes = merge_coords(
    175     [self.variables, other_vars], priority_arg=1, indexes=self.xindexes
    176 )
--> 177 self._update_coords(coords, indexes)

File ~/mambaforge/envs/quickquant/lib/python3.8/site-packages/xarray/core/coordinates.py:393, in DataArrayCoordinates._update_coords(self, coords, indexes)
    391 coords_plus_data = coords.copy()
    392 coords_plus_data[_THIS_ARRAY] = self._data.variable
--> 393 dims = calculate_dimensions(coords_plus_data)
    394 if not set(dims) <= set(self.dims):
    395     raise ValueError(
    396         "cannot add coordinates with new dimensions to a DataArray"
    397     )

File ~/mambaforge/envs/quickquant/lib/python3.8/site-packages/xarray/core/variable.py:3209, in calculate_dimensions(variables)
   3207 for dim, size in zip(var.dims, var.shape):
   3208     if dim in scalar_vars:
-> 3209         raise ValueError(
   3210             f"dimension {dim!r} already exists as a scalar variable"
   3211         )
   3212     if dim not in dims:
   3213         dims[dim] = size

ValueError: dimension 'd1' already exists as a scalar variable
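
The message refers to 'd1' as a scalar variable. One plausible reading, which is an assumption and not something verified against the xarray internals, is that the mask itself brings a zero-dimensional d1 coordinate along with it. A short check of what the mask carries:

```python
import xarray as xr

data = xr.DataArray(
    data=[[0, 1, 2], [0, 1, 2]],
    dims=("d1", "d2"),
    coords={"d1": ["m", "n"], "d2": ["a", "b", "c"]},
)

# Selecting a single label removes 'd1' as a dimension of the result ...
mask = data.sel(d1="n") == 0
print(mask.dims)  # ('d2',)

# ... but 'd1' stays attached to the mask as a zero-dimensional coordinate.
print("d1" in mask.coords)
print(mask.coords["d1"].ndim)
```

If that reading is right, the name d1 arrives at assign_coords twice: once as the dimension coordinate of the target and once as the scalar coordinate attached to the mask.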

Anything else we need to know?

No response

Environment

~/mambaforge/envs/quickquant/lib/python3.8/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None
python: 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:11:32)
[Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2023.1.0
pandas: 1.5.3
numpy: 1.24.0
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.15.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2023.5.0
distributed: 2023.5.0
matplotlib: 3.7.2
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.9.0
cupy: None
pint: 0.21
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.1.2
pip: 23.2.1
conda: 23.7.3
pytest: 7.4.1
mypy: None
IPython: 8.12.2
sphinx: 4.5.0
