dividing with unloaded data causes dimension to change order #10338

Open
@Pietervanhalem

What happened?

When I open a dataset without loading it and perform operations on it, the DataArray gets corrupted. The dimensions seem to be in a different order than the coordinates, so the DataArray can no longer be used. If I load the dataset after opening it, I don't have the issue anymore.

What did you expect to happen?

I expect the DataArray to keep the correct references to its coordinates when doing operations with it, i.e. the same result as when I load the data first.
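
Concretely, the expected result is what the same selection and division give on data that is already in memory. A minimal sketch, using the coordinates from this report and skipping the file round trip entirely (so it only illustrates the expected dimensions and sizes, not the bug itself):

```python
import numpy as np
import xarray as xr

coords = {
    "location": ["a", "b", "c"],
    "duration": [0.3, 0.25, 0.5, 1.0, 3.0],
    "dof": ["x", "y", "z", "rx", "ry", "rz"],
    "motion": ["dis", "vel"],
    "wave_tp": np.arange(3, 19, 1),
    "wave_dir": np.arange(0, 361, 15),
}
# In-memory dataset: nothing is lazily loaded here.
ds = xr.Dataset(
    {"X": (list(coords), np.random.rand(*[len(v) for v in coords.values()]))},
    coords=coords,
)

a = ds["X"].sel(wave_dir=np.arange(0, 360, 30), dof="z", motion="vel")
b = 1 / a

# The scalar selections (dof, motion) are dropped as dimensions;
# the remaining dimensions keep their original order.
print(b.dims)   # ('location', 'duration', 'wave_tp', 'wave_dir')
print(b.shape)  # (3, 5, 16, 12)
```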

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np

coords = {
    "location": ["a", "b", "c"],
    "duration": [0.3, 0.25, 0.5, 1.0, 3.0],
    "dof": ["x", "y", "z", "rx", "ry", "rz"],
    "motion": ["dis", "vel"],
    "wave_tp": np.arange(3, 19, 1),
    "wave_dir": np.arange(0, 361, 15),
}
ds = xr.Dataset(
    {
        "X": (list(coords.keys()), np.random.rand(*[len(e) for e in coords.values()])),
    },
    coords=coords,
)

with open("tmp.nc", "wb") as fp:
    ds.to_netcdf(fp)

with open("tmp.nc", "rb") as fp:
    # If I perform a .load() here, the bug disappears
    ds = xr.open_dataset(fp) #.load()

a = ds["X"].sel(
    wave_dir=np.arange(0, 360, 30),
    dof="z",
    motion="vel",
)
b = 1 / a

# Here you can see that the dataset has the wrong coordinates. 
# It says location has 12 values, but it should have 3.
display(b)

b.sel(location='a')

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 37
     33 # Here you can see that the dataset has the wrong coordinates.
     34 # It says location has 12 values, but it should have 3.
     35 display(b)
---> 37 b.sel(location='a')

File c:\tools\python312\Lib\site-packages\xarray\core\dataarray.py:1683, in DataArray.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   1567 def sel(
   1568     self,
   1569     indexers: Mapping[Any, Any] | None = None,
   (...)
   1573     **indexers_kwargs: Any,
   1574 ) -> Self:
   1575     """Return a new DataArray whose data is given by selecting index
   1576     labels along the specified dimension(s).
   1577
   (...)
   1681     Dimensions without coordinates: points
   1682     """
-> 1683     ds = self._to_temp_dataset().sel(
   1684         indexers=indexers,
   1685         drop=drop,
   1686         method=method,
   1687         tolerance=tolerance,
   1688         **indexers_kwargs,
   1689     )
   1690     return self._from_temp_dataset(ds)

File c:\tools\python312\Lib\site-packages\xarray\core\dataarray.py:598, in DataArray._to_temp_dataset(self)
    597 def _to_temp_dataset(self) -> Dataset:
--> 598     return self._to_dataset_whole(name=_THIS_ARRAY, shallow_copy=False)

File c:\tools\python312\Lib\site-packages\xarray\core\dataarray.py:665, in DataArray._to_dataset_whole(self, name, shallow_copy)
    662 indexes = self._indexes
    664 coord_names = set(self._coords)
--> 665 return Dataset._construct_direct(variables, coord_names, indexes=indexes)

File c:\tools\python312\Lib\site-packages\xarray\core\dataset.py:1133, in Dataset._construct_direct(cls, variables, coord_names, dims, attrs, indexes, encoding, close)
   1129 """Shortcut around __init__ for internal use when we want to skip
   1130 costly validation
   1131 """
   1132 if dims is None:
-> 1133     dims = calculate_dimensions(variables)
   1134 if indexes is None:
   1135     indexes = {}

File c:\tools\python312\Lib\site-packages\xarray\core\variable.py:3072, in calculate_dimensions(variables)
   3070             last_used[dim] = k
   3071         elif dims[dim] != size:
-> 3072             raise ValueError(
   3073                 f"conflicting sizes for dimension {dim!r}: "
   3074                 f"length {size} on {k!r} and length {dims[dim]} on {last_used!r}"
   3075             )
   3076 return dims

ValueError: conflicting sizes for dimension 'location': length 12 on <this-array> and length 3 on {'wave_dir': 'wave_dir', 'wave_tp': 'wave_tp', 'duration': 'duration', 'location': 'location'}
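
The length 12 reported on 'location' matches what plain NumPy does when integer indices and an index array are applied in a single step with a slice between them: the axis produced by the index array moves to the front. Whether xarray's lazy indexing path actually hits this is an assumption; nothing in this report confirms it. A sketch with an array of the same shape as "X" (the index values 2, 1 and the step-2 positions stand in for dof="z", motion="vel" and every 30 degrees of wave_dir):

```python
import numpy as np

# Axis order: location, duration, dof, motion, wave_tp, wave_dir
x = np.zeros((3, 5, 6, 2, 16, 25))
wave_dir_idx = np.arange(0, 24, 2)  # 12 positions: 0, 30, ..., 330 degrees

# Integers and an index array in one indexing step, separated by a slice:
# NumPy places the broadcast (length-12) axis first.
mixed = x[:, :, 2, 1, :, wave_dir_idx]

# Applying the integer indices first and the index array afterwards
# keeps the remaining axes in their original order.
stepwise = x[:, :, 2, 1][..., wave_dir_idx]

print(mixed.shape)     # (12, 3, 5, 16)
print(stepwise.shape)  # (3, 5, 16, 12)
```

If the dimension names (location, duration, wave_tp, wave_dir) were attached to the first shape without a transpose, 'location' would end up with length 12, which is the mismatch shown in the error message.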

Anything else we need to know?

Image

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.7 (tags/v3.12.7:0b05ead, Oct 1 2024, 03:06:41) [MSC v.1941 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 11
machine: AMD64
processor: Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_United Kingdom', '1252')
libhdf5: None
libnetcdf: None

xarray: 2025.4.0
pandas: 2.2.3
numpy: 2.2.6
scipy: 1.15.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: 25.1.1
conda: None
pytest: None
mypy: None
IPython: 9.2.0
sphinx: None
