open_mfdataset overwrites variables with different values but overlapping coordinates #4077

Open
@malmans2


In the example below I open and concatenate two datasets using open_mfdataset. The datasets hold variables with different values but overlapping coordinates. I concatenate along y, which runs 0...4 in one dataset and 0...5 in the other. The y dimension of the resulting dataset is 0...5, which means that open_mfdataset has overwritten some values without raising any error or warning.

Is this the expected default behavior? I would expect to get at least a warning, but maybe I'm misunderstanding the default arguments.

I tried to play with the arguments, but I couldn't figure out which argument I should change to get an error in these scenarios.

MCVE Code Sample

import xarray as xr
import numpy as np

# Write two files whose y coordinates overlap: 0...4 in tmp0.nc and 0...5 in tmp1.nc.
for i in range(2):
    ds = xr.Dataset(
        {"foo": (("x", "y"), np.random.rand(4, 5 + i))},
        coords={"x": np.arange(4), "y": np.arange(5 + i)},
    )
    print(ds)
    ds.to_netcdf(f"tmp{i}.nc")
<xarray.Dataset>
Dimensions:  (x: 4, y: 5)
Coordinates:
  * x        (x) int64 0 1 2 3
  * y        (y) int64 0 1 2 3 4
Data variables:
    foo      (x, y) float64 0.1271 0.6117 0.3769 0.1884 ... 0.853 0.5026 0.3762
<xarray.Dataset>
Dimensions:  (x: 4, y: 6)
Coordinates:
  * x        (x) int64 0 1 2 3
  * y        (y) int64 0 1 2 3 4 5
Data variables:
    foo      (x, y) float64 0.2841 0.6098 0.7761 0.0673 ... 0.2954 0.7212 0.3954
DS = xr.open_mfdataset("tmp*.nc", concat_dim="y", combine="by_coords")
print(DS)
<xarray.Dataset>
Dimensions:  (x: 4, y: 6)
Coordinates:
  * x        (x) int64 0 1 2 3
  * y        (y) int64 0 1 2 3 4 5
Data variables:
    foo      (x, y) float64 dask.array<chunksize=(4, 6), meta=np.ndarray>
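
As a possible workaround while the question is open, the overlap could be detected before the files are combined. The following is only a minimal sketch in plain Python: `check_no_overlap` is a hypothetical helper, not part of xarray, and it assumes the coordinate values of each file have already been collected into a dict keyed by file name.

```python
from itertools import combinations


def check_no_overlap(coords_by_source):
    """Raise ValueError if any two sources share a coordinate value.

    Hypothetical helper for illustration; xarray does not provide it.
    """
    pairs = combinations(coords_by_source.items(), 2)
    for (name_a, vals_a), (name_b, vals_b) in pairs:
        shared = sorted(set(vals_a) & set(vals_b))
        if shared:
            raise ValueError(f"{name_a} and {name_b} overlap at {shared}")


# Same y coordinates as the two files written in the example above.
y_by_file = {"tmp0.nc": range(5), "tmp1.nc": range(6)}

try:
    check_no_overlap(y_by_file)
except ValueError as err:
    message = str(err)
    print(message)
```

With real files, the values for each entry could presumably be gathered with something like `xr.open_dataset(path)["y"].values.tolist()` before calling open_mfdataset. The check compares every pair of files, so it is meant for a handful of files rather than thousands.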

Versions

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 08:20:52)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-29-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.15.1
pandas: 1.0.3
numpy: 1.18.4
scipy: None
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.16.0
distributed: 2.16.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 46.4.0.post20200518
pip: 20.1
conda: None
pytest: None
IPython: 7.13.0
sphinx: None
