Description
What happened:
xr.combine_by_coords fails, but only when the input DataArrays are named.
What you expected to happen:
xr.combine_by_coords to combine the arrays whether or not they are named.
Minimal Complete Verifiable Example:
import xarray as xr
import dask.array as da
import numpy as np
coords = [("x", np.arange(200)), ("y", np.arange(1000)), ("z", np.arange(1000))]
DataArray_list = []
n = 1  # number of arrays to combine
for i in range(n):
    test_data = da.random.random((1, 200, 1000, 1000))
    coords_i = [("time", [i])] + coords
    data_i = xr.DataArray(test_data, coords=coords_i)
    #data_i.name = None
    DataArray_list.append(data_i)
print(*DataArray_list, sep='\n\n')
Combined = xr.Dataset()
Combined["test"] = xr.combine_by_coords(DataArray_list)
When n == 1:
runcell(0, '/home/alavandier/bug_combine_by_coords.py')
<xarray.DataArray 'random_sample-4545ef044176a8b440a43599b310e9c1' (time: 1, x: 200, y: 1000, z: 1000)>
dask.array<random_sample, shape=(1, 200, 1000, 1000), dtype=float64, chunksize=(1, 200, 250, 250), chunktype=numpy.ndarray>
Coordinates:
* time (time) int64 0
* x (x) int64 0 1 2 3 4 5 6 7 8 ... 191 192 193 194 195 196 197 198 199
* y (y) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
* z (z) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/dask/array/core.py:383: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
o = func(*args, **kwargs)
Traceback (most recent call last):
File "/home/alavandier/bug_combine_by_coords.py", line 21, in <module>
Combined["test"] = xr.combine_by_coords(DataArray_list)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/dataset.py", line 1563, in __setitem__
self.update({key: value})
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/dataset.py", line 4208, in update
merge_result = dataset_update_method(self, other)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/merge.py", line 984, in dataset_update_method
return merge_core(
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/merge.py", line 632, in merge_core
collected = collect_variables_and_indexes(aligned)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/merge.py", line 294, in collect_variables_and_indexes
variable = as_variable(variable, name=name)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/variable.py", line 141, in as_variable
data = as_compatible_data(obj)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/variable.py", line 238, in as_compatible_data
data = np.asarray(data)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/numpy/core/_asarray.py", line 102, in asarray
return array(a, dtype, copy=False, order=order)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/dataset.py", line 1461, in __array__
raise TypeError(
TypeError: cannot directly convert an xarray.Dataset into a numpy array. Instead, create an xarray.DataArray first, either with indexing on the Dataset or by invoking the `to_array()` method.
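The last frames of this traceback go through Dataset.__array__, which suggests that the value being assigned to Combined["test"] was a Dataset rather than a DataArray. The sketch below reproduces only that assignment step in isolation, using a small hand-made Dataset (the names inner, outer and "a" are made up for illustration). It does not call combine_by_coords, and the exact error message may differ between xarray versions.

```python
import numpy as np
import xarray as xr

# A small Dataset standing in for the object on the right-hand side.
inner = xr.Dataset({"a": ("x", np.arange(3))})

outer = xr.Dataset()
try:
    # A whole Dataset cannot be stored as a single variable.
    outer["test"] = inner
    assign_error = None
except TypeError as err:
    assign_error = err

print(type(assign_error).__name__)

# Selecting one variable gives a DataArray, which can be stored under a key.
outer["test"] = inner["a"]
print(outer["test"].dims)
```

If that reading is right, indexing the returned object by variable name before assigning would avoid this particular TypeError, though it would not explain why a Dataset is returned in the first place.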
When n >= 2:
runcell(0, '/home/alavandier/bug_combine_by_coords.py')
<xarray.DataArray 'random_sample-8a3680be28e920d13cc66464a1ef1669' (time: 1, x: 200, y: 1000, z: 1000)>
dask.array<random_sample, shape=(1, 200, 1000, 1000), dtype=float64, chunksize=(1, 200, 250, 250), chunktype=numpy.ndarray>
Coordinates:
* time (time) int64 0
* x (x) int64 0 1 2 3 4 5 6 7 8 ... 191 192 193 194 195 196 197 198 199
* y (y) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
* z (z) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
<xarray.DataArray 'random_sample-991bff72d4c572ef8bd3a9f08308cc19' (time: 1, x: 200, y: 1000, z: 1000)>
dask.array<random_sample, shape=(1, 200, 1000, 1000), dtype=float64, chunksize=(1, 200, 250, 250), chunktype=numpy.ndarray>
Coordinates:
* time (time) int64 1
* x (x) int64 0 1 2 3 4 5 6 7 8 ... 191 192 193 194 195 196 197 198 199
* y (y) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
* z (z) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
Traceback (most recent call last):
File "/home/alavandier/bug_combine_by_coords.py", line 21, in <module>
Combined["test"] = xr.combine_by_coords(DataArray_list)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/combine.py", line 891, in combine_by_coords
sorted_datasets = sorted(data_objects, key=vars_as_keys)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/common.py", line 129, in __bool__
return bool(self.values)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
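The final frame shows DataArray.__bool__ returning bool(self.values), so the ValueError comes from asking for the truth value of a multi-element array. A minimal sketch of that step on its own, with a small NumPy-backed array instead of the dask arrays from the report (the variable names are made up for illustration):

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.zeros((2, 3)), dims=("time", "x"))

try:
    # Delegates to bool(arr.values), which is ambiguous for 6 elements.
    bool(arr)
    truth_error = None
except ValueError as err:
    truth_error = err

print(truth_error)

# An explicit reduction has an unambiguous truth value.
any_nonzero = bool(arr.any())
print(any_nonzero)
```

Because __bool__ reads .values, the same call on a dask-backed DataArray would also have to compute the underlying array before it could raise.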
Anything else we need to know?:
- Uncommenting the line data_i.name = None fixes everything.
- By manually interrupting xr.combine_by_coords when n == 2, before it fails on its own, we can see that it actually computes the dask arrays, which is also a problem. Here's an example to show that:
runcell(0, '/home/alavandier/bug_combine_by_coords.py')
<xarray.DataArray 'random_sample-85eafb2cca5305a2d75153f0df7aca91' (time: 1, x: 200, y: 1000, z: 1000)>
dask.array<random_sample, shape=(1, 200, 1000, 1000), dtype=float64, chunksize=(1, 200, 250, 250), chunktype=numpy.ndarray>
Coordinates:
* time (time) int64 0
* x (x) int64 0 1 2 3 4 5 6 7 8 ... 191 192 193 194 195 196 197 198 199
* y (y) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
* z (z) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
<xarray.DataArray 'random_sample-e4ed3ea4a1d6918599ccba99f02e2d9e' (time: 1, x: 200, y: 1000, z: 1000)>
dask.array<random_sample, shape=(1, 200, 1000, 1000), dtype=float64, chunksize=(1, 200, 250, 250), chunktype=numpy.ndarray>
Coordinates:
* time (time) int64 1
* x (x) int64 0 1 2 3 4 5 6 7 8 ... 191 192 193 194 195 196 197 198 199
* y (y) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
* z (z) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
Traceback (most recent call last):
File "/home/alavandier/bug_combine_by_coords.py", line 21, in <module>
Combined["test"] = xr.combine_by_coords(DataArray_list)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/combine.py", line 891, in combine_by_coords
sorted_datasets = sorted(data_objects, key=vars_as_keys)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/common.py", line 129, in __bool__
return bool(self.values)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/dataarray.py", line 651, in values
return self.variable.values
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/variable.py", line 517, in values
return _as_array_or_item(self._data)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/variable.py", line 259, in _as_array_or_item
data = np.asarray(data)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/numpy/core/_asarray.py", line 102, in asarray
return array(a, dtype, copy=False, order=order)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/dask/array/core.py", line 1476, in __array__
x = self.compute()
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/dask/base.py", line 285, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/dask/base.py", line 567, in compute
results = schedule(dsk, keys, **kwargs)
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/dask/threaded.py", line 79, in get
results = get_async(
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/dask/local.py", line 503, in get_async
for key, res_info, failed in queue_get(queue).result():
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/dask/local.py", line 134, in queue_get
return q.get()
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/queue.py", line 171, in get
self.not_empty.wait()
File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/threading.py", line 312, in wait
waiter.acquire()
KeyboardInterrupt
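As a possible alternative to clearing the name, each array can be wrapped in a Dataset under one explicit, shared variable name before combining. This is only a sketch of a workaround, not a fix: the array sizes are scaled down from the report, the chunking is arbitrary, and "test" is simply the name used in the example above.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Small stand-ins for the (1, 200, 1000, 1000) arrays in the report.
coords = [("x", np.arange(4)), ("y", np.arange(5)), ("z", np.arange(6))]

datasets = []
for i in range(2):
    data = da.random.random((1, 4, 5, 6), chunks=(1, 4, 5, 3))
    arr = xr.DataArray(data, coords=[("time", [i])] + coords)
    # Store every piece under the same explicit variable name, so the
    # auto-generated dask name no longer differs between pieces.
    datasets.append(arr.to_dataset(name="test"))

# With Dataset inputs, combine_by_coords returns a Dataset.
combined = xr.combine_by_coords(datasets)
result = combined["test"]

print(result.sizes)
print(type(result.data))
```

The combined variable should still be backed by a dask array, so nothing is computed until the data is actually requested.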
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-88-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.6.1
xarray: 0.19.0
pandas: 1.3.2
numpy: 1.20.3
scipy: 1.6.2
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.04.1
distributed: 2021.04.1
matplotlib: 3.3.4
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 58.0.4
pip: 21.2.4
conda: None
pytest: None
IPython: 7.27.0
sphinx: 4.2.0