combine_by_coords not working on named DataArrays where the data is a Dask array #5833

Closed
@anlavandier

What happened:
xr.combine_by_coords fails, but only when the input DataArrays are named.
What you expected to happen:
xr.combine_by_coords should combine the arrays without raising an error.
Minimal Complete Verifiable Example:

import xarray as xr
import dask.array as da
import numpy as np


# Coordinates shared by every array
coords = [("x", np.arange(200)), ("y", np.arange(1000)), ("z", np.arange(1000))]

DataArray_list = []

n = 1  # number of arrays to combine; the failure differs for n == 1 and n >= 2

for i in range(n):
    test_data = da.random.random((1, 200, 1000, 1000))
    coords_i = [("time", [i])] + coords
    data_i = xr.DataArray(test_data, coords=coords_i)
    # data_i.name = None

    DataArray_list.append(data_i)

print(*DataArray_list, sep="\n\n")

Combined = xr.Dataset()
Combined["test"] = xr.combine_by_coords(DataArray_list)

When n == 1:

runcell(0, '/home/alavandier/bug_combine_by_coords.py')
<xarray.DataArray 'random_sample-4545ef044176a8b440a43599b310e9c1' (time: 1, x: 200, y: 1000, z: 1000)>
dask.array<random_sample, shape=(1, 200, 1000, 1000), dtype=float64, chunksize=(1, 200, 250, 250), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) int64 0
  * x        (x) int64 0 1 2 3 4 5 6 7 8 ... 191 192 193 194 195 196 197 198 199
  * y        (y) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
  * z        (z) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/dask/array/core.py:383: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  o = func(*args, **kwargs)
Traceback (most recent call last):

  File "/home/alavandier/bug_combine_by_coords.py", line 21, in <module>
    Combined["test"] = xr.combine_by_coords(DataArray_list)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/dataset.py", line 1563, in __setitem__
    self.update({key: value})

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/dataset.py", line 4208, in update
    merge_result = dataset_update_method(self, other)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/merge.py", line 984, in dataset_update_method
    return merge_core(

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/merge.py", line 632, in merge_core
    collected = collect_variables_and_indexes(aligned)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/merge.py", line 294, in collect_variables_and_indexes
    variable = as_variable(variable, name=name)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/variable.py", line 141, in as_variable
    data = as_compatible_data(obj)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/variable.py", line 238, in as_compatible_data
    data = np.asarray(data)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/numpy/core/_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/dataset.py", line 1461, in __array__
    raise TypeError(

TypeError: cannot directly convert an xarray.Dataset into a numpy array. Instead, create an xarray.DataArray first, either with indexing on the Dataset or by invoking the `to_array()` method.
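
The TypeError above is raised while assigning into the Dataset, which suggests that the value returned by xr.combine_by_coords was itself a Dataset rather than a DataArray. A minimal sketch of a defensive check follows. It assumes a small NumPy-backed array and an explicit name "test"; both are illustrative choices, not part of the original script.

```python
import numpy as np
import xarray as xr

# Small named array; the shape and the name "test" are illustrative assumptions
arr = xr.DataArray(
    np.zeros((1, 3)),
    coords=[("time", [0]), ("x", np.arange(3))],
    name="test",
)

result = xr.combine_by_coords([arr])

# Named inputs may come back wrapped in a Dataset; unwrap before assigning
if isinstance(result, xr.Dataset):
    result = result[arr.name]

combined = xr.Dataset()
combined["test"] = result
```

Checking the return type keeps the assignment valid whether the call hands back a DataArray or a Dataset.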

When n >= 2:

runcell(0, '/home/alavandier/bug_combine_by_coords.py')
<xarray.DataArray 'random_sample-8a3680be28e920d13cc66464a1ef1669' (time: 1, x: 200, y: 1000, z: 1000)>
dask.array<random_sample, shape=(1, 200, 1000, 1000), dtype=float64, chunksize=(1, 200, 250, 250), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) int64 0
  * x        (x) int64 0 1 2 3 4 5 6 7 8 ... 191 192 193 194 195 196 197 198 199
  * y        (y) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
  * z        (z) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999

<xarray.DataArray 'random_sample-991bff72d4c572ef8bd3a9f08308cc19' (time: 1, x: 200, y: 1000, z: 1000)>
dask.array<random_sample, shape=(1, 200, 1000, 1000), dtype=float64, chunksize=(1, 200, 250, 250), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) int64 1
  * x        (x) int64 0 1 2 3 4 5 6 7 8 ... 191 192 193 194 195 196 197 198 199
  * y        (y) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
  * z        (z) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
Traceback (most recent call last):

  File "/home/alavandier/bug_combine_by_coords.py", line 21, in <module>
    Combined["test"] = xr.combine_by_coords(DataArray_list)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/combine.py", line 891, in combine_by_coords
    sorted_datasets = sorted(data_objects, key=vars_as_keys)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/common.py", line 129, in __bool__
    return bool(self.values)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
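
One plausible reading of this traceback, inferred from the frames shown rather than confirmed against the xarray source: the sort keys end up containing array-like objects, and Python's sort compares them, which coerces an elementwise comparison result to a single bool. The sketch below reproduces that pattern with NumPy only; `key_a` and `key_b` are hypothetical stand-ins for the keys.

```python
import numpy as np

# Two sort keys, each a tuple holding one multi-element array
# (illustrative stand-ins, not xarray objects)
key_a = (np.arange(3),)
key_b = (np.arange(3) + 1,)

try:
    sorted([key_a, key_b])
    error_message = None
except ValueError as err:
    # Comparing the tuples evaluates bool(array == array), which is ambiguous
    error_message = str(err)

print(error_message)
```

For a Dask-backed array the same coercion would also force the data to be computed first, which is consistent with the second observation below.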

Anything else we need to know?:

  • Uncommenting the line data_i.name = None makes both failures go away.
  • Interrupting xr.combine_by_coords manually when n == 2, before it fails on its own, shows that it actually computes the Dask arrays, which is a problem in itself. The traceback below shows this.
runcell(0, '/home/alavandier/bug_combine_by_coords.py')
<xarray.DataArray 'random_sample-85eafb2cca5305a2d75153f0df7aca91' (time: 1, x: 200, y: 1000, z: 1000)>
dask.array<random_sample, shape=(1, 200, 1000, 1000), dtype=float64, chunksize=(1, 200, 250, 250), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) int64 0
  * x        (x) int64 0 1 2 3 4 5 6 7 8 ... 191 192 193 194 195 196 197 198 199
  * y        (y) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
  * z        (z) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999

<xarray.DataArray 'random_sample-e4ed3ea4a1d6918599ccba99f02e2d9e' (time: 1, x: 200, y: 1000, z: 1000)>
dask.array<random_sample, shape=(1, 200, 1000, 1000), dtype=float64, chunksize=(1, 200, 250, 250), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) int64 1
  * x        (x) int64 0 1 2 3 4 5 6 7 8 ... 191 192 193 194 195 196 197 198 199
  * y        (y) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
  * z        (z) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
Traceback (most recent call last):

  File "/home/alavandier/bug_combine_by_coords.py", line 21, in <module>
    Combined["test"] = xr.combine_by_coords(DataArray_list)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/combine.py", line 891, in combine_by_coords
    sorted_datasets = sorted(data_objects, key=vars_as_keys)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/common.py", line 129, in __bool__
    return bool(self.values)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/dataarray.py", line 651, in values
    return self.variable.values

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/variable.py", line 517, in values
    return _as_array_or_item(self._data)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/xarray/core/variable.py", line 259, in _as_array_or_item
    data = np.asarray(data)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/numpy/core/_asarray.py", line 102, in asarray
    return array(a, dtype, copy=False, order=order)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/dask/array/core.py", line 1476, in __array__
    x = self.compute()

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/dask/base.py", line 285, in compute
    (result,) = compute(self, traverse=False, **kwargs)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/dask/base.py", line 567, in compute
    results = schedule(dsk, keys, **kwargs)

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/dask/threaded.py", line 79, in get
    results = get_async(

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/dask/local.py", line 503, in get_async
    for key, res_info, failed in queue_get(queue).result():

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/site-packages/dask/local.py", line 134, in queue_get
    return q.get()

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/queue.py", line 171, in get
    self.not_empty.wait()

  File "/home/alavandier/anaconda3/envs/dask_env/lib/python3.9/threading.py", line 312, in wait
    waiter.acquire()

KeyboardInterrupt
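
The workaround from the first bullet can be sketched on a smaller scale, together with a check that the combined result is still Dask-backed. The shapes and chunk sizes here are assumptions chosen so the example runs quickly; the original script used (1, 200, 1000, 1000).

```python
import dask.array as da
import numpy as np
import xarray as xr

# Small coordinates so the sketch runs quickly
coords = [("x", np.arange(4)), ("y", np.arange(5))]

arrays = []
for i in range(2):
    data = da.random.random((1, 4, 5), chunks=(1, 2, 5))
    arr = xr.DataArray(data, coords=[("time", [i])] + coords)
    arr.name = None  # drop the name inherited from the Dask array
    arrays.append(arr)

combined = xr.combine_by_coords(arrays)

# Check that the combined result is still backed by a Dask array
is_lazy = isinstance(combined.data, da.Array)
print(combined.shape, is_lazy)
```

With the names cleared, the inputs are concatenated along time, and the result can be assigned to a Dataset key as in the original script.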

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-88-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.6.1

xarray: 0.19.0
pandas: 1.3.2
numpy: 1.20.3
scipy: 1.6.2
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.04.1
distributed: 2021.04.1
matplotlib: 3.3.4
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 58.0.4
pip: 21.2.4
conda: None
pytest: None
IPython: 7.27.0
sphinx: 4.2.0
