Coordinate dtype changing to object after xr.concat #4543

Closed
@JS-Parent

What happened: The dtype of DataArray coordinates changes to object after concatenation with xr.concat.

What you expected to happen: The dtype of the DataArray coordinates stays the same.

Minimal Complete Verifiable Example:

Below are two examples. In the first, the issue appears on the coordinate of the concatenated dimension. In the second, the two coordinates have different dtypes and the problem appears on both dimensions.

Example 1:

import numpy as np
import xarray as xr

da1 = xr.DataArray(data=np.arange(4).reshape([2, 2]),
                   dims=["x1", "x2"],
                   coords={"x1": np.array([0, 1]),
                           "x2": np.array(['a', 'b'])})
da2 = xr.DataArray(data=np.arange(4).reshape([2, 2]),
                   dims=["x1", "x2"],
                   coords={"x1": np.array([1, 2]),
                           "x2": np.array(['c', 'd'])})
da_joined = xr.concat([da1, da2], dim="x2")

print("coord x1 dtype:")
print("in da1:", da1.coords["x1"].data.dtype)
print("in da2:", da2.coords["x1"].data.dtype)
print("after concat:", da_joined.coords["x1"].data.dtype)
# this is in line with expectations:
# coord x1 dtype:
# in da1: int64
# in da2: int64
# after concat: int64

print("coord x2 dtype")
print("in da1:", da1.coords["x2"].data.dtype)
print("in da2:", da2.coords["x2"].data.dtype)
print("after concat:", da_joined.coords["x2"].data.dtype)
# coord x2 dtype
# in da1: <U1
# in da2: <U1
# after concat: object           # This is the problem: it should still be <U1

Example 2:

da1 = xr.DataArray(data=np.arange(4).reshape([2, 2]),
                   dims=["x1", "x2"],
                   coords={"x1": np.array([b'\x00', b'\x01']),
                           "x2": np.array(['a', 'b'])})

da2 = xr.DataArray(data=np.arange(4).reshape([2, 2]),
                   dims=["x1", "x2"],
                   coords={"x1": np.array([b'\x01', b'\x02']),
                           "x2": np.array(['c', 'd'])})

da_joined = xr.concat([da1, da2], dim="x2")

# Printing the dtypes as in Example 1 gives:
# coord x1 dtype:
# in da1: |S1
# in da2: |S1
# after concat: object              # This is the problem: it should still be |S1
# coord x2 dtype
# in da1: <U1
# in da2: <U1
# after concat: object              # This is the problem: it should still be <U1
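
For comparison, here is a minimal NumPy-only sketch of the behaviour I would expect for the x2 labels. It does not touch xarray at all. It only shows that plain np.concatenate keeps the fixed-width unicode dtype, and that an object array of str can be cast back with astype(str). The variable names are made up for illustration.

```python
import numpy as np

# Same labels as the x2 coordinates in the examples above.
labels1 = np.array(['a', 'b'])
labels2 = np.array(['c', 'd'])

# Plain NumPy concatenation keeps the fixed-width unicode dtype.
joined = np.concatenate([labels1, labels2])
print("numpy concat:", joined.dtype)

# Simulate the object dtype seen after xr.concat, then cast it back.
# astype(str) picks a width that fits the longest string in the array.
as_object = joined.astype(object)
restored = as_object.astype(str)
print("as object:", as_object.dtype)
print("restored:", restored.dtype)
```

The same cast might serve as a stopgap on the coordinate values of da_joined, but I have not verified that a reassigned index coordinate keeps the fixed-width dtype in xarray 0.16.1.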

Anything else we need to know:

This seems related to #1266

Environment: Ubuntu 18.04, python 3.7.9, xarray 0.16.1

Output of xr.show_versions()

xr.show_versions()
INSTALLED VERSIONS

commit: None
python: 3.7.9 (default, Aug 31 2020, 12:42:55)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-51-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.16.1
pandas: 0.25.3
numpy: 1.19.1
scipy: 1.5.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 50.3.0
pip: 20.2.4
conda: None
pytest: None
IPython: 7.18.1
sphinx: None
