Description
What happened: The dtype of DataArray coordinates changes after concatenation using xr.concat.
What you expected to happen: The dtype of DataArray coordinates should stay the same.
Minimal Complete Verifiable Example:
Below are two examples. The first shows the issue on the coordinate of the concatenated dimension. In the second, x1 uses a bytes dtype instead of int, and the problem appears on both dimensions.
Example 1:
import numpy as np
import xarray as xr
da1 = xr.DataArray(data=np.arange(4).reshape([2, 2]),
                   dims=["x1", "x2"],
                   coords={"x1": np.array([0, 1]),
                           "x2": np.array(['a', 'b'])})
da2 = xr.DataArray(data=np.arange(4).reshape([2, 2]),
                   dims=["x1", "x2"],
                   coords={"x1": np.array([1, 2]),
                           "x2": np.array(['c', 'd'])})
da_joined = xr.concat([da1, da2], dim="x2")
print("coord x1 dtype:")
print("in da1:", da1.coords["x1"].data.dtype)
print("in da2:", da2.coords["x1"].data.dtype)
print("after concat:", da_joined.coords["x1"].data.dtype)
# this is in line with expectations:
# coord x1 dtype:
# in da1: int64
# in da2: int64
# after concat: int64
print("coord x2 dtype")
print("in da1:", da1.coords["x2"].data.dtype)
print("in da2:", da2.coords["x2"].data.dtype)
print("after concat:", da_joined.coords["x2"].data.dtype)
# coord x2 dtype
# in da1: <U1
# in da2: <U1
# after concat: object # This is the problem: it should still be <U1
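For comparison, here is a quick check with plain NumPy (a sketch added for illustration; it does not go through xarray at all). Concatenating the two x2 label arrays keeps the fixed-width <U1 dtype, and the sorted union of the two x1 arrays, which is roughly what the default outer join does to x1, keeps the integer dtype. That is the behaviour expected from xr.concat here.

```python
import numpy as np

# Coordinate labels taken from Example 1
x1_a, x1_b = np.array([0, 1]), np.array([1, 2])
x2_a, x2_b = np.array(['a', 'b']), np.array(['c', 'd'])

# Concatenating the x2 labels keeps the fixed-width unicode dtype
x2_joined = np.concatenate([x2_a, x2_b])
print(x2_joined.dtype)  # <U1

# The sorted union of the x1 labels (similar to an outer join) keeps the integer dtype
x1_union = np.union1d(x1_a, x1_b)
print(x1_union, x1_union.dtype == x1_a.dtype)  # [0 1 2] True
```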
Example 2:
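da1 = xr.DataArray(data=np.arange(4).reshape([2, 2]),
                   dims=["x1", "x2"],
                   coords={"x1": np.array([b'\x00', b'\x01']),
                           "x2": np.array(['a', 'b'])})
da2 = xr.DataArray(data=np.arange(4).reshape([2, 2]),
                   dims=["x1", "x2"],
                   coords={"x1": np.array([b'\x01', b'\x02']),
                           "x2": np.array(['c', 'd'])})
da_joined = xr.concat([da1, da2], dim="x2")
print("coord x1 dtype:")
print("in da1:", da1.coords["x1"].data.dtype)
print("in da2:", da2.coords["x1"].data.dtype)
print("after concat:", da_joined.coords["x1"].data.dtype)
# coord x1 dtype:
# in da1: |S1
# in da2: |S1
# after concat: object # This is the problem: it should still be |S1
print("coord x2 dtype")
print("in da1:", da1.coords["x2"].data.dtype)
print("in da2:", da2.coords["x2"].data.dtype)
print("after concat:", da_joined.coords["x2"].data.dtype)
# coord x2 dtype
# in da1: <U1
# in da2: <U1
# after concat: object # This is the problem: it should still be <U1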
Anything else we need to know:
This seems related to #1266
Environment: Ubuntu 18.04, Python 3.7.9, xarray 0.16.1
Output of xr.show_versions()
xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.9 (default, Aug 31 2020, 12:42:55)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-51-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.16.1
pandas: 0.25.3
numpy: 1.19.1
scipy: 1.5.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 50.3.0
pip: 20.2.4
conda: None
pytest: None
IPython: 7.18.1
sphinx: None