Skip to content

Error when writing string coordinate variables to zarr #3476

Open
@jsadler2

Description

@jsadler2

I saved an xarray dataset to zarr using to_zarr. I then later tried to read that dataset from the original zarr, re-chunk it, and then write to a new zarr. When I did that I get a strange error. I attached a zip of minimal version of the zarr dataset that I am using.
test_sm_zarr.zip

MCVE Code Sample

import xarray as xr

sm_from_zarr = xr.open_zarr('test_sm_zarr')
sm_from_zarr.to_zarr('test_sm_zarr_from', mode='w')

Expected Output

No error

Problem Description

I get this error:

C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\core\merge.py:18: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
  PANDAS_TYPES = (pd.Series, pd.DataFrame, pd.Panel)
C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\core\dataarray.py:1829: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
  'DataArray', pd.Series, pd.DataFrame, pd.Panel]:
Traceback (most recent call last):
  File "rechunk_test.py", line 38, in <module>
    sm_from_zarr.to_zarr('test_sm_zarr_from', mode='w')
  File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\core\dataset.py", line 1414, in to_zarr
    consolidated=consolidated, append_dim=append_dim)
  File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\backends\api.py", line 1101, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\backends\api.py", line 929, in dump_to_store
    unlimited_dims=unlimited_dims)
  File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\backends\zarr.py", line 366, in store
    unlimited_dims=unlimited_dims)
  File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\backends\zarr.py", line 432, in set_variables
    writer.add(v.data, zarr_array)
  File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\backends\common.py", line 173, in add
    target[...] = source
  File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\zarr\core.py", line 1115, in __setitem__
    self.set_basic_selection(selection, value, fields=fields)
  File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\zarr\core.py", line 1210, in set_basic_selection
    return self._set_basic_selection_nd(selection, value, fields=fields)
  File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\zarr\core.py", line 1501, in _set_basic_selection_nd
    self._set_selection(indexer, value, fields=fields)
  File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\zarr\core.py", line 1550, in _set_selection
    self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
  File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\zarr\core.py", line 1659, in _chunk_setitem
    fields=fields)
  File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\zarr\core.py", line 1723, in _chunk_setitem_nosync
    cdata = self._encode_chunk(chunk)
  File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\zarr\core.py", line 1769, in _encode_chunk
    chunk = f.encode(chunk)
  File "numcodecs/vlen.pyx", line 108, in numcodecs.vlen.VLenUTF8.encode
TypeError: expected unicode string, found 20

BUT
I think it has something to do with the datatype of one of my coordinates, site_code. Because, if it do this I get no error:

import xarray as xr

sm_from_zarr = xr.open_zarr('test_sm_zarr')
sm_from_zarr['site_code'] = sm_from_zarr.site_code.astype('str')
sm_from_zarr.to_zarr('test_sm_zarr_from', mode='w')

Before converting the datatype of the site_code coordinate is object

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: None.None libhdf5: 1.10.4 libnetcdf: 4.6.2

xarray: 0.12.2
pandas: 0.25.1
numpy: 1.17.1
scipy: 1.3.1
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: None
h5py: None
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.3.0
distributed: 2.5.1
matplotlib: 3.1.1
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: 5.1.2
IPython: 7.8.0
sphinx: None

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions