to_zarr with append or region mode and _FillValue doesn't work #6329

Open
@d70-t

What happened?

import numpy as np
import xarray as xr
ds = xr.Dataset({"a": ("x", [3.], {"_FillValue": np.nan})})
m = {}
ds.to_zarr(m)
ds.to_zarr(m, append_dim="x")

raises

ValueError: failed to prevent overwriting existing key _FillValue in attrs. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.

What did you expect to happen?

I'd expect this to just work (effectively concatenating the dataset to itself).

Anything else we need to know?

appears also for region writes

The same error occurs for region writes, for example:

import numpy as np
import dask.array as da
import xarray as xr
ds = xr.Dataset({"a": ("x", da.array([3.,4.]), {"_FillValue": np.nan})})
m = {}
# create the store layout without writing the dask-backed data yet
ds.to_zarr(m, compute=False, encoding={"a": {"chunks": (1,)}})
# write the first element into its region of the existing store
ds.isel(x=slice(0,1)).to_zarr(m, region={"x": slice(0,1)})

raises

ValueError: failed to prevent overwriting existing key _FillValue in attrs. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.

there's a workaround

The workaround (deleting the _FillValue in subsequent writes):

m = {}
ds.to_zarr(m)
del ds.a.attrs["_FillValue"]
ds.to_zarr(m, append_dim="x")

seems to do the trick.

There are indications that the result might still be broken, but it's not yet clear how to reproduce them (see comments below).
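
For reference, here is a minimal sketch of the same workaround that does not mutate `ds` in place. It returns a shallow copy with `_FillValue` dropped from every variable's attrs, so the original dataset stays usable for the first write. The helper name `without_fill_value_attrs` is hypothetical and not part of xarray, and the extra `units` attribute is only there to show that other attrs are kept.

```python
import numpy as np
import xarray as xr


def without_fill_value_attrs(ds):
    """Return a shallow copy of ds with "_FillValue" removed from all variable attrs.

    Hypothetical helper for illustration only; it is not an xarray API.
    """
    out = ds.copy()  # shallow copy: new Variable objects, data is shared
    for var in out.variables.values():
        # Assign a fresh dict so the attrs of the original dataset are never touched.
        var.attrs = {k: v for k, v in var.attrs.items() if k != "_FillValue"}
    return out


# Same dataset as in the report, plus one unrelated attribute.
ds = xr.Dataset({"a": ("x", [3.0], {"_FillValue": np.nan, "units": "m"})})
clean = without_fill_value_attrs(ds)

print(sorted(clean["a"].attrs))  # attrs left on the copy
print(sorted(ds["a"].attrs))     # attrs still present on the original
```

With this, the second write in the workaround would become `without_fill_value_attrs(ds).to_zarr(m, append_dim="x")`, and presumably the region write could be handled the same way. Whether the resulting store is fully correct is subject to the caveat above.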

This issue has been split off from #6069

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.10 (main, Jan 15 2022, 11:48:00)
[Clang 13.0.0 (clang-1300.0.29.3)]
python-bits: 64
OS: Darwin
OS-release: 20.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.20.1
pandas: 1.2.0
numpy: 1.21.2
scipy: 1.6.2
netCDF4: 1.5.8
pydap: installed
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.11.0
cftime: 1.3.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2021.11.1
distributed: 2021.11.1
matplotlib: 3.4.1
cartopy: 0.20.1
seaborn: 0.11.1
numbagg: None
fsspec: 2021.11.1
cupy: None
pint: 0.17
sparse: 0.13.0
setuptools: 60.5.0
pip: 21.3.1
conda: None
pytest: 6.2.2
IPython: 8.0.0.dev
sphinx: 3.5.0
