
to_zarr: region not recognised as dataset dimensions #6069

Open
@Boorhin

What happened:
I am trying to write a dataset into a zarr store one time step at a time: I read each time step and write it into the store. To do so, I prepared the dataset with all variable values filled with 0s and wrote the store with mode='a'. I then perform the operations I need on each variable and try to write that time step of the dataset:

ds.isel(time=t).to_zarr(outfile, region={"time": t})

However, I receive this error message:
ValueError: all keys in ``region`` are not in Dataset dimensions, got ['time'] and ['cell', 'face', 'layer', 'max_cell_node', 'max_face_nodes', 'node', 'siglay']
But the full dataset does have a time dimension:

In: ds.dims
Out: Frozen(SortedKeysDict({'time': 1465, 'node': 112015, 'layer': 6, 'face': 198364, 'max_face_nodes': 3, 'cell': 991820, 'max_cell_node': 6, 'siglay': 6}))

Tracing it through the API, the error is raised here:

~/.local/lib/python3.8/site-packages/xarray/backends/api.py in _validate_append_dim_and_encoding(ds_to_append, store, append_dim, region, encoding, **open_kwargs)
   1330         for k, v in region.items():
   1331             if k not in ds_to_append.dims:
-> 1332                 raise ValueError(
   1333                     f"all keys in ``region`` are not in Dataset dimensions, got "
   1334                     f"{list(region)} and {list(ds_to_append.dims)}"
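The check quoted above compares the keys of `region` with the dimensions of the object being written, which is `ds.isel(time=t)` rather than `ds`. A minimal sketch on a toy dataset (assuming a recent xarray and NumPy; the variable name and sizes are illustrative) of how integer and slice indexing differ:

```python
import numpy as np
import xarray as xr

# Toy dataset reusing the MCVE's dimension names; the sizes are arbitrary.
ds = xr.Dataset(
    {"potato": (("time", "layer", "node"), np.zeros((4, 3, 5)))},
    coords={"time": np.arange(4)},
)
t = 2

# Integer indexing removes "time" as a dimension; it survives only as a scalar coordinate.
by_int = ds.isel(time=t)
# A length-1 slice keeps "time" as a dimension of size 1.
by_slice = ds.isel(time=slice(t, t + 1))

print("time" in by_int.dims)   # False
print(by_int["time"].dims)     # ()
print(by_slice.sizes["time"])  # 1
```

This is consistent with the traceback below, whose error lists only ['layer', 'node'] as the dimensions of the dataset being written.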

What you expected to happen:
Data is written incrementally to the zarr store, one time step at a time.

Minimal Complete Verifiable Example:

import xarray as xr
from datetime import datetime, timedelta
import numpy as np

# Hourly time axis over six days, plus node and layer coordinates
dt = datetime.now()
times = np.arange(dt, dt + timedelta(days=6), timedelta(hours=1))
nodesx, nodesy, layers = np.arange(10, 50), np.arange(10, 50) + 15, np.arange(10)
ds = xr.Dataset()
ds.coords['time'] = ('time', times)
ds.coords['node_x'] = ('node', nodesx)
ds.coords['node_y'] = ('node', nodesy)
ds.coords['layer'] = ('layer', layers)
outfile = 'my_zarr'
varnames = ['potato', 'banana', 'apple']

# Zero-filled template, written once to the store
for var in varnames:
    ds[var] = (('time', 'layer', 'node'), np.zeros((len(times), len(layers), len(nodesx))))
ds.to_zarr(outfile, mode='a')

# Update each time step, then write it back to its region of the store
for t in range(len(times)):
    for var in varnames:
        ds[var].isel(time=t).values += np.random.random((len(layers), len(nodesx)))
    ds.isel(time=t).to_zarr(outfile, region={"time": t})  # raises the ValueError below

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-8baf03aa01a3> in <module>
      2     for var in varnames:
      3         ds[var].isel(time=t).values += np.random.random((len(layers),len(nodesx)))
----> 4     ds.isel(time=t).to_zarr(outfile, region={"time": t})
      5 

~/.local/lib/python3.8/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region)
   1743             encoding = {}
   1744 
-> 1745         return to_zarr(
   1746             self,
   1747             store=store,

~/.local/lib/python3.8/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region)
   1457     if mode == "a":
   1458         _validate_datatypes_for_zarr_append(dataset)
-> 1459         _validate_append_dim_and_encoding(
   1460             dataset,
   1461             store,

~/.local/lib/python3.8/site-packages/xarray/backends/api.py in _validate_append_dim_and_encoding(ds_to_append, store, append_dim, region, encoding, **open_kwargs)
   1330         for k, v in region.items():
   1331             if k not in ds_to_append.dims:
-> 1332                 raise ValueError(
   1333                     f"all keys in ``region`` are not in Dataset dimensions, got "
   1334                     f"{list(region)} and {list(ds_to_append.dims)}"

ValueError: all keys in ``region`` are not in Dataset dimensions, got ['time'] and ['layer', 'node']

Anything else we need to know?:

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.10 (default, Sep 28 2021, 16:10:42)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-91-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.7.3

xarray: 0.16.2
pandas: 1.2.2
numpy: 1.17.4
scipy: 1.6.2
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.10.2
cftime: 1.1.0
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.11.2
distributed: 2021.11.2
matplotlib: 3.1.2
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 45.2.0
pip: 20.0.2
conda: None
pytest: 6.2.1
IPython: 7.13.0
sphinx: None
