Description
Time dtype encoding defaults to `"int64"` for datasets with only zero-hour times when writing to netcdf or zarr.

This results in these datasets having a precision constrained by how the time units are defined (in the example below, daily precision, given that the units are defined as `'days since ...'`). If we, for instance, create a zarr dataset using this default encoding with such a dataset and subsequently append some non-zero times onto it, we lose the hour/minute/second information from the appended values.
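The truncation can be illustrated without xarray. The following is a minimal sketch that uses only the standard library and assumes the stored value is simply the offset from the reference date, cast to an integer. The helper names `encode_days_int` and `decode_days` are hypothetical and are not part of xarray's API; this is a simplified model of the effect, not xarray's actual encoding code.

```python
from datetime import datetime, timedelta

# Reference date taken from the units "days since 2012-01-01 00:00:00".
REFERENCE = datetime(2012, 1, 1)


def encode_days_int(t):
    """Hypothetical helper: offset in days, truncated as an integer dtype would."""
    fractional_days = (t - REFERENCE) / timedelta(days=1)
    return int(fractional_days)


def decode_days(n):
    """Hypothetical helper: turn a stored day count back into a datetime."""
    return REFERENCE + timedelta(days=n)


appended = datetime(2012, 1, 1, 3, 0, 0)  # the non-zero-hour time being appended
stored = encode_days_int(appended)        # 0.125 days is truncated to 0
roundtrip = decode_days(stored)           # the 03:00 component is gone

print(stored, roundtrip)
```

Under these assumptions, the three-hour offset is 0.125 days, which an integer count of days cannot represent.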
#### MCVE Code Sample
In [1]: ds = xr.DataArray(
...: data=[0.5],
...: coords={"time": [datetime.datetime(2012,1,1)]},
...: dims=("time",),
...: name="x",
...: ).to_dataset()
In [2]: ds
Out[2]:
<xarray.Dataset>
Dimensions: (time: 1)
Coordinates:
* time (time) datetime64[ns] 2012-01-01
Data variables:
x (time) float64 0.5
In [3]: ds.to_zarr("/tmp/x.zarr")
In [4]: ds1 = xr.open_zarr("/tmp/x.zarr")
In [5]: ds1.time.encoding
Out[5]:
{'chunks': (1,),
'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
'filters': None,
'units': 'days since 2012-01-01 00:00:00',
'calendar': 'proleptic_gregorian',
'dtype': dtype('int64')}
In [6]: dsnew = xr.DataArray(
...: data=[1.5],
...: coords={"time": [datetime.datetime(2012,1,1,3,0,0)]},
...: dims=("time",),
...: name="x",
...: ).to_dataset()
In [7]: dsnew.to_zarr("/tmp/x.zarr", append_dim="time")
In [8]: ds1 = xr.open_zarr("/tmp/x.zarr")
In [9]: ds1.time.values
Out[9]:
array(['2012-01-01T00:00:00.000000000', '2012-01-01T00:00:00.000000000'],
dtype='datetime64[ns]')
#### Expected Output
In [9]: ds1.time.values
Out[9]:
array(['2012-01-01T00:00:00.000000000', '2012-01-01T03:00:00.000000000'],
dtype='datetime64[ns]')
#### Problem Description
Perhaps it would be useful to default the time dtype to `"float64"`. Another option could be to use a finer time resolution by default than the one xarray automatically derives from the dataset times (for instance, if the units are automatically defined as "days since ...", use "seconds since ..." instead).
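To compare the two options, here is a rough standard-library sketch of the round trip under each one. It assumes the stored value is the offset from the reference date, expressed in the chosen unit and cast to the chosen type. The `roundtrip` helper and the `UNIT` table are hypothetical names introduced for illustration and do not reflect xarray internals.

```python
from datetime import datetime, timedelta

REFERENCE = datetime(2012, 1, 1)
# Hypothetical lookup from a unit name to the length of one unit.
UNIT = {"days": timedelta(days=1), "seconds": timedelta(seconds=1)}


def roundtrip(t, unit, as_int):
    """Encode t as an offset from REFERENCE in `unit`, then decode it again."""
    offset = (t - REFERENCE) / UNIT[unit]
    stored = int(offset) if as_int else float(offset)
    return REFERENCE + stored * UNIT[unit]


t = datetime(2012, 1, 1, 3, 0, 0)

float_days = roundtrip(t, "days", as_int=False)    # option 1: float dtype, daily units
int_seconds = roundtrip(t, "seconds", as_int=True)  # option 2: int dtype, finer units
int_days = roundtrip(t, "days", as_int=True)        # behaviour reported in this issue

print(float_days, int_seconds, int_days)
```

In this simplified model both options keep the 03:00 component, while the integer "days since" combination falls back to midnight. In xarray itself, the units and dtype of the time variable can be set explicitly through the `encoding` argument of `to_zarr` when the store is first created, which may serve as a workaround until the defaults change.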
#### Versions
<details><summary>Output of `xr.show_versions()`</summary>
In [10]: xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.5 (default, Nov 20 2019, 09:21:52)
[GCC 9.2.1 20191008]
python-bits: 64
OS: Linux
OS-release: 5.3.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: en_NZ.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3
xarray: 0.15.0
pandas: 1.0.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.1.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.3
cfgrib: None
iris: None
bottleneck: None
dask: 2.14.0
distributed: 2.12.0
matplotlib: 3.2.0
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 45.3.0
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.13.0
sphinx: None
</details>