Time dtype encoding defaulting to int64 when writing netcdf or zarr #3942

Open
@rafa-guedes

Time dtype encoding defaults to "int64" for datasets with only zero-hour times when writing to netcdf or zarr.

As a result, the precision of these datasets is limited by the time units (daily precision in the example below, since the units are defined as 'days since ...'). For instance, if we create a zarr dataset from such data with this default encoding and later append non-zero times to it, we lose the hour/minute/second information from the appended data.
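The loss of precision can be illustrated with plain `datetime` arithmetic. This is only a sketch of the idea, not xarray's actual encoding code: the helpers `encode_as_days` and `decode_from_days` are hypothetical, and the integer cast is an assumption about how an "int64" dtype discards the fractional part.

```python
import datetime

# Reference time taken from the encoded units, "days since 2012-01-01 00:00:00".
reference = datetime.datetime(2012, 1, 1)
one_day = datetime.timedelta(days=1)


def encode_as_days(time, as_integer):
    """Hypothetical helper: offset from the reference, expressed in days."""
    offset = (time - reference) / one_day  # float number of days
    # Assumption: an integer dtype drops the fractional part of the offset.
    return int(offset) if as_integer else offset


def decode_from_days(value):
    """Hypothetical helper: inverse of encode_as_days."""
    return reference + value * one_day


appended = datetime.datetime(2012, 1, 1, 3, 0, 0)

# Integer days: the 3-hour offset (0.125 days) collapses to 0.
print(decode_from_days(encode_as_days(appended, as_integer=True)))   # 2012-01-01 00:00:00

# Float days: the 3-hour offset survives the round trip.
print(decode_from_days(encode_as_days(appended, as_integer=False)))  # 2012-01-01 03:00:00
```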

MCVE Code Sample

In [1]: import datetime
    ...: import xarray as xr
    ...: ds = xr.DataArray(
    ...: data=[0.5],
    ...: coords={"time": [datetime.datetime(2012,1,1)]},
    ...: dims=("time",),
    ...: name="x",
    ...: ).to_dataset()

In [2]: ds                                                                                                                                                            
Out[2]: 
<xarray.Dataset>
Dimensions:  (time: 1)
Coordinates:
  * time     (time) datetime64[ns] 2012-01-01
Data variables:
    x        (time) float64 0.5

In [3]: ds.to_zarr("/tmp/x.zarr")

In [4]: ds1 = xr.open_zarr("/tmp/x.zarr")

In [5]: ds1.time.encoding                                                                                                                                             
Out[5]: 
{'chunks': (1,),
 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
 'filters': None,
 'units': 'days since 2012-01-01 00:00:00',
 'calendar': 'proleptic_gregorian',
 'dtype': dtype('int64')}

In [6]: dsnew = xr.DataArray( 
    ...: data=[1.5], 
    ...: coords={"time": [datetime.datetime(2012,1,1,3,0,0)]}, 
    ...: dims=("time",), 
    ...: name="x", 
    ...: ).to_dataset()

In [7]: dsnew.to_zarr("/tmp/x.zarr", append_dim="time")                                                                                                               

In [8]: ds1 = xr.open_zarr("/tmp/x.zarr")                                                                                                                             

In [9]: ds1.time.values                                                                                                                                               
Out[9]: 
array(['2012-01-01T00:00:00.000000000', '2012-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')

Expected Output

In [9]: ds1.time.values                                                                                                                                               
Out[9]: 
array(['2012-01-01T00:00:00.000000000', '2012-01-01T03:00:00.000000000'],
      dtype='datetime64[ns]')

Problem Description

Perhaps it would be useful to default the time dtype to "float64". Another option would be to use, by default, a finer time resolution than the one xarray infers automatically from the dataset times (for instance, if the units are automatically defined as "days since ...", use "seconds since ..." instead).
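Until the default changes, either option can be requested explicitly on the first write through the `encoding` argument of `to_zarr`. The following is a minimal sketch, not a tested fix: the helper names `explicit_time_encoding` and `write_with_explicit_encoding` are hypothetical, and it assumes that later appends reuse the encoding already stored for the time variable, which may depend on the xarray version.

```python
def explicit_time_encoding(units="seconds since 2012-01-01 00:00:00", dtype="int64"):
    """Hypothetical helper: build a per-variable encoding for the time coordinate.

    Finer units keep an integer dtype usable; alternatively, pass
    units="days since 2012-01-01 00:00:00" with dtype="float64".
    """
    return {"time": {"units": units, "dtype": dtype}}


def write_with_explicit_encoding(ds, path):
    """Hypothetical helper: create the store with an explicit time encoding.

    `ds` is expected to be an xarray.Dataset with a "time" coordinate.
    """
    ds.to_zarr(path, encoding=explicit_time_encoding())
```

With the `ds` from the sample above, `write_with_explicit_encoding(ds, "/tmp/x.zarr")` would replace the plain `ds.to_zarr("/tmp/x.zarr")` call in step 3; the append in step 7 stays unchanged.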


Versions

<details><summary>Output of `xr.show_versions()`</summary>

In [10]: xr.show_versions()                                                                                                                                            

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.5 (default, Nov 20 2019, 09:21:52) 
[GCC 9.2.1 20191008]
python-bits: 64
OS: Linux
OS-release: 5.3.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: en_NZ.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3

xarray: 0.15.0
pandas: 1.0.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.1.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.3
cfgrib: None
iris: None
bottleneck: None
dask: 2.14.0
distributed: 2.12.0
matplotlib: 3.2.0
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 45.3.0
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.13.0
sphinx: None

</details>
