[bug] xarray np.timedelta64 overflow #2272

Closed
@lukasbrunner

Description

import numpy as np
import xarray as xr

ds = xr.Dataset(coords={'time': (
    'time', 
    np.arange(106300.5, 106665.5+5*365, 365), 
    {'units': 'days since 1800-01-01 00:00:00'})})
ds = xr.decode_cf(ds)
ds.to_netcdf('./test.nc')
ds = xr.open_dataset('./test.nc', decode_cf=False)
print(ds.time)

<xarray.DataArray 'time' (time: 6)>
array([ 106300.5     ,  106665.5     , -106473.482335, -106108.482335,
       -105743.482335, -105378.482335])
Coordinates:
  * time     (time) float64 1.063e+05 1.067e+05 -1.065e+05 -1.061e+05 ...
Attributes:
    _FillValue:  nan
    units:       days since 1800-01-01
    calendar:    proleptic_gregorian

Problem description

The saved netCDF file contains negative time values. According to the documentation, "292 years is the maximum length of time a np.timedelta64 object with nanosecond precision can represent" (see here in the documentation). Some of the offsets in the example lie more than 292 years after the reference date, so they overflow (via skcs explanation to my question on StackOverflow).
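The arithmetic behind the quoted limit can be sketched with NumPy alone. The snippet below is only an emulation, assuming the overflow behaves like a signed 64-bit nanosecond counter wrapping around by 2**64 ns; it is not xarray's actual code path. The names `NS_PER_DAY`, `max_days`, and `wrap_days` are introduced here for illustration.

```python
import numpy as np

NS_PER_DAY = 86_400 * 10**9  # nanoseconds in one day

# Largest day offset that fits in a signed 64-bit nanosecond count
# (roughly 106752 days, i.e. about 292 years).
max_days = (2**63 - 1) / NS_PER_DAY

# Same offsets as in the example above.
days = np.arange(106300.5, 106665.5 + 5 * 365, 365)
overflows = days > max_days

# Assumption: an overflowing value wraps around by 2**64 ns.
wrap_days = 2**64 / NS_PER_DAY
wrapped = np.where(overflows, days - wrap_days, days)

print(max_days)
print(overflows)
print(wrapped)
```

Under that assumption the first two offsets stay below the limit and the remaining four wrap to negative values, which is the pattern seen in the saved file.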

Expected Output

Correct time values.

Possible solution:

For dates outside the np.datetime64 range the behaviour is already correct, since xarray falls back to cftime.datetime. I don't know whether this would be a feasible solution for the overflow case as well, but I wanted to mention it here.

import numpy as np
import xarray as xr

ds = xr.Dataset(coords={'time': (
    'time', 
    np.arange(106300.5, 106665.5+5*365, 365), 
    {'units': 'days since 0001-01-01 00:00:00'})})

ds = xr.decode_cf(ds)

/opt/anaconda2/envs/py36/lib/python3.6/site-packages/xarray/coding/times.py:132: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using dummy cftime.datetime objects instead, reason: dates out of range
  enable_cftimeindex)
/opt/anaconda2/envs/py36/lib/python3.6/site-packages/xarray/coding/variables.py:66: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using dummy cftime.datetime objects instead, reason: dates out of range
  return self.func(self.array[key])

ds.to_netcdf('./test.nc')
ds = xr.open_dataset('./test.nc', decode_cf=False)
print(ds.time)

<xarray.DataArray 'time' (time: 6)>
array([106300.5, 106665.5, 107030.5, 107395.5, 107760.5, 108125.5])
Coordinates:
  * time     (time) float64 1.063e+05 1.067e+05 1.07e+05 1.074e+05 1.078e+05 ...
Attributes:
    _FillValue:  nan
    units:       days since 0001-01-01
    calendar:    gregorian
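A rough way to see why the fallback triggers for the second reference date but not the first is to check whether the decoded dates stay inside the nanosecond-precision datetime64 range. The sketch below uses day-precision NumPy datetimes for that check. `dates_fit_datetime64_ns` is a hypothetical helper, not part of xarray; the bounds are day-level approximations of the datetime64[ns] limits, and the half-day fraction of the offsets is dropped for simplicity.

```python
import numpy as np

# Approximate day-level bounds of datetime64[ns] (assumption: rounded inward).
NS_MIN_DAY = np.datetime64("1677-09-22", "D")
NS_MAX_DAY = np.datetime64("2262-04-11", "D")


def dates_fit_datetime64_ns(reference, whole_days):
    """Hypothetical helper: True if reference + offsets stay inside datetime64[ns]."""
    offsets = np.asarray(whole_days).astype("timedelta64[D]")
    dates = np.datetime64(reference, "D") + offsets
    return bool(np.all((dates >= NS_MIN_DAY) & (dates <= NS_MAX_DAY)))


# Whole-day version of the offsets used in both examples.
whole_days = np.arange(106300, 106300 + 6 * 365, 365)

fits_1800 = dates_fit_datetime64_ns("1800-01-01", whole_days)
fits_0001 = dates_fit_datetime64_ns("0001-01-01", whole_days)
print(fits_1800, fits_0001)
```

With the 1800 reference the dates themselves are representable, so no fallback happens even though the offsets exceed the timedelta limit; with the 0001 reference the dates fall outside the range and the cftime path is taken.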

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

xarray: 0.10.7
pandas: 0.22.0
numpy: 1.14.2
scipy: 1.0.1
netCDF4: 1.3.1
h5netcdf: 0.6.1
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.1
distributed: 1.22.0
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 39.0.1
pip: 9.0.1
conda: None
pytest: None
IPython: 6.2.1
sphinx: 1.7.2
