Decoding netCDF is giving incorrect values for a large file #5597

Closed
@ohsqueezy

What happened:

A value that should decode to 0 is decoded as 2.

What you expected to happen:

The packed value -32766 should decode to 0.

Minimal Complete Verifiable Example:

The first example is the base file I've been using, which is a 9GB packed netCDF. The first 12 values in this lookup should be 0 but are getting decoded as 2.

$ xarray.open_dataset("BIG_FILE_packed.nc").ssrd.isel(time=slice(0, 23)).sel(latitude=44.8, longitude=287.1, method="nearest").values
> array([2.000000e+00, 2.000000e+00, 2.000000e+00, 2.000000e+00,
         2.000000e+00, 2.000000e+00, 2.000000e+00, 2.000000e+00,
         2.000000e+00, 2.000000e+00, 2.000000e+00, 2.000000e+00,
         2.565200e+04, 3.547440e+05, 1.091760e+06, 2.170378e+06,
         3.482364e+06, 4.704884e+06, 5.689655e+06, 6.297786e+06,
         6.534908e+06, 6.543667e+06, 6.543667e+06], dtype=float32)

This second example shows that when the file is opened without automatic mask_and_scale, the raw packed value is -32766, and applying the scale factor and add offset to that value by hand in the interpreter gives 0.

$ xarray.open_dataset("BIG_FILE_packed.nc", mask_and_scale=False).ssrd.isel(time=slice(0, 23)).sel(\
                      latitude=44.8, longitude=287.1, method="nearest").values
> array([-32766, -32766, -32766, -32766, -32766, -32766, -32766, -32766,
         -32766, -32766, -32766, -32766, -32725, -32199, -31021, -29297,
         -27200, -25246, -23672, -22700, -22321, -22307, -22307], dtype=int16)

$ xarray.open_dataset("BIG_FILE_packed.nc", mask_and_scale=False).ssrd.isel(time=slice(0, 23)).sel(\
                      latitude=44.8, longitude=287.1, method="nearest").values[0] * \
  xarray.open_dataset("BIG_FILE_packed.nc").ssrd.encoding["scale_factor"] + xarray.open_dataset("BIG_FILE_packed.nc").ssrd.encoding["add_offset"]
> 0.0
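One difference between the two paths that may be worth checking: the automatically decoded array above is reported as dtype=float32, while the manual check in the interpreter runs in double precision. The sketch below is only an illustration of that difference, not a confirmed diagnosis. It does not read the file; instead it estimates scale_factor and add_offset from the raw and unpacked values quoted in this report (an assumption, since the real values are in ssrd.encoding), then applies the same formula once in float64 and once in float32. The helper name `decode` is made up for this example.

```python
import numpy as np

# Raw int16 values and the matching unpacked values quoted in this report.
raw = np.array([-32766, -32725], dtype=np.int16)
unpacked = np.array([0.0, 25651.61906215])

# Estimated packing parameters (assumptions; the real ones are stored in
# ssrd.encoding["scale_factor"] and ssrd.encoding["add_offset"]).
scale_factor = (unpacked[1] - unpacked[0]) / (int(raw[1]) - int(raw[0]))
add_offset = unpacked[0] - int(raw[0]) * scale_factor


def decode(values, dtype):
    """Apply value * scale_factor + add_offset entirely in `dtype`."""
    return values.astype(dtype) * dtype(scale_factor) + dtype(add_offset)


decoded64 = decode(raw, np.float64)
decoded32 = decode(raw, np.float32)

# Gap between adjacent float32 values at the magnitude of the offset.
spacing32 = np.spacing(np.float32(add_offset))

print("estimated scale_factor:", scale_factor)
print("estimated add_offset:  ", add_offset)
print("float32 spacing there: ", spacing32)
print("float64 decode:", decoded64)
print("float32 decode:", decoded32)
```

With these estimates the offset is on the order of 2e7, where neighbouring float32 values are 2 apart, so a float32 result near zero can only be a multiple of 2 and can land on 2 rather than 0. If the real encoding values behave the same way, decoding by hand in float64 after opening with mask_and_scale=False would be one way to sidestep it.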

When the netCDF is unpacked using the nco command line tool, the correct values are unpacked.

$ xarray.open_dataset("BIG_FILE_unpacked.nc").ssrd.isel(time=slice(0, 23)).sel(latitude=44.8, longitude=287.1, method="nearest").values
> array([      0.        ,       0.        ,       0.        ,
               0.        ,       0.        ,       0.        ,
               0.        ,       0.        ,       0.        ,
               0.        ,       0.        ,       0.        ,
           25651.61906215,  354743.1221522 , 1091757.933255  ,
         2170377.23235622, 3482363.69999847, 4704882.32554591,
         5689654.23783437, 6297785.304381  , 6534906.36839455,
         6543665.4578304 , 6543665.4578304 ])

Something else that may be relevant: another file containing the same packed data, but as a much smaller subset (1.7KB) of the big file, is unpacked correctly.

$ xarray.open_dataset("SMALL_FILE_packed.nc").ssrd.isel(time=slice(0, 23)).sel(latitude=44.8, longitude=287.1, method="nearest").values
> array([      0.  ,       0.  ,       0.  ,       0.  ,       0.  ,
               0.  ,       0.  ,       0.  ,       0.  ,       0.  ,
               0.  ,       0.  ,   25545.75,  354397.5 , 1091577.  ,
         2170077.  , 3482645.8 , 4704689.  , 5689927.  , 6297856.5 ,
         6535169.  , 6543583.  , 6543583.  ], dtype=float32)

To make this a truly verifiable example, I can transfer the 9GB file to someone or give instructions on how to download it from the climate API I'm getting it from. I'm not sure whether this is an issue with xarray, with the API, or with something I'm doing wrong. I've mostly been using an older version of xarray, but I also tested on the most recent version available from pip:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.9.6 (default, Jul 8 2021, 20:44:16)
[GCC 5.4.0 20160609]
python-bits: 64
OS: Linux
OS-release: 4.4.0-200-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.18.2
pandas: 1.3.0
numpy: 1.21.0
scipy: None
netCDF4: 1.5.7
pydap: None
h5netcdf: 0.11.0
h5py: 3.3.0
Nio: None
zarr: None
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.9.0
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 51.3.3
pip: 20.3.3
conda: None
pytest: None
IPython: 7.25.0
sphinx: None
