Skip to content

float32 instead of float64 when decoding int16 with scale_factor netcdf var using xarray  #2304

Closed
@DevDaoud

Description

@DevDaoud

Code Sample :

Considering a netcdf file file with the following variable:

short agc_40hz(time, meas_ind) ;
        agc_40hz:_FillValue = 32767s ;
        agc_40hz:units = "dB" ;
        agc_40hz:scale_factor = 0.01 ;

Code:

from netCDF4 import Dataset
import xarray as xr

d = Dataset("test.nc")
a = d.variables['agc_40hz'][:].flatten()[69] ## 21.940000000000001 'numpy.float64'
x = xr.open_dataset("test.nc")
b = x['agc_40hz'].values.flatten()[69] ## 21.939998626708984 'numpy.float32'
abs(a - b) # 0.000001373291017

Problem description :

Different behaviour of xarray comparing to netCDF4 Dataset

When reading the dataset with xarray we found that the decoded type was numpy.float32 instead of numpy.float64
This netcdf variable has an int16 dtype
when the variable is read with the netCDF4 library directly, it is automatically converted to numpy.float64.
in our case we loose on precision when using xarray.
We found two solutions for this:

First solution :

This solution aims to prevent auto_maskandscale

d = Dataset("test.nc")
a = d.variables['agc_40hz'][:].flatten()[69] ## 21.940000000000001 'numpy.float64'
x = xr.open_dataset("test.nc", mask_and_scale=False, decode_times=False)
b = x['agc_40hz'].values.flatten()[69] ## 21.940000000000001 'numpy.float64'
abs(a - b) # 0.000000000000000

Modification in xarray/backends/netCDF4_.py line 241

def _disable_auto_decode_variable(var):
    """Disable automatic decoding on a netCDF4.Variable.

    We handle these types of decoding ourselves.
    """
    pass
    # var.set_auto_maskandscale(False)
    # # only added in netCDF4-python v1.2.8
    # with suppress(AttributeError):
    #     var.set_auto_chartostring(False)

Second solution :

This solution uses numpy.float64 whatever integer type provided.

d = Dataset("test.nc")
a = d.variables['agc_40hz'][:].flatten()[69] ## 21.940000000000001 'numpy.float64'
x = xr.open_dataset("test.nc")
b = x['agc_40hz'].values.flatten()[69] ## 21.940000000000001 'numpy.float64'
abs(a - b) # 0.000000000000000

Modification in xarray/core/dtypes.py line 85

def maybe_promote(dtype):
    """Simpler equivalent of pandas.core.common._maybe_promote

    Parameters
    ----------
    dtype : np.dtype

    Returns
    -------
    dtype : Promoted dtype that can hold missing values.
    fill_value : Valid missing value for the promoted dtype.
    """
    # N.B. these casting rules should match pandas
    if np.issubdtype(dtype, np.floating):
        fill_value = np.nan
    elif np.issubdtype(dtype, np.integer):
        #########################
        #OLD CODE BEGIN
        #########################
        # if dtype.itemsize <= 2:
        #     dtype = np.float32
        # else:
        #     dtype = np.float64
        #########################
        #OLD CODE END
        #########################

        #########################
        #NEW CODE BEGIN
        #########################
        dtype = np.float64 # whether it's int16 or int32 we use float64
        #########################
        #NEW CODE END
        #########################
        fill_value = np.nan
    elif np.issubdtype(dtype, np.complexfloating):
        fill_value = np.nan + np.nan * 1j
    elif np.issubdtype(dtype, np.datetime64):
        fill_value = np.datetime64('NaT')
    elif np.issubdtype(dtype, np.timedelta64):
        fill_value = np.timedelta64('NaT')
    else:
        dtype = object
        fill_value = np.nan
    return np.dtype(dtype), fill_value

Solution number 2 would be great for us.
At this point we don't know if this modification would introduce some side effects.
Is there another way to avoid this problem ?

Expected Output

In our case we expect the variable to be in numpy.float64 as it is done by netCDF4.

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.5.2.final.0 python-bits: 64 OS: Linux OS-release: 4.15.0-23-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

xarray: 0.10.8
pandas: 0.23.3
numpy: 1.15.0rc2
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: 0.18.1
distributed: 1.22.0
matplotlib: 2.2.2
cartopy: None
seaborn: None
setuptools: 40.0.0
pip: 10.0.1
conda: None
pytest: 3.6.3
IPython: 6.4.0
sphinx: None

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions