Suggestion: Add option for default_fillvals to open_dataset #2374

Open
@MeraX

Hi,

May I suggest adding a default_fillvals option to xarray.open_dataset (and xarray.open_dataarray)?

My problem:

I have netCDF data in which flagged values are marked with the netCDF default fill value of 9.96...e+36. However, xarray (0.10.8) only masks variables that have an explicit fill value (the _FillValue attribute) set:

import netCDF4, xarray, numpy

# Write a small test file with one dimension of length 3.
nc = netCDF4.Dataset('test.nc', 'w', format='NETCDF4')
nc.createDimension('x', 3)

# var1 has no explicit fill value; var2 sets it explicitly to the netCDF default.
var1 = nc.createVariable('var1', 'f8', ('x',))
var2 = nc.createVariable('var2', 'f8', ('x',), fill_value=netCDF4.default_fillvals['f8'])

# Store the default fill value as the last element of both variables.
var1[:] = numpy.array([0., 1., netCDF4.default_fillvals['f8']])
var2[:] = numpy.array([0., 1., netCDF4.default_fillvals['f8']])
print('netCDF4 var1', nc.variables['var1'][:])
print('netCDF4 var2', nc.variables['var2'][:])
nc.close()

# Read the same file back with xarray.
ds = xarray.open_dataset('test.nc')
print('xarray var1', ds.var1[:])
print('xarray var2', ds.var2[:])

The problem is that xarray interprets ds.var1 and ds.var2 differently, although netCDF4 shows both as masked:

netCDF4 var1 [0.0 1.0 --]
netCDF4 var2 [0.0 1.0 --]
xarray var1 <xarray.DataArray 'var1' (x: 3)>
array([0.00000e+00, 1.00000e+00, 9.96921e+36])
Dimensions without coordinates: x
xarray var2 <xarray.DataArray 'var2' (x: 3)>
array([ 0.,  1., nan])
Dimensions without coordinates: x

I agree that masking data only when the fill_value attribute is set is a good default. But I think it would be useful to be able to pass default fill values to open_dataset, so that data relying on the implicit default values can be read as masked.
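
Until such an option exists, a manual workaround might look roughly like the sketch below. It is only an illustration under some assumptions: mask_default_fill is a hypothetical helper (not part of xarray), the constant is the netCDF default fill value for 'f8' written out by hand, and the check assumes that an explicit fill value shows up as _FillValue in a variable's attrs or encoding. An in-memory dataset stands in for the var1 case from test.nc.

```python
import numpy as np
import xarray as xr

# netCDF default fill value for 64-bit floats
# (assumed equal to netCDF4.default_fillvals['f8']).
NC_FILL_DOUBLE = 9.969209968386869e+36


def mask_default_fill(ds, fill_value=NC_FILL_DOUBLE):
    """Hypothetical helper: replace fill_value with NaN in float
    variables that carry no explicit _FillValue."""
    out = ds.copy()
    for name, var in ds.data_vars.items():
        has_explicit_fill = "_FillValue" in var.attrs or "_FillValue" in var.encoding
        if var.dtype.kind == "f" and not has_explicit_fill:
            # where() keeps values for which the condition holds, NaN elsewhere.
            out[name] = var.where(var != fill_value)
    return out


# In-memory stand-in for var1 (no explicit fill value set).
ds = xr.Dataset({"var1": ("x", np.array([0.0, 1.0, NC_FILL_DOUBLE]))})
masked = mask_default_fill(ds)
print(masked["var1"].values)
```

This only handles one fill value for floating-point variables; a default_fillvals option in open_dataset could cover each dtype in one place.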

What do you think?
