Hi,
May I suggest adding a default_fillvals option to xarray.open_dataset (and xarray.open_dataarray)?
My problem:
I have netCDF files in which flagged values are stored as the netCDF default fill value of 9.96...e+36. However, xarray (0.10.8) only masks variables that have an explicit fill_value set:
import netCDF4, xarray, numpy
nc = netCDF4.Dataset('test.nc', 'w', format='NETCDF4')
nc.createDimension('x', 3)
var1 = nc.createVariable('var1', 'f8', ('x',))
var2 = nc.createVariable('var2', 'f8', ('x',), fill_value=netCDF4.default_fillvals['f8'])
var1[:] = numpy.array([0., 1., netCDF4.default_fillvals['f8']])
var2[:] = numpy.array([0., 1., netCDF4.default_fillvals['f8']])
print('netCDF4 var1', nc.variables['var1'][:])
print('netCDF4 var2', nc.variables['var2'][:])
nc.close()
ds = xarray.open_dataset('test.nc')
print('xarray var1', ds.var1[:])
print('xarray var2', ds.var2[:])
The problem is that ds.var1 and ds.var2 are interpreted differently, although netCDF4 shows both as masked:
netCDF4 var1 [0.0 1.0 --]
netCDF4 var2 [0.0 1.0 --]
xarray var1 <xarray.DataArray 'var1' (x: 3)>
array([0.00000e+00, 1.00000e+00, 9.96921e+36])
Dimensions without coordinates: x
xarray var2 <xarray.DataArray 'var2' (x: 3)>
array([ 0., 1., nan])
Dimensions without coordinates: x
I agree that masking data only when the fill_value attribute is set is a good default. But I think it would be useful to be able to pass default fill values to open_dataset, so that data relying on the implicit default values can also be read as masked.
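To make the request more concrete, here is a rough sketch of the behaviour I have in mind, written as a post-processing step. The helper name mask_default_fillvals and its fillvals argument are made up for illustration and are not part of xarray. It assumes that only floating-point variables should be masked and that fillvals maps type codes such as 'f8' to fill values, the way netCDF4.default_fillvals does.

```python
import numpy as np
import xarray as xr


def mask_default_fillvals(ds, fillvals):
    """Hypothetical helper (not an xarray API): replace fill values with NaN.

    `fillvals` maps a type code such as 'f8' to the fill value to mask,
    for example the netCDF4.default_fillvals dictionary.
    """
    masked = ds.copy()  # shallow copy, so the input dataset is left untouched
    for name, var in ds.data_vars.items():
        # Assumption: only float variables are handled, since integer
        # variables cannot hold NaN without a dtype change.
        if var.dtype.kind != "f":
            continue
        key = f"{var.dtype.kind}{var.dtype.itemsize}"  # e.g. 'f8'
        if key not in fillvals:
            continue
        # Cast the fill value to the variable's dtype before comparing.
        fill = var.dtype.type(fillvals[key])
        # Keep values that differ from the fill value, set the rest to NaN.
        masked[name] = var.where(var != fill)
    return masked
```

With the file from the example above, calling mask_default_fillvals(xarray.open_dataset('test.nc'), netCDF4.default_fillvals) should then turn the last element of var1 into nan as well, while leaving var2 as it is. Having this available directly in open_dataset would avoid the extra pass over every variable.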
What do you think?