handle default fill value

If a variable does not define a _FillValue, the 'default fill value' is normally used to decide where data is masked. The netCDF4 library does this by default, and the behaviour can be controlled with the set_auto_mask() function.

For example, loading a NetCDF file with no explicit fill value set:

```python
In [92]: from netCDF4 import Dataset

In [93]: osd = Dataset('os150nb.nc', 'r')

In [94]: osd['u']
Out[94]:
<class 'netCDF4._netCDF4.Variable'>
float32 u(time, depth_cell)
    missing_value: 1e+38
    long_name: Zonal velocity component
    units: meter second-1
    C_format: %7.2f
    data_min: -0.6097069
    data_max: 0.6496426
unlimited dimensions:
current shape = (6830, 60)
filling on, default _FillValue of 9.969209968386869e+36 used

In [95]: osd['u'][1000]
Out[95]:
masked_array(data=[0.09373848885297775, 0.08173848688602448,
                   0.0697384923696518, 0.12273849546909332,
                   0.11573849618434906, 0.1387384980916977,
                   0.17173849046230316, 0.17673850059509277,
                   0.17673850059509277, 0.16373848915100098,
                   0.1857384890317917, 0.17673850059509277,
                   0.20173849165439606, 0.20973849296569824,
                   0.2037384957075119, 0.2297385036945343,
                   0.23273849487304688, 0.22873848676681519,
                   0.24073849618434906, 0.22873848676681519,
                   0.23073849081993103, 0.23273849487304688,
                   0.24973849952220917, 0.2467384934425354,
                   0.2207385003566742, 0.22773849964141846,
                   0.2387385070323944, 0.21473848819732666,
                   0.23973849415779114, 0.23673850297927856,
                   0.2517384886741638, 0.25273850560188293,
                   0.21973849833011627, 0.2387385070323944,
                   0.2207385003566742, 0.22373849153518677,
                   0.23473849892616272, 0.21073849499225616,
                   0.2247384935617447, --, --, --, --, --, --, --, --, --,
                   --, --, --, --, --, --, --, --, --, --, --, --],
             mask=[False, False, False, False, False, False, False, False,
                   False, False, False, False, False, False, False, False,
                   False, False, False, False, False, False, False, False,
                   False, False, False, False, False, False, False, False,
                   False, False, False, False, False, False, False,  True,
                    True,  True,  True,  True,  True,  True,  True,  True,
                    True,  True,  True,  True,  True,  True,  True,  True,
                    True,  True,  True,  True],
       fill_value=9.96921e+36,
            dtype=float32)
```

The resulting array is a masked array where missing values are masked. You can see the default fill value has been given in the variable output.

When loading the same NetCDF file with xarray, that default fill value appears as ordinary data wherever netCDF4 would have masked the values.

```python
In [107]: os150 = xr.open_dataset('os150nb.nc', decode_cf=True, mask_and_scale=True, decode_coords=True)

In [108]: os150.u[1000]
Out[108]:
<xarray.DataArray 'u' (depth_cell: 60)>
array([9.373849e-02, 8.173849e-02, 6.973849e-02, 1.227385e-01, 1.157385e-01,
       1.387385e-01, 1.717385e-01, 1.767385e-01, 1.767385e-01, 1.637385e-01,
       1.857385e-01, 1.767385e-01, 2.017385e-01, 2.097385e-01, 2.037385e-01,
       2.297385e-01, 2.327385e-01, 2.287385e-01, 2.407385e-01, 2.287385e-01,
       2.307385e-01, 2.327385e-01, 2.497385e-01, 2.467385e-01, 2.207385e-01,
       2.277385e-01, 2.387385e-01, 2.147385e-01, 2.397385e-01, 2.367385e-01,
       2.517385e-01, 2.527385e-01, 2.197385e-01, 2.387385e-01, 2.207385e-01,
       2.237385e-01, 2.347385e-01, 2.107385e-01, 2.247385e-01, 9.969210e+36,
       9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36,
       9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36,
       9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36,
       9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36, 9.969210e+36])
Coordinates:
    time     datetime64[ns] 2018-11-26T10:24:53.971200
Dimensions without coordinates: depth_cell
Attributes:
    long_name:  Zonal velocity component
    units:      meter second-1
    C_format:   %7.2f
    data_min:   -0.6097069
    data_max:   0.6496426
```

While this behaviour is correct in the sense that xarray has followed the NetCDF specification, it is no longer clear that those values were missing in the original NetCDF file.

The attributes don't mention the fill value, so even though it lies far outside the stated data range (data_min to data_max), one could be forgiven for thinking it is the actual value in the DataArray. It's especially confusing when you've asked for CF decoding and these values are still present.

Furthermore, if you look at the encoding for this DataArray, you can see that it incorrectly reports the missing_value as the _FillValue:

```python
In [136]: os150['u'].encoding
Out[136]:
{'source': 'C:\\Data\\adcp_processing\\in2018_v06\\postproc\\os150nb\\contour\\os150nb.nc',
 'original_shape': (6830, 60),
 '_FillValue': 1e+38,
 'dtype': dtype('float32')}
```

Unless I'm missing something, I think this behaviour should be changed to either:
* Explicitly mention in the DataArray attributes that the default fill value is being used, or provide some other way of identifying it, or
* Mask this value with NaN/missing_value in the resulting DataArray
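As a stopgap, the second option can be approximated by hand after loading. The sketch below is only an illustration of the idea, not a proposal for how xarray should implement it. The helper name `mask_fill_values` is hypothetical, the fill constant is copied from the variable summary above, and the sample profile is made up.

```python
import numpy as np

# Default netCDF fill value for 32-bit floats, as reported in the variable summary.
DEFAULT_FILL_F4 = np.float32(9.969209968386869e+36)


def mask_fill_values(values, fill_values=(DEFAULT_FILL_F4,)):
    """Return a float32 copy of `values` with every listed fill value set to NaN.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=np.float32)
    fills = np.asarray(fill_values, dtype=np.float32)
    is_fill = np.isin(values, fills)
    return np.where(is_fill, np.float32(np.nan), values)


# Made-up profile: two valid cells followed by two default-filled cells.
profile = np.array(
    [0.2107385, 0.2247385, DEFAULT_FILL_F4, DEFAULT_FILL_F4], dtype=np.float32
)
cleaned = mask_fill_values(profile)
print(cleaned)
```

On an actual DataArray the same effect should be achievable with `DataArray.where`, for example `os150.u.where(os150.u != DEFAULT_FILL_F4)`, though this still relies on the user knowing that the default fill value is in play.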

Note that the NetCDF file I've used here isn't publicly available yet, but I can add a link once it is.

handle default fill value #2742
