#### Description

#### Code Sample, a copy-pastable example if possible

Data file used is here: test.nc.zip

Output from each statement is commented out.
```python
import xarray as xr

ds = xr.open_dataset('test.nc')

ds.cold_rad_cnts.min()
# 13038.

ds.cold_rad_cnts.max()
# 13143.

ds.cold_rad_cnts.mean()
# 12640.583984

ds.cold_rad_cnts.std()
# 455.035156

ds.cold_rad_cnts.sum()
# 4.472997e+10
```
#### Problem description
As you can see above, the mean falls outside the range of the data, and the standard deviation is nearly two orders of magnitude higher than it should be. This happens because a significant loss of precision occurs when bottleneck's `nansum()` is used on data with a `float32` dtype; I demonstrated the effect in pydata/bottleneck#193. Naturally, converting the data to `float64` or any `int` dtype gives the correct result, as does using numpy's built-in functions or uninstalling bottleneck. An example is shown under Expected Output below.
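For reference, here is a minimal, self-contained sketch of the underlying effect that does not require test.nc. It uses `np.cumsum` as a stand-in for a sequential float32 accumulator, which is essentially what bottleneck's `nansum()` does per pydata/bottleneck#193; the synthetic array of ones is just an illustration, not the actual data:

```python
import numpy as np

# 2**25 ones: the true sum (33554432) is exactly representable in float32,
# but a naive float32 running total stalls at 2**24, because
# 16777216.0 + 1.0 rounds back to 16777216.0 in float32.
arr = np.ones(2**25, dtype=np.float32)

print(np.nansum(arr))      # 33554432.0 -- numpy's pairwise summation stays exact
print(arr.cumsum()[-1])    # 16777216.0 -- sequential float32 accumulation saturates

# upcasting before accumulating avoids the problem entirely
print(arr.astype(np.float64).cumsum()[-1])   # 33554432.0
```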
#### Expected Output
```python
In [8]: import numpy as np

In [9]: np.nansum(ds.cold_rad_cnts)
Out[9]: 46357123000.0

In [10]: np.nanmean(ds.cold_rad_cnts)
Out[10]: 13100.413

In [11]: np.nanstd(ds.cold_rad_cnts)
Out[11]: 8.158843
```
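Equivalently, upcasting within xarray before reducing restores the correct statistics. A sketch, assuming the same test.nc dataset as above:

```python
import xarray as xr

ds = xr.open_dataset('test.nc')

# cast to float64 first so the accumulation happens in double precision,
# regardless of whether numpy or bottleneck performs the reduction
ds.cold_rad_cnts.astype('float64').mean()   # ~13100.4, matching np.nanmean
ds.cold_rad_cnts.astype('float64').std()    # ~8.16, matching np.nanstd
```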
#### Output of xr.show_versions()
```
xarray: 0.10.8
pandas: 0.23.4
numpy: 1.15.0
scipy: 1.1.0
netCDF4: 1.4.1
h5netcdf: 0.6.1
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.2
distributed: 1.22.1
matplotlib: None
cartopy: None
seaborn: None
setuptools: 40.0.0
pip: 10.0.1
conda: None
pytest: None
IPython: 6.5.0
sphinx: None
```
Unfortunately this will probably not be fixed downstream anytime soon, so I think it would be nice if xarray provided some sort of automatic workaround rather than requiring users to remember to manually convert their data if it's `float32`. Making `float64` the default (as discussed in #2304) would be nice, but perhaps there should at least be a warning whenever bottleneck's `nansum()` is used on `float32` arrays. A user-side sketch of such a guard is below.
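Until something like that lands, the guard is easy to write on the user side. A minimal sketch; the `upcast_reduce` helper is purely illustrative and not an existing xarray API:

```python
import warnings

import numpy as np

def upcast_reduce(da, method, *args, **kwargs):
    """Call a reduction (e.g. 'mean', 'std', 'sum') on a DataArray,
    upcasting float32 to float64 first and warning about it."""
    if da.dtype == np.float32:
        warnings.warn(
            "reducing a float32 array: upcasting to float64 to avoid "
            "precision loss in bottleneck's nansum()"
        )
        da = da.astype(np.float64)
    return getattr(da, method)(*args, **kwargs)

# e.g. upcast_reduce(ds.cold_rad_cnts, 'mean')
```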