Description
What happened:
When opening a dataset that contains an int16 variable with a _FillValue attribute, the variable is converted from int16 to float32. This was originally reported in the TileDB-CF-Py repository, which contains a TileDB backend for xarray. See TileDB-CF-Py issue #117.
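A rough sketch of why masking forces a float dtype, assuming the decoder replaces values equal to _FillValue with NaN. This uses plain NumPy and is not xarray's actual implementation; the array values and the names `raw`, `fill_value`, and `masked` are made up for illustration.

```python
import numpy as np

# Hypothetical raw data as stored on disk: int16 with -1 marking missing values.
fill_value = np.int16(-1)
raw = np.array([5, -1, 7, 8], dtype=np.int16)

# int16 has no representation for NaN, so the data has to be promoted to a
# floating-point dtype before the fill value can be replaced.
masked = raw.astype(np.float32)
masked[raw == fill_value] = np.nan

print(raw.dtype, "->", masked.dtype)
print(masked)
```

The promotion happens whether or not any element actually equals the fill value, which matches the report: the example data contains no missing values and still changes dtype.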
What you expected to happen:
I would expect the type to remain the same when applying the _FillValue.
Minimal Complete Verifiable Example:
Original example from TileDB-CF-Py issue #117 using the TileDB backend.
import tiledb
import xarray as xr
import numpy as np
index = tiledb.Dim(name='index', domain=(0, 3))
domain = tiledb.Domain(index)
var = tiledb.Attr(name='var', dtype=np.int16)
schema = tiledb.ArraySchema(domain=domain, attrs=[var], sparse=False)
tiledb.Array.create('dense_array0', schema)
with tiledb.open('dense_array0', 'w') as A:
    A[:] = np.array([5, 6, 7, 8], dtype=np.int16)
ds = xr.open_dataset('dense_array0', engine='tiledb')
ds['var'].dtype
NetCDF example with the same behavior:
import netCDF4
import xarray as xr
import numpy as np
filename = 'temp_file.nc'
with netCDF4.Dataset(filename, mode="w") as group:
    group.createDimension("index", 4)
    var = group.createVariable("var", np.int16, ("index",), fill_value=-1)
    var[:] = np.array([5, 6, 7, 8], dtype=np.int16)
dataset = xr.open_dataset(filename)
dataset["var"].dtype
Anything else we need to know?:
- I was able to verify the type conversion from int16 to float32 occurs in the conventions.decode_cf_variables call in the open_dataset method of StoreBackendEntrypoint.
- I was able to verify the conversion does not happen if mask_and_scale=False.
- Note that TileDB is automatically setting a fill value for all dense numerical arrays, and so we are always setting the _FillValue attribute for variables from the TileDB backend.
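The mask_and_scale observation can be checked without either backend. This is a minimal sketch, assuming an in-memory Dataset with a _FillValue attribute is a fair stand-in for what a backend hands to the CF decoder. It uses the public xr.decode_cf function rather than the internal decode_cf_variables call, and the variable names are illustrative.

```python
import numpy as np
import xarray as xr

# In-memory stand-in for the stored array: an int16 variable that carries a
# _FillValue attribute, with no values actually equal to the fill value.
raw = xr.Dataset(
    {
        "var": (
            "index",
            np.array([5, 6, 7, 8], dtype=np.int16),
            {"_FillValue": np.int16(-1)},
        )
    }
)

# Default decoding applies the fill-value mask (mask_and_scale=True).
masked = xr.decode_cf(raw)

# Disabling mask_and_scale leaves the stored values and dtype untouched.
unmasked = xr.decode_cf(raw, mask_and_scale=False)

print("mask_and_scale=True: ", masked["var"].dtype)
print("mask_and_scale=False:", unmasked["var"].dtype)
```

Passing mask_and_scale=False to xr.open_dataset should have the same effect when reading from a file, at the cost of leaving fill values unmasked in the data.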
Environment:
I was able to reproduce this with both xarray 0.19.0 and 0.20.1.