
Unexpected type conversion in variables with _FillValue #6055

Closed
@jp-dark

Description

What happened:
When I open a dataset containing an int16 variable that has a _FillValue attribute, the variable is converted from int16 to float32. This was first reported in the TileDB-CF-Py GitHub repository, which provides a TileDB backend for xarray. See TileDB-CF-Py issue #117.

What you expected to happen:
I would expect the variable to keep its int16 dtype when the _FillValue is applied.

Minimal Complete Verifiable Example:

Original example from TileDB-CF-Py issue #117 using the TileDB backend.

```python
import tiledb
import xarray as xr
import numpy as np

# Create a dense 4-cell TileDB array with a single int16 attribute.
index = tiledb.Dim(name='index', domain=(0, 3))
domain = tiledb.Domain(index)
var = tiledb.Attr(name='var', dtype=np.int16)
schema = tiledb.ArraySchema(domain=domain, attrs=[var], sparse=False)
tiledb.Array.create('dense_array0', schema)

# Write int16 data to every cell.
with tiledb.open('dense_array0', 'w') as A:
    A[:] = np.array([5, 6, 7, 8], dtype=np.int16)

# Open through the TileDB-CF-Py backend (engine='tiledb').
ds = xr.open_dataset('dense_array0', engine='tiledb')
ds['var'].dtype  # reported as float32 instead of int16
```

NetCDF example with the same behavior:

```python
import netCDF4
import xarray as xr
import numpy as np

filename = 'temp_file.nc'
# Write an int16 variable with an explicit _FillValue of -1.
with netCDF4.Dataset(filename, mode="w") as group:
    group.createDimension("index", 4)
    var = group.createVariable("var", np.int16, ("index",), fill_value=-1)
    var[:] = np.array([5, 6, 7, 8], dtype=np.int16)

dataset = xr.open_dataset(filename)
dataset["var"].dtype  # reported as float32 instead of int16
```
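For a reproduction that needs neither TileDB nor netCDF4, the same decoding step can be triggered on an in-memory dataset with xarray.decode_cf. This is a sketch, assuming that decode_cf applies the same CF mask decoding that open_dataset runs through conventions.decode_cf_variables; the variable and dimension names just mirror the netCDF example. None of the stored values equals the fill value, so any dtype change comes from the presence of the _FillValue attribute alone.

```python
import numpy as np
import xarray as xr

# In-memory dataset mirroring the netCDF example: an int16 variable
# carrying a _FillValue attribute that no stored value actually uses.
raw = xr.Dataset(
    {
        "var": (
            "index",
            np.array([5, 6, 7, 8], dtype=np.int16),
            {"_FillValue": np.int16(-1)},
        )
    }
)

# Default decoding (mask_and_scale=True) applies the _FillValue mask.
masked = xr.decode_cf(raw)

# With mask_and_scale=False the masking step is skipped.
unmasked = xr.decode_cf(raw, mask_and_scale=False)

print(raw["var"].dtype, masked["var"].dtype, unmasked["var"].dtype)
```

If the behavior described in this issue is present, the masked variable comes back as a floating-point dtype while the unmasked one stays int16.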

Anything else we need to know?:

  • I verified that the int16-to-float32 conversion happens in the conventions.decode_cf_variables call made by the open_dataset method of StoreBackendEntrypoint.
  • I verified that the conversion does not happen when mask_and_scale=False is passed.
  • TileDB automatically sets a fill value for every dense numeric array, so the TileDB backend always sets the _FillValue attribute on its variables.
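A likely reason for the promotion is how masking represents missing data: decoded fill values become NaN, and NumPy integer arrays cannot hold NaN. The sketch below imitates that step with plain NumPy. The helper mask_fill_value is hypothetical, and its promotion rule (integers of 2 bytes or fewer to float32, wider integers to float64) is an assumption modeled on the int16 → float32 result reported here, not a copy of xarray's code.

```python
import numpy as np


def mask_fill_value(data: np.ndarray, fill_value) -> np.ndarray:
    """Hypothetical helper: replace fill values with NaN, CF-style.

    NaN exists only for floating-point types, so integer input is promoted
    first, even if no element equals the fill value. Small integers
    (<= 2 bytes) go to float32, which represents every int16 value exactly;
    wider integers go to float64.
    """
    if np.issubdtype(data.dtype, np.integer):
        target = np.float32 if data.dtype.itemsize <= 2 else np.float64
    else:
        target = data.dtype
    out = data.astype(target)  # always a copy, so the input is untouched
    out[data == fill_value] = np.nan
    return out


values = np.array([5, 6, 7, 8], dtype=np.int16)
print(mask_fill_value(values, -1).dtype)  # float32
print(mask_fill_value(np.array([5, -1], dtype=np.int16), -1))
```

The first call shows that the dtype changes even though nothing is actually masked, which matches the symptom in this report.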

Environment:
I was able to reproduce this with both xarray 0.19.0 and 0.20.1.
