segfault with a particular netcdf4 file #8289

Open
@hmaarrfk

What happened?

The following code segfaults on my machine (and on many other machines with a similar environment):

import xarray
filename = 'tiny.nc.txt'
engine = "netcdf4"

dataset = xarray.open_dataset(filename, engine=engine)

# Re-open the same file repeatedly; the crash happens during these re-opens.
for i in range(60):
    xarray.open_dataset(filename, engine=engine)

tiny.nc.txt
mrc.nc.txt

What did you expect to happen?

Not to segfault.

Minimal Complete Verifiable Example

  1. Generate a netCDF4 file with my application.
  2. Trim the file down (load it, then drop every variable I can while still reproducing the bug).
  3. Try to read it repeatedly.
import xarray
from tqdm import tqdm

# First file, h5netcdf engine.
filename = 'mrc.nc.txt'
engine = "h5netcdf"
dataset = xarray.open_dataset(filename, engine=engine)

for i in tqdm(range(60), desc=f"filename={filename}, engine={engine}"):
    xarray.open_dataset(filename, engine=engine)


# First file, netcdf4 engine.
engine = "netcdf4"

dataset = xarray.open_dataset(filename, engine=engine)
for i in tqdm(range(60), desc=f"filename={filename}, engine={engine}"):
    xarray.open_dataset(filename, engine=engine)

# Second file, h5netcdf engine.
filename = 'tiny.nc.txt'

engine = "h5netcdf"
dataset = xarray.open_dataset(filename, engine=engine)
for i in tqdm(range(60), desc=f"filename={filename}, engine={engine}"):
    xarray.open_dataset(filename, engine=engine)


# Second file, netcdf4 engine.
engine = "netcdf4"

dataset = xarray.open_dataset(filename, engine=engine)
for i in tqdm(range(60), desc=f"filename={filename}, engine={engine}"):
    xarray.open_dataset(filename, engine=engine)
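
The trimming in step 2 can be automated. The following is a rough sketch, not the procedure actually used for the attached files: it greedily drops one variable at a time and keeps the drop only if the bug still reproduces. The names `minimize_variables` and `still_reproduces` are hypothetical; the caller supplies the predicate.

```python
from typing import Callable, Iterable, List


def minimize_variables(
    names: Iterable[str],
    still_reproduces: Callable[[List[str]], bool],
) -> List[str]:
    """Greedily shrink a list of variable names.

    `still_reproduces(kept)` is a caller-supplied (hypothetical) predicate that
    returns True when a file containing only `kept` still triggers the bug.
    """
    kept = list(names)
    for name in list(kept):
        # Try removing one variable; keep the removal only if the bug survives.
        candidate = [n for n in kept if n != name]
        if still_reproduces(candidate):
            kept = candidate
    return kept
```

In practice the predicate would presumably write something like `dataset[kept].to_netcdf(...)` to a scratch file and run the re-open loop in a separate process, so that a crash does not take down the minimizer. Because the segfault is not fully deterministic, the predicate should run enough iterations before it concludes that the bug is gone.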

Hand-crafting the file from start to finish does not seem to segfault:

import xarray
import numpy as np
engine = 'netcdf4'

dataset = xarray.Dataset()

coords = {}
coords['image_x'] = np.arange(1, dtype='int')
dataset = dataset.assign_coords(coords)

dataset['image'] = xarray.DataArray(
    np.zeros((1,), dtype='uint8'),
    dims=('image_x',)
)

# %%
dataset.to_netcdf('mrc.nc.txt')
# %%
dataset = xarray.open_dataset('mrc.nc.txt', engine=engine)


for i in range(10):
    xarray.open_dataset('mrc.nc.txt', engine=engine)
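
All the loops above leave each opened dataset unclosed. One experiment that might narrow things down is to close every handle before the next open. This assumes, without evidence from this report, that lingering handles could play a role. The helper below is a hypothetical sketch and takes the opener as a parameter so that it does not depend on any particular backend.

```python
from contextlib import AbstractContextManager
from typing import Callable


def reopen_and_close(
    open_fn: Callable[[str], AbstractContextManager],
    path: str,
    repeats: int = 60,
) -> int:
    """Open `path` `repeats` times, closing each handle before the next open.

    `open_fn` is any callable returning a context manager, for example a
    lambda wrapping xarray.open_dataset (datasets support `with`).
    Returns the number of open/close cycles that completed.
    """
    completed = 0
    for _ in range(repeats):
        with open_fn(path):
            pass  # nothing is read; only the open/close cycle is exercised
        completed += 1
    return completed
```

A call could look like `reopen_and_close(lambda p: xarray.open_dataset(p, engine="netcdf4"), "tiny.nc.txt")`. If the crash disappears with explicit closing, that would be a hint rather than a diagnosis.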

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

i=0 passes.
i=1 usually segfaults, though it sometimes takes more than one iteration.
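
Since a segfault leaves no Python traceback by default, one way to capture more output is to run the reproducer in a child interpreter with the standard-library `faulthandler` enabled. This is a minimal sketch; `run_isolated` and `describe` are hypothetical helper names, and the signal decoding assumes a POSIX system, where a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 120.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter with faulthandler enabled.

    On a fatal signal, faulthandler writes the Python-level stack to stderr
    before the process dies, so the parent can still inspect it.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe(result: subprocess.CompletedProcess) -> str:
    """Summarise how the child exited (POSIX: negative code = killed by signal)."""
    if result.returncode < 0:
        return f"killed by {signal.Signals(-result.returncode).name}"
    return f"exited with code {result.returncode}"
```

Passing the re-open loop from this report as the `code` string and printing `describe(result)` together with `result.stderr` would show which Python frame was active when the crash happened. It does not show the native frames inside libnetcdf or libhdf5.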

Anything else we need to know?

At first I thought the problem was deep in HDF5, but I am less convinced now.

xref: HDFGroup/hdf5#3649

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by Ramona Optics | (main, Jun 27 2023, 02:59:09) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.1-060501-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2

xarray: 2023.9.1.dev25+g46643bb1.d20231009
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: 1.2.0
h5py: 3.9.0
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.3.0
distributed: 2023.3.0
matplotlib: 3.8.0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: 0.22
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.2.1
conda: 23.7.4
pytest: 7.4.2
mypy: None
IPython: 8.16.1
sphinx: 7.2.6
