Description
What happened?
The following code segfaults on my machine (and on many other machines with a similar environment):
import xarray
filename = 'tiny.nc.txt'
engine = "netcdf4"
dataset = xarray.open_dataset(filename, engine=engine)
i = 0
for i in range(60):
    xarray.open_dataset(filename, engine=engine)
What did you expect to happen?
Not to segfault.
Minimal Complete Verifiable Example
- Generate a netCDF4 file with my application.
- Trim the netCDF4 file down (load it, then drop every variable I can while still reproducing this bug).
- Try to read it.
import xarray
from tqdm import tqdm
filename = 'mrc.nc.txt'
engine = "h5netcdf"
dataset = xarray.open_dataset(filename, engine=engine)
for i in tqdm(range(60), desc=f"filename={filename}, engine={engine}"):
    xarray.open_dataset(filename, engine=engine)
engine = "netcdf4"
dataset = xarray.open_dataset(filename, engine=engine)
for i in tqdm(range(60), desc=f"filename={filename}, engine={engine}"):
    xarray.open_dataset(filename, engine=engine)
filename = 'tiny.nc.txt'
engine = "h5netcdf"
dataset = xarray.open_dataset(filename, engine=engine)
for i in tqdm(range(60), desc=f"filename={filename}, engine={engine}"):
    xarray.open_dataset(filename, engine=engine)
engine = "netcdf4"
dataset = xarray.open_dataset(filename, engine=engine)
for i in tqdm(range(60), desc=f"filename={filename}, engine={engine}"):
    xarray.open_dataset(filename, engine=engine)
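For anyone who wants to repeat the trimming step on their own file, here is a minimal sketch of the greedy reduction described in the list above: drop one variable at a time and keep the removal only if the crash still reproduces. The names `greedy_reduce` and `still_fails` are hypothetical helpers, not part of xarray, and the xarray wiring is described only in comments because it depends on the original, untrimmed file.

```python
from typing import Callable, Iterable, List


def greedy_reduce(
    names: Iterable[str],
    still_fails: Callable[[List[str]], bool],
) -> List[str]:
    """Try removing each name once; keep a removal only if the failure persists.

    `still_fails` is a caller-supplied predicate (an assumption of this sketch)
    that receives the list of names to keep and returns True when the bug
    still reproduces with only those names present.
    """
    kept = list(names)
    for name in list(kept):
        candidate = [n for n in kept if n != name]
        if still_fails(candidate):
            # The bug survives without this variable, so it is not needed.
            kept = candidate
    return kept


# Hypothetical wiring to xarray (not executed here):
#   - load the full dataset once with xarray.open_dataset(...)
#   - inside still_fails(keep): write ds[keep] with to_netcdf() to a scratch
#     file, then run the open_dataset loop on that file in a separate process
#     and return True if that process was killed by a signal.
```

Running the check in a separate process matters here, because a segfault inside the predicate would otherwise take down the reduction loop itself.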
Hand-crafting the file from start to finish does not seem to segfault:
import xarray
import numpy as np
engine = 'netcdf4'
dataset = xarray.Dataset()
coords = {}
coords['image_x'] = np.arange(1, dtype='int')
dataset = dataset.assign_coords(coords)
dataset['image'] = xarray.DataArray(
    np.zeros((1,), dtype='uint8'),
    dims=('image_x',)
)
# %%
dataset.to_netcdf('mrc.nc.txt')
# %%
dataset = xarray.open_dataset('mrc.nc.txt', engine=engine)
for i in range(10):
    xarray.open_dataset('mrc.nc.txt', engine=engine)
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
i=0 passes
i=1 usually segfaults, but sometimes it takes more than one iteration
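Since the crash kills the interpreter, there is no Python traceback to paste. One way to capture more detail is sketched below: run the loop in a child interpreter started with `-X faulthandler`, so that a fatal signal dumps the Python stack to stderr and the parent can report which signal ended the child. The helper names `run_isolated` and `describe` are hypothetical, and the negative return code convention assumes a POSIX system such as the Linux machine listed under Environment.

```python
import signal
import subprocess
import sys


def run_isolated(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter with faulthandler enabled."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )


def describe(result: subprocess.CompletedProcess) -> str:
    """Turn a return code into a short status string."""
    if result.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return f"killed by {signal.Signals(-result.returncode).name}"
    return f"exited with code {result.returncode}"


# Same loop as in the report, with the iteration index printed and flushed
# so the last line of stdout shows how far the child got.
REPRODUCER = """
import xarray
filename = 'tiny.nc.txt'
for i in range(60):
    print(f'i={i}', flush=True)
    xarray.open_dataset(filename, engine='netcdf4')
"""

if __name__ == "__main__":
    result = run_isolated(REPRODUCER)
    print(describe(result))
    print(result.stdout[-200:])  # tail of the progress output
    print(result.stderr)         # faulthandler dump, if any
```

If the child is killed, stderr should contain the faulthandler dump of the Python frames that were active at the time, which helps narrow down whether the fault is reached through the netCDF4 or the h5py code path.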
Anything else we need to know?
At first I thought the problem was deep in HDF5, but I am less convinced now.
xref: HDFGroup/hdf5#3649
Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by Ramona Optics | (main, Jun 27 2023, 02:59:09) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.1-060501-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2
xarray: 2023.9.1.dev25+g46643bb1.d20231009
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: 1.2.0
h5py: 3.9.0
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.3.0
distributed: 2023.3.0
matplotlib: 3.8.0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: 0.22
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.2.1
conda: 23.7.4
pytest: 7.4.2
mypy: None
IPython: 8.16.1
sphinx: 7.2.6