Handle scale_factor and add_offset as scalar #4485
Conversation
The h5netcdf engine exposes single-valued attributes as arrays of shape (1,), which is correct according to the NetCDF standard, but can cause a problem when a value of shape () is read before scale_factor and add_offset have been applied. This PR adds a check on the dimensionality of add_offset and scale_factor to ensure they are scalar before they are used for further processing, adds a unit test to verify that this works correctly, and adds a note to the documentation warning users of this difference between the h5netcdf and netcdf4 engines. Fixes pydata#4471.
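A minimal sketch of the kind of check this involves (the helper name is hypothetical, not the actual code added in the PR):

```python
import numpy as np

def _ensure_scalar(value):
    """Hypothetical helper: collapse a single-valued array to a scalar.

    h5netcdf may expose single-valued attributes such as scale_factor
    and add_offset as arrays of shape (1,); converting them to scalars
    avoids accidentally broadcasting a shape-() variable to shape (1,).
    """
    value = np.asarray(value)
    if value.ndim == 1 and value.size == 1:
        return value.item()
    return value
```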
Is this bugfix notable enough to need a whats-new entry? For the unit test, I tried to construct an object that would emulate what is produced when reading a NetCDF4 file with the h5netcdf engine, but I gave up and settled for a temporary file instead. If this is an undesired approach, I could use some guidance on how to construct the appropriate object that will expose the problem.
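The temporary-file approach might look roughly like this (a sketch only, not the test actually added in this PR; variable names and values are illustrative):

```python
import numpy as np
import xarray as xr

def test_scale_factor_via_h5netcdf(tmp_path):
    # Write scalar scale_factor/add_offset encoding with the default
    # netcdf4 engine, then read the file back with h5netcdf, which may
    # expose those attributes as arrays of shape (1,).
    path = tmp_path / "test.nc"
    expected = xr.Dataset({"x": ("t", np.arange(10.0))})
    expected["x"].encoding.update(dtype="int16", scale_factor=0.1, add_offset=5.0)
    expected.to_netcdf(path)
    with xr.open_dataset(path, engine="h5netcdf") as actual:
        np.testing.assert_allclose(actual["x"].values, expected["x"].values)
```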
@gerritholl I think a whats-new entry would be appropriate.
doc/io.rst
Outdated
There may be minor differences in the :py:class:`Dataset` object returned
when reading a NetCDF file with different engines. For example,
single-valued attributes are returned as scalars by the default
``engine=netcdf4``, but as arrays of size ``(1,)`` when reading with
``engine=h5netcdf``.
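The difference described in this note can be observed by opening the same file with both engines and CF decoding disabled, so that the raw attributes stay visible (illustrative; the file and variable names are placeholders):

```python
import xarray as xr

# With decode_cf=False, scale_factor/add_offset remain as attributes
# instead of being applied during decoding.
ds_nc4 = xr.open_dataset("example.nc", engine="netcdf4", decode_cf=False)
ds_h5 = xr.open_dataset("example.nc", engine="h5netcdf", decode_cf=False)

print(ds_nc4["x"].attrs["scale_factor"])  # a scalar, e.g. 0.1
print(ds_h5["x"].attrs["scale_factor"])   # may be an array of shape (1,)
```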
I'm not 100% sure we need to mention this; I think it sort of goes without saying that different backends may differ in minor ways.
As a user with a much shallower understanding of how backends interact with xarray, I was surprised by it. Should I keep this note or remove it?
It makes total sense but I wasn't aware either, so I'd leave it.
Add a whats-new entry for the fix to issue pydata#4471, corresponding to PR pydata#4485.
If this makes more sense as an integration test than as a unit test (for which I need help; see my other comment), should I mark the current test in some way and/or move it to a different source file?
Co-authored-by: Mathias Hauser <[email protected]>
Thanks @gerritholl
- Passes `isort . && black . && mypy . && flake8`
- User visible changes documented in `whats-new.rst`