Skip to content

Handle scale_factor and add_offset as scalar #4485

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 11, 2020

Conversation

gerritholl
Copy link
Contributor

@gerritholl gerritholl commented Oct 5, 2020

The h5netcdf engine exposes single-valued attributes as arrays of shape
(1,), which is correct according to the NetCDF standard, but may cause
a problem when reading a value of shape () before the scale_factor and
add_offset have been applied. This PR adds a check for the dimensionality
of add_offset and scale_factor and ensures they are scalar before they
are used for further processing, adds a unit test to verify that this
works correctly, and a note to the documentation to warn users of this
difference between the h5netcdf and netcdf4 engines.

The h5netcdf engine exposes single-valued attributes as arrays of shape
(1,), which is correct according to the NetCDF standard, but may cause
a problem when reading a value of shape () before the scale_factor and
add_offset have been applied.  This PR adds a check for the dimensionality
of add_offset and scale_factor and ensures they are scalar before they
are used for further processing, adds a unit test to verify that this
works correctly, and a note to the documentation to warn users of this
difference between the h5netcdf and netcdf4 engines.

Fixes pydata#4471.
@gerritholl
Copy link
Contributor Author

Is this bugfix notable enough to need a whats-new.rst entry?

For the unit test, I tried to construct an object that would emulate what is produced when reading a NetCDF4 file with the h5netcdf engine, but I gave up and settled for a temporary file instead. If this is an undesired approach, I could use some guidance in how to construct the appropriate object that will expose the problem.

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gerritholl I think a whats-new entry would be appropriate.

doc/io.rst Outdated
Comment on lines 108 to 112
There may be minor differences in the :py:class:`Dataset` object returned
when reading a NetCDF file with different engines. For example,
single-valued attributes are returned as scalars by the default
``engine=netcdf4``, but as arrays of size ``(1,)`` when reading with
``engine=h5netcdf``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure we need to mention this, I think it sort of goes without saying that different backends may differ in minor ways.

Copy link
Contributor Author

@gerritholl gerritholl Oct 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a user who understands much less deeply how backends interact with xarray, it did surprise me. Should I keep this note or remove it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes total sense but I wasn't aware either, so I'd leave it.

Add a whats-new entry for the fix to issue pydata#4471, corresponding to PR pydata#4485.
@gerritholl
Copy link
Contributor Author

If this makes more sense as an integration test than as a unit test (for which I need help, see other comment), should I mark the current test in some way and/or move it to a different source file?

doc/io.rst Outdated
Comment on lines 108 to 112
There may be minor differences in the :py:class:`Dataset` object returned
when reading a NetCDF file with different engines. For example,
single-valued attributes are returned as scalars by the default
``engine=netcdf4``, but as arrays of size ``(1,)`` when reading with
``engine=h5netcdf``.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes total sense but I wasn't aware either, so I'd leave it.

Co-authored-by: Mathias Hauser <[email protected]>
@dcherian
Copy link
Contributor

Thanks @gerritholl

@dcherian dcherian merged commit 569a4da into pydata:master Oct 11, 2020
@gerritholl gerritholl deleted the decode-with-array-attributes branch October 16, 2020 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants