Handle scale_factor and add_offset as scalar #4485
Conversation
The h5netcdf engine exposes single-valued attributes as arrays of shape (1,), which is correct according to the NetCDF standard, but can cause a problem when a value of shape () is read before scale_factor and add_offset have been applied. This PR adds a check on the dimensionality of add_offset and scale_factor to ensure they are scalar before they are used for further processing, adds a unit test to verify that this works correctly, and adds a note to the documentation warning users of this difference between the h5netcdf and netcdf4 engines. Fixes pydata#4471.
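A minimal sketch of the kind of check this involves (the helper name is hypothetical, not the actual code added in the PR):

```python
import numpy as np

def _ensure_scalar(value):
    """Hypothetical helper: collapse a single-valued array to a scalar.

    h5netcdf may expose single-valued attributes such as scale_factor
    and add_offset as arrays of shape (1,); converting them to scalars
    avoids accidentally broadcasting a shape-() variable to shape (1,).
    """
    value = np.asarray(value)
    if value.ndim == 1 and value.size == 1:
        return value.item()
    return value
```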
Is this bugfix notable enough to need a whats-new entry? For the unit test, I tried to construct an object that would emulate what is produced when reading a NetCDF4 file with the h5netcdf engine, but I gave up and settled for a temporary file instead. If this is an undesired approach, I could use some guidance on how to construct the appropriate object that will expose the problem.
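The temporary-file approach might look roughly like this (a sketch only, not the test actually added in this PR; variable names and values are illustrative):

```python
import numpy as np
import xarray as xr

def test_scale_factor_via_h5netcdf(tmp_path):
    # Write scalar scale_factor/add_offset encoding with the default
    # netcdf4 engine, then read the file back with h5netcdf, which may
    # expose those attributes as arrays of shape (1,).
    path = tmp_path / "test.nc"
    expected = xr.Dataset({"x": ("t", np.arange(10.0))})
    expected["x"].encoding.update(dtype="int16", scale_factor=0.1, add_offset=5.0)
    expected.to_netcdf(path)
    with xr.open_dataset(path, engine="h5netcdf") as actual:
        np.testing.assert_allclose(actual["x"].values, expected["x"].values)
```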
@gerritholl I think a whats-new entry would be appropriate.
doc/io.rst
Outdated
There may be minor differences in the :py:class:`Dataset` object returned
when reading a NetCDF file with different engines. For example,
single-valued attributes are returned as scalars by the default
``engine=netcdf4``, but as arrays of size ``(1,)`` when reading with
``engine=h5netcdf``.
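The difference described in this note can be observed by opening the same file with both engines and CF decoding disabled, so that the raw attributes stay visible (illustrative; the file and variable names are placeholders):

```python
import xarray as xr

# With decode_cf=False, scale_factor/add_offset remain as attributes
# instead of being applied during decoding.
ds_nc4 = xr.open_dataset("example.nc", engine="netcdf4", decode_cf=False)
ds_h5 = xr.open_dataset("example.nc", engine="h5netcdf", decode_cf=False)

print(ds_nc4["x"].attrs["scale_factor"])  # a scalar, e.g. 0.1
print(ds_h5["x"].attrs["scale_factor"])   # may be an array of shape (1,)
```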
I'm not 100% sure we need to mention this; I think it sort of goes without saying that different backends may differ in minor ways.
As a user with a much shallower understanding of how backends interact with xarray, I was surprised by it. Should I keep this note or remove it?
It makes total sense but I wasn't aware either, so I'd leave it.
Add a whats-new entry for the fix to issue pydata#4471, corresponding to PR pydata#4485.
If this makes more sense as an integration test than as a unit test (for which I need help; see my other comment), should I mark the current test in some way and/or move it to a different source file?
Co-authored-by: Mathias Hauser <[email protected]>
Thanks @gerritholl
- Passes `isort . && black . && mypy . && flake8`
- User visible changes documented in `whats-new.rst`