Skip to content

Check shapes of coordinates and data during DataArray construction #2533

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

lilyminium
Copy link
Contributor

@lilyminium lilyminium commented Oct 31, 2018

  • Closes numpy.insert on DataArray may silently result in array inconsistent with its coordinates  #2529 (remove if there is no corresponding issue, which should only be the case for minor changes)
  • Tests added (for all bug fixes or enhancements)
  • Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)

This sets DataArrayGroupBy.reduce(shortcut=False), as the shortcut first constructs a DataArray with the previous coordinates and the new mutated data before updating the coordinates; this order of events now raises a ValueError.

@pep8speaks
Copy link

pep8speaks commented Oct 31, 2018

Hello @lilyminium! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-31 21:16:46 UTC

@shoyer
Copy link
Member

shoyer commented Nov 2, 2018

@lilyminium thanks for looking into this! I think the specific problem here is due to our use of the shortcut _replace method in DataArray.__array_wrap__ (which is called by NumPy, e.g., inside np.insert):

encoding : dict_like or None, optional

I think the clean way to fix this would be to add error checking to DataArray.__array_wrap__. But we currently use this method inside DataArray._unary_op (and maybe elsewhere?), which is called every time you do unwary math with an xarray object (e.g., - ). We’ll want to update these uses to use a method that uses the same shortcut to avoid redundant metadata checks.

@@ -230,6 +235,9 @@ def __init__(self, data, coords=None, dims=None, name=None,
# uncomment for a useful consistency check:
# assert all(isinstance(v, Variable) for v in coords.values())

# check shape consistency
_check_shape_consistency(variable.shape, coords, variable.dims)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be pretty expensive — it means we have to check shape consistency every time we create a DataArray, even if we know that the shape is already consistent, e.g., because it was created by another xarray operation.

@lilyminium lilyminium force-pushed the dataarray-consistency branch from a10d855 to f488bed Compare November 7, 2018 01:38
@lilyminium
Copy link
Contributor Author

I think the clean way to fix this would be to add error checking to DataArray.__array_wrap__. But we currently use this method inside DataArray._unary_op (and maybe elsewhere?), which is called every time you do unwary math with an xarray object (e.g., - ). We’ll want to update these uses to use a method that uses the same shortcut to avoid redundant metadata checks.

Does adding fastpath to skip shape checking in array_wrap work? fastpath=True in both DataArray.array_wrap and Variable.array_wrap didn't seem to break any tests.

@shoyer
Copy link
Member

shoyer commented Nov 7, 2018

Does adding fastpath to skip shape checking in array_wrap work? fastpath=True in both DataArray.array_wrap and Variable.array_wrap didn't seem to break any tests.

I think it would be slightly better to stick with a separate method for xarray's fast-path. __array_wrap__ is a special protocol used by NumPy, so there's always a (unlikely) risk that if we use different arguments in our version it could break in the future.

@max-sixty
Copy link
Collaborator

This would still be valuable if @lilyminium (or someone else, if they're not interested) would want to finish this off.

@lilyminium
Copy link
Contributor Author

Ooh, this is a blast from the past. Anyone passing by, feel free to pick this up if you like, otherwise I'll see what I can do over the weekend :)

@lilyminium lilyminium force-pushed the dataarray-consistency branch from b1cb248 to b755d1d Compare May 31, 2021 17:00
@lilyminium
Copy link
Contributor Author

Hmmmm. It's been quite some time since I've last used xarray, but I'm not sure I see an easy way to:

  1. hijack np.insert (which I would guess goes directly to DataArray.__array_wrap__ / __array_finalize__)
  2. without passing a fastpath-like argument to __array_wrap__, but
  3. efficiently not checking the shape of safe operations, like _unary_op.

I have for now stuck fastpath in _replace but it fails condition 3 and just runs for all calls to __array_wrap__.

@lilyminium lilyminium force-pushed the dataarray-consistency branch from b755d1d to b9c6b00 Compare May 31, 2021 17:04
@lilyminium lilyminium force-pushed the dataarray-consistency branch from b9c6b00 to bd1a76f Compare May 31, 2021 17:07
@lilyminium
Copy link
Contributor Author

The 3.7 test looks like an installation error.

@max-sixty
Copy link
Collaborator

Hmmmm. It's been quite some time since I've last used xarray, but I'm not sure I see an easy way to:

1. hijack np.insert (which I would guess goes directly to `DataArray.__array_wrap__ / __array_finalize__`)

2. without passing a `fastpath`-like argument to `__array_wrap__`, but

3. efficiently not checking the shape of safe operations, like `_unary_op`.

I think that's exactly correct. Without being able to tell __array_wrap__ from _unary_op that we don't need to check the sizes, they all follow the same path. Is it possible to use the context arg for this? I don't think it's sufficiently flexible from an initial glance of the docs.

Maybe this new implementation has a low enough overhead that it's worthwhile?

If we do go ahead — what about calling _check_shape_consistency from within __array_wrap__ after creating the object, instead of adding the fastpath kwarg?

@lilyminium lilyminium force-pushed the dataarray-consistency branch from 8d50ecc to e488efd Compare May 31, 2021 21:16
@lilyminium
Copy link
Contributor Author

context

I can only find references regarding ufuncs in the numpy docs when I look up the context. I would think that in that case the shapes should be fine, but please do correct me if not?

_unary_op calls __array_wrap__(context=None); if the function only checks the shape when context is None and a dummy value gets passed in for the context, that's one way to avoid it.

@max-sixty
Copy link
Collaborator

Nice, that seems to work!

Is anyone else more familiar with whether using context in this way is reasonable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

numpy.insert on DataArray may silently result in array inconsistent with its coordinates
4 participants