added 'storage_transformers' to valid_encodings #7540
This looks great @JMorado! Would you mind adding a test that utilizes this encoding parameter?
Hi @jhamman, I have added a test and corrected a problem that I had previously missed. It turns out that a lock must be used to ensure the correct writing of a sharded zarr store. Everything seems to be working as expected now. Here are some comments:
Let me know what you think.
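For readers following the lock discussion, here is a minimal toy sketch of why sharded writes may need one. When several chunks live in the same shard, each chunk write becomes a read-modify-write of the whole shard, so concurrent writers can overwrite each other's chunks unless the update is serialized. This is not zarr-python's or Xarray's implementation: `ToyShardedStore`, `shard_key`, `write_all`, and `CHUNKS_PER_SHARD` are hypothetical names used only for illustration, and the store is an in-memory dict.

```python
import threading

# Hypothetical setting: how many chunks are packed into one shard.
CHUNKS_PER_SHARD = 4


def shard_key(chunk_index, chunks_per_shard=CHUNKS_PER_SHARD):
    """Map a 1-D chunk index to the id of the shard that contains it."""
    return chunk_index // chunks_per_shard


class ToyShardedStore:
    """In-memory stand-in for a sharded store (illustration only)."""

    def __init__(self):
        self._shards = {}  # shard id -> {chunk index: payload}
        self._lock = threading.Lock()

    def write_chunk(self, chunk_index, payload):
        key = shard_key(chunk_index)
        # The whole shard is read, modified, and written back. Without the
        # lock, two writers targeting the same shard could both read the old
        # shard, and the later write-back would drop the other's chunk.
        with self._lock:
            shard = dict(self._shards.get(key, {}))  # read
            shard[chunk_index] = payload             # modify
            self._shards[key] = shard                # write back

    def read_chunk(self, chunk_index):
        return self._shards[shard_key(chunk_index)][chunk_index]


def write_all(store, n_chunks):
    """Write n_chunks chunks concurrently, one thread per chunk."""
    threads = [
        threading.Thread(
            target=store.write_chunk, args=(i, f"chunk-{i}".encode())
        )
        for i in range(n_chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

A single global lock is the simplest choice here. A per-shard lock would allow more parallelism, at the cost of more bookkeeping.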
It's great to see this PR get started in Xarray! Thanks @JMorado! From the perspective of a Zarr developer, the sharding feature is still highly experimental. The API may change significantly. While the sharding code is released in the sense that it is available deep in Zarr, it is not really considered part of the public API yet. So perhaps it's a bit too early to be doing this?
Regarding locks, I think we need to think hard about the best way to deal with this across the stack. There are a couple of different options:
Note that there are still some deep inefficiencies in the way zarr-python writes shards (see zarr-developers/zarr-python#1338). I think we should be optimizing things at the Zarr level first, before implementing workarounds in Xarray.
Does it make sense to create a new backend in a new project to enable experimentation?
This PR adds "storage_transformers" to the valid encodings of a variable, thus allowing Zarr stores to be written using the
new sharding storage transformers.
Example of usage: