Open
Description
My vision for this package is that it would work seamlessly with a local and/or remote high-performance data catalog and store (i.e., a data engine). Presently, Icechunk, a cloud-native transactional tensor storage engine, is the most promising option, as it was recently open-sourced by EarthMover as the source code behind their ArrayLake services.
An ideal workflow would be:
- The user requests a dataset from a well-known data repository for a specific area of interest.
- These well-known data repositories will be cataloged here in a YAML file, and optionally referenced with Kerchunk or VirtualiZarr.
- This package first checks if the specific dataset has already been fetched and saved to a local Icechunk instance.
- If not, it fetches the specific dataset from the source repository, saving it locally in its native format.
- If the user expects to reuse the data, they can choose to convert the dataset into an analysis-ready, cloud-optimized (ARCO) Zarr v3 dataset within Icechunk.
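
The check-then-fetch part of this workflow could look roughly like the sketch below. It is only an illustration under stated assumptions: `CatalogEntry`, `cache_path`, `get_dataset`, and the `fetch` callable are hypothetical names, not part of this package or of Icechunk. A plain local directory stands in for the Icechunk lookup, the catalog entry stands in for a record loaded from the YAML file, and the optional ARCO conversion step is left out.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# (west, south, east, north) bounds of the area of interest.
BBox = tuple[float, float, float, float]


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record; assumed to be loaded from the YAML catalog."""

    name: str
    url: str
    native_suffix: str  # file extension of the source's native format


def cache_path(cache_dir: Path, entry: CatalogEntry, bbox: BBox) -> Path:
    """Build a deterministic local path for one dataset and area of interest."""
    key = "_".join(f"{value:.4f}" for value in bbox)
    return cache_dir / entry.name / f"{key}{entry.native_suffix}"


def get_dataset(
    entry: CatalogEntry,
    bbox: BBox,
    cache_dir: Path,
    fetch: Callable[[str, BBox], bytes],
) -> tuple[Path, bool]:
    """Return (local_path, was_cached), fetching only when nothing is stored yet.

    `fetch` is a placeholder for whatever client talks to the source repository.
    """
    path = cache_path(cache_dir, entry, bbox)
    if path.exists():
        # Already fetched for this area of interest: reuse the local copy.
        return path, True
    path.parent.mkdir(parents=True, exist_ok=True)
    # Save the payload unchanged, i.e. in the source's native format.
    path.write_bytes(fetch(entry.url, bbox))
    return path, False
```

Passing `fetch` in as an argument keeps the caching logic independent of any particular source repository, so the same function could be reused once the local directory check is replaced by a real Icechunk lookup.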