Replies: 1 comment
> This has been open long enough; we will start implementing this next year.
Summary
Move the ISISDATA and ISISTESTDATA stores from distributed downloads to a centralized data store behind an HTTPS endpoint.
Motivation
To comply with efforts to transition USGS/NASA data to the cloud, we moved to Amazon S3 to host a subset of ISISDATA. S3 has many benefits for hosting static content in the cloud, but Amazon charges every time data is downloaded from S3. Therefore, we decided to leverage data hosted by other public sources such as NAIF, JAXA, and ESA. This way, we can minimize costs to the USGS by not hosting redundant information. This came with some setbacks:
There have been attempts to rectify some of these problems:
We can rectify these issues more permanently by moving to a centralized solution, using what we learned about AWS in the previous implementation both to support easier downloads from a single source and to keep costs down for the USGS.
Proposed Solution / Explanation
Terms used in this explanation:

- **S3**: AWS solution for storing key/value pairs. Although it looks like a directory system, it is not a full-featured filesystem. It is useful for storing files in a publicly accessible way using a structure similar to a filesystem. Amazon charges every time data is moved from S3 to some endpoint, like when downloading data. S3 objects are stored in groups called buckets (e.g., ISISDATA is stored in a bucket called `isis_data`).
- **CloudFront**: AWS solution for creating a Content Delivery Network (CDN). A common use case is to cache an S3 bucket. This allows for a fast HTTPS connection to an S3 bucket without paying on every download, only when the bucket is updated (e.g., when the ISISDATA public URL is updated to have new LRO kernels).
- **EFS**: AWS solution for hosting a shareable drive with no maximum size, as it grows elastically with the size of your data. Unlike S3 buckets, there is no easy way to expose it publicly. This is useful for mounting internally to live services that need fast access to the data (e.g., SpiceServer).
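To make the CDN-in-front-of-S3 idea concrete, here is a minimal sketch of how a client could resolve an ISISDATA-relative path to an HTTPS URL on a CDN endpoint rather than downloading from the bucket directly. The hostname and helper function are hypothetical; the real endpoint would be whatever CloudFront distribution fronts the `isis_data` bucket.

```python
# Sketch: resolving an ISISDATA-relative path to an HTTPS download URL.
# CDN_BASE is a placeholder, not the real USGS endpoint.
from urllib.parse import quote

CDN_BASE = "https://isisdata.example.usgs.gov"  # hypothetical CloudFront endpoint

def isisdata_url(relative_path: str) -> str:
    """Build the public HTTPS URL for a file in the ISISDATA store."""
    return f"{CDN_BASE}/{quote(relative_path)}"

print(isisdata_url("base/kernels/lsk/naif0012.tls"))
```

Because the CDN caches the bucket, any HTTPS client (browser, `curl`, a download script) can fetch such a URL without the per-download S3 transfer charge falling on each request.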
How these components solve the problem.
I propose we update the process to do the following in order:
How this will impact ISIS users
- Leverage existing USGS metadata used to search for kernels when running `spiceinit` or ALE's `isd_generate` to determine which kernels are used in the software. Kernels are not included in the public bucket if they are not accessed by the software to generate camera models or ISDs. If we use SpiceQL's inventory system for this, it would also eliminate the need to duplicate kernels in different mission folders, since SpiceQL's database is agnostic to filesystem structure.
- Other clients will continue to work, as you will no longer download directly from an S3 bucket. The script will be simplified but still distributed for users who have grown accustomed to using it.
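The filesystem-agnostic inventory idea above can be sketched as follows. This is an illustration only: the schema, mission names, and paths are hypothetical and do not reflect SpiceQL's actual database format. The point is that lookups key on mission and kernel type, so a kernel shared by several missions needs only one entry, wherever it lives on disk.

```python
# Hypothetical path-agnostic kernel inventory, in the spirit of the
# SpiceQL inventory idea described above. Entries are placeholders.
INVENTORY = {
    ("lro", "lsk"): ["base/kernels/lsk/naif0012.tls"],
    ("lro", "ck"):  ["lro/kernels/ck/lrolc_2010001_2010031_v01.bc"],
}

def kernels_for(mission: str, kernel_type: str) -> list[str]:
    """Return kernel paths for a mission/type pair, wherever they live."""
    return INVENTORY.get((mission, kernel_type), [])
```

A lookup like `kernels_for("lro", "lsk")` returns the shared leapsecond kernel from `base/` without LRO needing its own copy, which is how such a design avoids duplicating kernels across mission folders.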
Problems with missing files should occur less often.
Downloading ISISTESTDATA will follow the same process as ISISDATA, since both can be hosted in the same place.
To have kernels included in the system, an update to SpiceQL will be necessary; updating the kernel database requires a change to SpiceQL's configs.
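As an illustration only, registering a new mission's kernels might involve a per-mission config entry along these lines. The actual schema of SpiceQL's configs may differ, and the mission name and regex here are placeholders:

```json
{
  "newmission": {
    "ck": {
      "reconstructed": {
        "kernels": ["newmission_.*\\.bc"]
      }
    }
  }
}
```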
Drawbacks
Alternatives
Unresolved Questions
Future Possibilities