Description
Abstract
The ObjectStore
trait -- as designed currently -- is a middle ground of somewhat competing design goals. I think we can do better than that.
Requirements
The trait serves two groups of API users:
Object Store Users
Humans and machines that want to use an object store through a unified interface.
They usually like to write only the relevant parts, e.g.:
- "get object
foo/bar.txt
" ==>store.get("foo/bar.txt").await?
- "get first 10 bytes of
foo/bar.txt
" ==>store.get("foo/bar.txt").with_range(..10).await?
Object Store Implementations
Parties that implement a new object store or wrap an existing one. They usually want to implement the methods that determine the behavior of the object store, ideally without surprises (like opinionated default implementations).
Status Quo
Due to this dual nature the trait has accumulated an increasing number of methods, a bunch of them with default implementation. Let's have a look at the "get" methods:
arrow-rs-object-store/src/lib.rs
Lines 633 to 662 in 0c3152c
All except for get_ranges
basically map to get_opts
and there is no reason for a store to override any of the defaults. And even for get_ranges
we could come up with a sensible mapping to get_opts
if the range parameter would support multi-ranges similar to HTTP.
Now let's look at "rename":
arrow-rs-object-store/src/lib.rs
Lines 773 to 782 in 0c3152c
arrow-rs-object-store/src/lib.rs
Lines 793 to 799 in 0c3152c
I think it is out of question that a store implementation should definitely override these if there's any way to perform key renames without a full "get + put".
Proposal
I propose to remove all default implementations from the trait and only have a single method per operation, so "get" would look like this:
async fn get(&self, location: &Path, options: GetOptions) -> Result<GetResult>;
Note that the location
is NOT part of options
so that GetOptions
can still implement Default
and a user only needs to specify the options of interest:
// get bytes of file
store.get(
&location,
Default::default(),
).await?.bytes().await?;
// get range
store.get(
&location,
GetOptions{
range: (..10).into(),
..Default::default()
},
).await?.bytes().await?;
A similar mapping can be done for "rename".
I think that strikes a good balance between boilerplate / verbosity of the user API and the clarity of the interface.