Skip to content

Clean up ObjectStore trait #385

Open
@crepererum

Description

@crepererum

Abstract

The ObjectStore trait -- as designed currently -- is a middle ground of somewhat competing design goals. I think we can do better than that.

Requirements

The trait serves two groups of API users:

Object Store Users

Humans and machines that want to use an object store through a unified interface.

They usually like to write only the relevant parts, e.g.:

  • "get object foo/bar.txt" ==> store.get("foo/bar.txt").await?
  • "get first 10 bytes of foo/bar.txt" ==> store.get("foo/bar.txt").with_range(..10).await?

Object Store Implementations

Parties that implement a new object store or wrap an existing one. They usually want to implement the methods that determine the behavior of the object store, ideally without surprises (like opinionated default implementations).

Status Quo

Due to this dual nature the trait has accumulated an increasing number of methods, a bunch of them with default implementation. Let's have a look at the "get" methods:

/// Return the bytes that are stored at the specified location.
async fn get(&self, location: &Path) -> Result<GetResult> {
self.get_opts(location, GetOptions::default()).await
}
/// Perform a get request with options
async fn get_opts(&self, location: &Path, options: GetOptions) -> Result<GetResult>;
/// Return the bytes that are stored at the specified location
/// in the given byte range.
///
/// See [`GetRange::Bounded`] for more details on how `range` gets interpreted
async fn get_range(&self, location: &Path, range: Range<u64>) -> Result<Bytes> {
let options = GetOptions {
range: Some(range.into()),
..Default::default()
};
self.get_opts(location, options).await?.bytes().await
}
/// Return the bytes that are stored at the specified location
/// in the given byte ranges
async fn get_ranges(&self, location: &Path, ranges: &[Range<u64>]) -> Result<Vec<Bytes>> {
coalesce_ranges(
ranges,
|range| self.get_range(location, range),
OBJECT_STORE_COALESCE_DEFAULT,
)
.await
}

All except for get_ranges basically map to get_opts and there is no reason for a store to override any of the defaults. And even for get_ranges we could come up with a sensible mapping to get_opts if the range parameter would support multi-ranges similar to HTTP.

Now let's look at "rename":

/// Move an object from one path to another in the same object store.
///
/// By default, this is implemented as a copy and then delete source. It may not
/// check when deleting source that it was the same object that was originally copied.
///
/// If there exists an object at the destination, it will be overwritten.
async fn rename(&self, from: &Path, to: &Path) -> Result<()> {
self.copy(from, to).await?;
self.delete(from).await
}

/// Move an object from one path to another in the same object store.
///
/// Will return an error if the destination already has an object.
async fn rename_if_not_exists(&self, from: &Path, to: &Path) -> Result<()> {
self.copy_if_not_exists(from, to).await?;
self.delete(from).await
}

I think it is out of question that a store implementation should definitely override these if there's any way to perform key renames without a full "get + put".

Proposal

I propose to remove all default implementations from the trait and only have a single method per operation, so "get" would look like this:

async fn get(&self, location: &Path, options: GetOptions) -> Result<GetResult>;

Note that the location is NOT part of options so that GetOptions can still implement Default and a user only needs to specify the options of interest:

// get bytes of file
store.get(
	&location,
	Default::default(),
).await?.bytes().await?;

// get range
store.get(
	&location,
	GetOptions{
		range: (..10).into(),
		..Default::default()
	},
).await?.bytes().await?;

A similar mapping can be done for "rename".

I think that strikes a good balance between boilerplate / verbosity of the user API and the clarity of the interface.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions