A key-value store focused on performance.
This project is an active work-in-progress.
It will likely compile, and most tests will likely pass, but it is not feature complete yet. The current focus is stabilizing the embedded filesystem implementation so the memtable front end can rely on it. Once that work is done, what remains is implementing levels (relatively easy) and compaction (easy enough).
This project was heavily inspired and influenced by (in no particular order):
- Long compile times for Facebook's `rocksdb`
- Howard Chu's `lmdb`
- CockroachDB's `pebble`
- Ben Johnson's `boltdb`
- Google's `leveldb`
- Giorgos Xanthakis et al.'s `parallax`
- A burning desire to have a rust-native LSM-tree that has column family/namespace support
It's ✨ FAST ✨ and has a few interesting features:
- A blazingly fast hybrid logical clock (HLC) for ordering operations instead of MVCC semantics
- A high-performance, lock-free, thread-safe, portable filesystem that works with block devices
- An insanely fast bloom filter for fast lookups
I'm glad you asked! Here are some benchmarks:
- Internal bloom filter lookups: ~860 picoseconds
- Merge operator: ~115ms for a full table scan of 800,000 keys across 8 memtables
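For context on the bloom filter numbers: a bloom filter only ever answers "definitely absent" or "probably present", trading accuracy for speed. Below is a generic sketch of the standard false-positive estimate, with illustrative parameters rather than CesiumDB's actual (currently hardcoded) sizing:

```rust
// Generic bloom filter math, not CesiumDB's internals: with m bits, k hash
// functions, and n inserted keys, the classic false-positive estimate is
// (1 - e^(-k*n/m))^k.
fn false_positive_rate(m_bits: f64, k_hashes: f64, n_keys: f64) -> f64 {
    (1.0 - (-k_hashes * n_keys / m_bits).exp()).powf(k_hashes)
}

fn main() {
    // Illustrative sizing: 10 bits per key and 7 hashes for 800,000 keys
    // works out to roughly a 0.8% chance of a false "probably present".
    let p = false_positive_rate(8_000_000.0, 7.0, 800_000.0);
    println!("expected false-positive rate: {p:.4}");
}
```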
Add this to your `Cargo.toml`:
```toml
[dependencies]
cesiumdb = "1.0"
```
And use:
```rust
use cesiumdb::CesiumDB;

// use a temp file, most useful for testing
let db = CesiumDB::default();

// no namespace
db.put(b"key", b"value");
db.get(b"key");

// with a namespace
db.put(1, b"key", b"value");
db.get(1, b"key");
```
See the API documentation for more information.
CesiumDB uses a construct I call "namespacing". It's a way for data of a similar type to be grouped together, but it is not stored separately from other namespaced data. Namespaces are ultimately glorified range markers that keep lookups fast across a large set of internal data, and they give users a convenient handle for organizing their data. I would argue namespaces are closer to tables than column families.
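To make the range-marker idea concrete, here is a conceptual sketch, not CesiumDB's actual key encoding: if the namespace id is laid down as a big-endian prefix of the internal key, every namespace's keys sort contiguously, so a namespace lookup is just a bounded range scan.

```rust
// Conceptual sketch only; CesiumDB's real key layout may differ.
// Prefixing each key with its namespace id keeps every namespace's keys
// contiguous in sorted order, which is what makes a namespace a "range marker".
fn namespaced_key(namespace: u64, key: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + key.len());
    out.extend_from_slice(&namespace.to_be_bytes()); // big-endian so ids sort numerically
    out.extend_from_slice(key);
    out
}

fn main() {
    let a = namespaced_key(1, b"apple");
    let b = namespaced_key(1, b"banana");
    let c = namespaced_key(2, b"apple");
    // Keys in namespace 1 sort together, before anything in namespace 2.
    assert!(a < b && b < c);
}
```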
CesiumDB lets you bring your own hybrid logical clock implementation for key versioning. This is useful if you have a specific HLC implementation you want to use, or if you want to use a different clock entirely. This is done by implementing the `HLC` trait and passing it to the `CesiumDB` constructor. And if you can provide a more precise clock than the one provided, please submit an issue or PR so we can all benefit from it.
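For reference, a hybrid logical clock pairs wall-clock time with a logical counter so timestamps keep increasing even if the physical clock stalls or steps backwards. The sketch below shows that general idea only; the type and method names are illustrative, not the crate's `HLC` trait or its actual API.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Conceptual HLC sketch (single-threaded for brevity); CesiumDB's `HLC` trait
// may expose a different interface entirely.
struct HybridClock {
    last_physical: u64, // last observed wall-clock time, in nanoseconds
    logical: u64,       // counter used to break ties within one physical tick
}

impl HybridClock {
    fn new() -> Self {
        Self { last_physical: 0, logical: 0 }
    }

    /// Returns a (physical, logical) pair that is strictly increasing per call.
    fn now(&mut self) -> (u64, u64) {
        let physical = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("clock before Unix epoch")
            .as_nanos() as u64;

        if physical > self.last_physical {
            // Wall clock moved forward: adopt it and reset the logical counter.
            self.last_physical = physical;
            self.logical = 0;
        } else {
            // Wall clock stalled or stepped backwards: keep the old physical part
            // and bump the logical counter so ordering still advances.
            self.logical += 1;
        }
        (self.last_physical, self.logical)
    }
}

fn main() {
    let mut clock = HybridClock::new();
    let t1 = clock.now();
    let t2 = clock.now();
    assert!(t2 > t1); // tuples compare lexicographically: always monotonic
}
```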
There is a non-trivial amount of `unsafe` code. Most of it is related to the internal `mmap` implementation (which cannot be made safe) and its entry points (the handlers and such). I also make use of pointer arithmetic on memory-mapped file locations. This is one of the areas where safety comes at the cost of performance. However, if you can find a way to make it safe, please submit an issue or PR. I would love to see it!
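As an illustration of what that kind of code has to get right, here is a generic sketch using the `memmap2` crate, not CesiumDB's filesystem code: writing a value at a computed offset inside a memory-mapped region is sound only if the caller upholds the bounds invariant the compiler cannot check.

```rust
// Generic illustration, not CesiumDB's internals. Writing a fixed-width value at a
// computed offset inside a memory-mapped region is inherently `unsafe`: the compiler
// cannot prove the offset stays inside the mapping, so tests have to.
use memmap2::MmapOptions;

fn main() -> std::io::Result<()> {
    // Anonymous 4 KiB mapping, standing in for a mapped block-device region.
    let mut map = MmapOptions::new().len(4096).map_anon()?;

    let offset = 128usize;
    // The invariant the unsafe block relies on: offset + width stays in bounds.
    assert!(offset + std::mem::size_of::<u64>() <= map.len());

    unsafe {
        // Pointer arithmetic on the mapped base address, then an unaligned write.
        let slot = map.as_mut_ptr().add(offset) as *mut u64;
        slot.write_unaligned(0xDEAD_BEEF_u64);
    }
    Ok(())
}
```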
There is ✨ EXTENSIVE ✨ testing around the `unsafe` code, and I am confident in its correctness. My goal is to keep this project at a high degree of test coverage to help maintain that confidence. However, if you find a bug, please submit an issue or PR.
Contributions are welcome! Please submit a PR with your changes. If you're unsure about the changes, please submit an issue first.
An alphabetical list of things I'd like to actually do for the long-term safety and stability of the project.
- Add `loom` integration tests.
- Add `miri` integration tests.
- Add more granular `madvise` commands to the filesystem to give the kernel some hints.
- Add some kind of `fsck` and block checksums since journaling is already present. There are basic unit tests for this but no supported tool for it.
- Bloom filter size is currently hardcoded. I'd like to make it configurable.
- Determine how to expose the untrustworthiness of the bloom filter.
- Figure out how hard it would be to support `no_std` for the embedded workloads. I suspect it would be... difficult lol
- Investigate the point at which we can no longer `mmap` a physical device. Theoretically, even without swap space, I can `mmap` a 1TiB physical device to the filesystem implementation. But I feel like shit gets real weird. Idk, it's a Linux-ism I want to investigate.
- Remove the question mark operator.
- Revisit the merge iterator. The benchmarks have it at ~115ms for a full scan of 8 memtables with 100,000 keys each. I have no idea if this is a mismatch of my expectations or a gross inability of mine to optimize it further. Every optimization I've tried (including my own cache-optimized min heap) is 5-20% slower than this. A baseline sketch of the general approach follows this list.
- Write some kind of auto-configuration for the generalized configs.
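As referenced in the merge iterator item above, the baseline shape of that scan is a k-way merge over sorted runs driven by a min-heap. This is a simplified sketch of the general technique, not CesiumDB's implementation; a real LSM merge iterator also has to dedupe key versions and skip tombstones.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Simplified k-way merge over already-sorted runs: pop the smallest head,
// then refill the heap from the run it came from.
fn merge_sorted(runs: Vec<Vec<u64>>) -> Vec<u64> {
    // Heap entries are (value, run index, position), wrapped in Reverse for a min-heap.
    let mut heap = BinaryHeap::new();
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(&first) = run.first() {
            heap.push(Reverse((first, run_idx, 0usize)));
        }
    }

    let mut out = Vec::new();
    while let Some(Reverse((value, run_idx, pos))) = heap.pop() {
        out.push(value);
        if let Some(&next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next, run_idx, pos + 1)));
        }
    }
    out
}

fn main() {
    let merged = merge_sorted(vec![vec![1, 4, 7], vec![2, 5, 8], vec![3, 6, 9]]);
    assert_eq!(merged, (1..=9).collect::<Vec<u64>>());
}
```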
CesiumDB is licensed under GPL v3.0 with the Classpath Exception. This means you can safely link to CesiumDB in your project, so it's safe for corporate consumption, just not closed-source modification :simple_smile:
If you would like a non-GPL license, please reach out :simple_smile: