Improve SQLite storage #293

Merged (6 commits) on Jul 27, 2023
2 changes: 1 addition & 1 deletion languages/tree-sitter-stack-graphs-typescript/Cargo.toml
@@ -32,7 +32,7 @@ clap = { version = "4", optional = true }
glob = "0.3"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
stack-graphs = { version = "0.11", path = "../../stack-graphs" }
stack-graphs = { version = ">=0.11, <=0.12", path = "../../stack-graphs" }
tree-sitter-stack-graphs = { version = "0.7", path = "../../tree-sitter-stack-graphs" }
tree-sitter-typescript = "0.20.2"
tsconfig = "0.1.0"
8 changes: 7 additions & 1 deletion stack-graphs/CHANGELOG.md
@@ -5,13 +5,19 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Unreleased
## v0.12.0 -- 2023-07-27

### Added

- New `SQLiteReader::clear` and `SQLiteReader::clear_paths` methods that make it easier to reuse instances.
- The method `SQLiteReader::load_graph_for_file` now returns the file handle for the loaded file.

### Changed

- The `Appendable` trait has been simplified. Its `Ctx` type parameter is gone, in favor of a separate trait `ToAppendable` that is used to find appendables for a handle. The type itself moved from the `cycles` to the `stitching` module.
- The `ForwardPartialPathStitcher` has been generalized so that it can be used to build paths from a database or from graph edges. It now takes a type parameter indicating the type of candidates it uses. Instead of a `Database` instance, it expects a value that implements the `Candidates` and `ToAppendable` traits. The `ForwardPartialPathStitcher::process_next_phase` expects an additional `extend_until` closure that controls whether the extended paths are considered for further extension or not (using `|_,_,_| true` retains old behavior).
- The SQLite database implementation is using a new schema which stores binary instead of JSON values, resulting in faster write times and smaller databases.
- Renamed method `SQLiteReader::load_graph_for_file_or_directory` to `SQLiteReader::load_graphs_for_file_or_directory`.

### Fixed

6 changes: 3 additions & 3 deletions stack-graphs/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "stack-graphs"
version = "0.11.0"
version = "0.12.0"
description = "Name binding for arbitrary programming languages"
homepage = "https://github.com/github/stack-graphs/tree/main/stack-graphs"
repository = "https://github.com/github/stack-graphs/"
@@ -15,7 +15,7 @@ edition = "2018"
[features]
copious-debugging = []
serde = ["dep:serde", "lsp-positions/serde"]
storage = ["rusqlite", "serde", "rmp-serde"]
storage = ["postcard", "rusqlite", "serde"]
visualization = ["serde", "serde_json"]

[lib]
@@ -31,7 +31,7 @@ fxhash = "0.2"
itertools = "0.10"
libc = "0.2"
lsp-positions = { version = "0.3", path = "../lsp-positions" }
rmp-serde = { version = "1.1", optional = true }
postcard = { version = "1", optional = true, features = ["use-std"] }
rusqlite = { version = "0.28", optional = true, features = ["bundled", "functions"] }
serde = { version = "1.0", optional = true, features = ["derive"] }
serde_json = { version = "1.0", optional = true }
2 changes: 1 addition & 1 deletion stack-graphs/README.md
@@ -9,7 +9,7 @@ To use this library, add the following to your `Cargo.toml`:

``` toml
[dependencies]
stack-graphs = "0.11"
stack-graphs = "0.12"
```

Check out our [documentation](https://docs.rs/stack-graphs/) for more details on
16 changes: 16 additions & 0 deletions stack-graphs/src/arena.rs
@@ -181,6 +181,14 @@ impl<T> Arena<T> {
}
}

/// Clear the arena, keeping underlying allocated capacity. After this, all previous handles into
/// the arena are invalid.
#[cfg_attr(not(feature = "storage"), allow(dead_code))]
#[inline(always)]
pub(crate) fn clear(&mut self) {
self.items.clear();
}
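
The doc comment on `clear` makes two claims: capacity is kept, and old handles become invalid. A minimal sketch of what that means in practice, assuming a simplified arena backed by a plain `Vec` (the names `SketchArena` and `Handle` here are hypothetical and are not the crate's actual types):

```rust
/// Hypothetical handle: a bare index into the arena.
/// The real crate's `Handle<T>` is a different type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Handle(usize);

/// Hypothetical, simplified arena backed by a `Vec`.
pub struct SketchArena<T> {
    items: Vec<T>,
}

impl<T> SketchArena<T> {
    pub fn new() -> Self {
        Self { items: Vec::new() }
    }

    /// Stores `item` and returns a handle that refers to it.
    pub fn add(&mut self, item: T) -> Handle {
        self.items.push(item);
        Handle(self.items.len() - 1)
    }

    /// Looks up a handle; returns `None` if nothing is stored at that index.
    pub fn get(&self, handle: Handle) -> Option<&T> {
        self.items.get(handle.0)
    }

    /// Drops every item but keeps the allocation, as `Vec::clear` does.
    pub fn clear(&mut self) {
        self.items.clear();
    }

    /// Number of items the arena can hold without reallocating.
    pub fn capacity(&self) -> usize {
        self.items.capacity()
    }
}
```

In this sketch, a handle obtained before `clear` first resolves to nothing, and once new items are added it silently resolves to an unrelated item at the same index. That is why callers are expected to discard every handle after clearing rather than rely on a lookup failing.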

/// Adds a new instance to this arena, returning a stable handle to it.
///
/// Note that we do not deduplicate instances of `T` in any way. If you add two instances that
@@ -280,6 +288,14 @@ impl<H, T> SupplementalArena<H, T> {
}
}

/// Clear the supplemental arena, keeping underlying allocated capacity. After this,
/// all previous handles into the arena are invalid.
#[cfg_attr(not(feature = "storage"), allow(dead_code))]
#[inline(always)]
pub(crate) fn clear(&mut self) {
self.items.clear();
}

/// Creates a new, empty supplemental arena, preallocating enough space to store supplemental
/// data for all of the instances that have already been allocated in a (regular) arena.
pub fn with_capacity(arena: &Arena<H>) -> SupplementalArena<H, T> {
7 changes: 7 additions & 0 deletions stack-graphs/src/partial.rs
@@ -2624,4 +2624,11 @@ impl PartialPaths {
partial_path_edges: Deque::new_arena(),
}
}

#[cfg_attr(not(feature = "storage"), allow(dead_code))]
pub(crate) fn clear(&mut self) {
self.partial_symbol_stacks.clear();
self.partial_scope_stacks.clear();
self.partial_path_edges.clear();
}
}
12 changes: 12 additions & 0 deletions stack-graphs/src/stitching.rs
@@ -249,6 +249,18 @@ impl Database {
}
}

/// Clear the database. After this, all previous handles into the database are
/// invalid.
#[cfg_attr(not(feature = "storage"), allow(dead_code))]
pub(crate) fn clear(&mut self) {
self.partial_paths.clear();
self.local_nodes.clear();
self.symbol_stack_keys.clear();
self.symbol_stack_key_cache.clear();
self.paths_by_start_node.clear();
self.root_paths_by_precondition.clear();
}

/// Adds a partial path to this database. We do not deduplicate partial paths in any way; it's
/// your responsibility to only add each partial path once.
pub fn add_partial_path(
54 changes: 37 additions & 17 deletions stack-graphs/src/storage.rs
@@ -30,7 +30,7 @@ use crate::stitching::ForwardPartialPathStitcher;
use crate::CancellationError;
use crate::CancellationFlag;

const VERSION: usize = 3;
const VERSION: usize = 4;

const SCHEMA: &str = r#"
CREATE TABLE metadata (
@@ -75,9 +75,7 @@ pub enum StorageError {
#[error(transparent)]
Serde(#[from] serde::Error),
#[error(transparent)]
RmpSerdeDecode(#[from] rmp_serde::decode::Error),
#[error(transparent)]
RmpSerdeEncode(#[from] rmp_serde::encode::Error),
PostcardError(#[from] postcard::Error),
}

pub type Result<T> = std::result::Result<T, StorageError>;
@@ -283,7 +281,7 @@ impl SQLiteWriter {
&file.to_string_lossy(),
tag,
error,
&rmp_serde::to_vec(&graph)?,
&postcard::to_stdvec(&graph)?,
))?;
Ok(())
}
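
The changelog attributes smaller databases to the new value encoding. As far as I understand postcard's wire format, integers wider than eight bits are written as variable-length integers, which is one reason small numbers take few bytes. The following is a standard-library-only sketch of that general technique (unsigned LEB128), not postcard's actual implementation; `encode_varint`, `decode_varint`, `encode_all`, and `as_text` are hypothetical names used only for illustration:

```rust
/// Encodes `value` as an unsigned LEB128 varint: seven payload bits per byte,
/// with the high bit set on every byte except the last.
pub fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decodes one varint from the front of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends before a terminating byte (or runs past ten bytes).
pub fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in bytes.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Concatenates the varint encodings of all values.
pub fn encode_all(values: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    for &value in values {
        encode_varint(value, &mut out);
    }
    out
}

/// Renders the same values as a compact textual array for size comparison.
pub fn as_text(values: &[u64]) -> String {
    let parts: Vec<String> = values.iter().map(|v| v.to_string()).collect();
    format!("[{}]", parts.join(","))
}
```

For the values `1`, `300`, and `70000`, `encode_all` produces 6 bytes while `as_text` produces the 13-character string `[1,300,70000]`. Real stack graph values contain strings and nested structures as well, so the actual savings depend on the data.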
@@ -323,7 +321,7 @@ impl SQLiteWriter {
let mut stmt =
conn.prepare_cached("INSERT INTO graphs (file, tag, value) VALUES (?, ?, ?)")?;
let graph = serde::StackGraph::from_graph_filter(graph, &FileFilter(file));
stmt.execute((file_str, tag, &rmp_serde::to_vec(&graph)?))?;
stmt.execute((file_str, tag, &postcard::to_stdvec(&graph)?))?;
Ok(())
}

@@ -364,7 +362,7 @@ impl SQLiteWriter {
);
let symbol_stack = path.symbol_stack_precondition.storage_key(graph, partials);
let path = serde::PartialPath::from_partial_path(graph, partials, path);
root_stmt.execute((file_str, symbol_stack, &rmp_serde::to_vec(&path)?))?;
root_stmt.execute((file_str, symbol_stack, &postcard::to_stdvec(&path)?))?;
root_path_count += 1;
} else if start_node.is_in_file(file) {
copious_debugging!(
@@ -375,7 +373,7 @@ impl SQLiteWriter {
node_stmt.execute((
file_str,
path.start_node.local_id,
&rmp_serde::to_vec(&path)?,
&postcard::to_stdvec(&path)?,
))?;
node_path_count += 1;
} else {
@@ -447,6 +445,28 @@ impl SQLiteReader {
})
}

/// Clear all data that has been loaded into this reader instance.
/// After this call, all existing handles from this reader are invalid.
pub fn clear(&mut self) {
self.loaded_graphs.clear();
self.graph = StackGraph::new();

self.loaded_node_paths.clear();
self.loaded_root_paths.clear();
self.partials.clear();
self.db.clear();
}

/// Clear path data that has been loaded into this reader instance.
/// After this call, all node handles remain valid, but all path data
/// is invalid.
pub fn clear_paths(&mut self) {
self.loaded_node_paths.clear();
self.loaded_root_paths.clear();
self.partials.clear();
self.db.clear();
}
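
The difference between `clear` and `clear_paths` is which state survives. A rough sketch of that split, assuming plain collections in place of the real `StackGraph`, `PartialPaths`, and `Database` types (`SketchReader`, its `load` method, and the `graph_nodes` and `paths` fields are hypothetical stand-ins):

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the reader's state. Some field names mirror the
/// diff, but every type is simplified to a standard collection.
#[derive(Default)]
pub struct SketchReader {
    pub loaded_graphs: HashSet<String>,
    /// Stands in for the `StackGraph`.
    pub graph_nodes: Vec<String>,
    pub loaded_node_paths: HashSet<(String, u32)>,
    pub loaded_root_paths: HashSet<String>,
    /// Stands in for the partial paths arena and the path database.
    pub paths: Vec<String>,
}

impl SketchReader {
    /// Pretends to load a file's graph and its root paths, once each.
    pub fn load(&mut self, file: &str) {
        if self.loaded_graphs.insert(file.to_string()) {
            self.graph_nodes.push(format!("{}:root", file));
        }
        if self.loaded_root_paths.insert(file.to_string()) {
            self.paths.push(format!("{}:path", file));
        }
    }

    /// Drops path data and the bookkeeping that says which paths are loaded,
    /// so that a later load fetches them again. Graph data is untouched.
    pub fn clear_paths(&mut self) {
        self.loaded_node_paths.clear();
        self.loaded_root_paths.clear();
        self.paths.clear();
    }

    /// Drops everything: graph data, its bookkeeping, and all path data.
    pub fn clear(&mut self) {
        self.loaded_graphs.clear();
        self.graph_nodes = Vec::new();
        self.clear_paths();
    }
}
```

Clearing the `loaded_*` sets together with the data matters: if the bookkeeping survived, a later load would skip paths that are no longer in memory. The sketch has `clear` delegate to `clear_paths`, whereas the diff repeats the four statements; either shape gives the same result.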

/// Get the file's status in the database. If a tag is provided, it must match or the file
/// is reported missing.
pub fn status_for_file<T: AsRef<str>>(
@@ -485,7 +505,7 @@ impl SQLiteReader {
}

/// Ensure the graph for the given file is loaded.
pub fn load_graph_for_file(&mut self, file: &str) -> Result<()> {
pub fn load_graph_for_file(&mut self, file: &str) -> Result<Handle<File>> {
Self::load_graph_for_file_inner(file, &mut self.graph, &mut self.loaded_graphs, &self.conn)
}

@@ -494,21 +514,21 @@ impl SQLiteReader {
graph: &mut StackGraph,
loaded_graphs: &mut HashSet<String>,
conn: &Connection,
) -> Result<()> {
) -> Result<Handle<File>> {
copious_debugging!("--> Load graph for {}", file);
if !loaded_graphs.insert(file.to_string()) {
copious_debugging!(" * Already loaded");
return Ok(());
return Ok(graph.get_file(file).expect("loaded file to exist"));
}
copious_debugging!(" * Load from database");
let mut stmt = conn.prepare_cached("SELECT json FROM graphs WHERE file = ?")?;
let mut stmt = conn.prepare_cached("SELECT value FROM graphs WHERE file = ?")?;
let value = stmt.query_row([file], |row| row.get::<_, Vec<u8>>(0))?;
let file_graph = rmp_serde::from_slice::<serde::StackGraph>(&value)?;
let file_graph = postcard::from_bytes::<serde::StackGraph>(&value)?;
file_graph.load_into(graph)?;
Ok(())
Ok(graph.get_file(file).expect("loaded file to exist"))
}
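
With this change, `load_graph_for_file` hands back a file handle on both the first load and on repeated calls. A small sketch of that load-once pattern, assuming a toy graph that stores only file names (`SketchGraph`, `SketchLoader`, `FileHandle`, and the `db_reads` counter are hypothetical and exist only for illustration):

```rust
use std::collections::HashSet;

/// Hypothetical file handle: an index into the graph's file list.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FileHandle(usize);

/// Hypothetical graph that records only file names.
#[derive(Default)]
pub struct SketchGraph {
    files: Vec<String>,
}

impl SketchGraph {
    /// Returns the handle of a file that is already in the graph.
    pub fn get_file(&self, name: &str) -> Option<FileHandle> {
        self.files.iter().position(|f| f == name).map(FileHandle)
    }

    fn add_file(&mut self, name: &str) -> FileHandle {
        self.files.push(name.to_string());
        FileHandle(self.files.len() - 1)
    }
}

/// Hypothetical loader that counts how often it would hit the database.
#[derive(Default)]
pub struct SketchLoader {
    graph: SketchGraph,
    loaded: HashSet<String>,
    pub db_reads: usize,
}

impl SketchLoader {
    /// Loads the file on first use; later calls return the existing handle.
    pub fn load_graph_for_file(&mut self, file: &str) -> FileHandle {
        // `insert` returns false when the name was already present.
        if !self.loaded.insert(file.to_string()) {
            return self.graph.get_file(file).expect("loaded file to exist");
        }
        self.db_reads += 1; // stands in for the SQL query and decode step
        self.graph.add_file(file)
    }
}
```

Calling the loader twice for the same file yields equal handles and a single simulated read. One thing that may be worth checking in the real code: the file name is recorded as loaded before the query runs, so if the query fails, a retry would appear to take the already-loaded branch and hit the `expect`.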

pub fn load_graph_for_file_or_directory(
pub fn load_graphs_for_file_or_directory(
&mut self,
file_or_directory: &Path,
cancellation_flag: &dyn CancellationFlag,
@@ -559,7 +579,7 @@ impl SQLiteReader {
&mut self.loaded_graphs,
&self.conn,
)?;
let path = rmp_serde::from_slice::<serde::PartialPath>(&value)?;
let path = postcard::from_bytes::<serde::PartialPath>(&value)?;
let path = path.to_partial_path(&mut self.graph, &mut self.partials)?;
copious_debugging!(
" > Loaded {}",
@@ -613,7 +633,7 @@ impl SQLiteReader {
&mut self.loaded_graphs,
&self.conn,
)?;
let path = rmp_serde::from_slice::<serde::PartialPath>(&value)?;
let path = postcard::from_bytes::<serde::PartialPath>(&value)?;
let path = path.to_partial_path(&mut self.graph, &mut self.partials)?;
copious_debugging!(
" > Loaded {}",
4 changes: 4 additions & 0 deletions tree-sitter-stack-graphs/CHANGELOG.md
@@ -5,6 +5,10 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v0.7.1 -- 2023-07-27

Support `stack-graphs` version `0.12`.

## v0.7.0 -- 2023-06-08

### Library
4 changes: 2 additions & 2 deletions tree-sitter-stack-graphs/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "tree-sitter-stack-graphs"
version = "0.7.0"
version = "0.7.1"
description = "Create stack graphs using tree-sitter parsers"
homepage = "https://github.com/github/stack-graphs/tree/main/tree-sitter-stack-graphs"
repository = "https://github.com/github/stack-graphs/"
@@ -69,7 +69,7 @@ regex = "1"
rust-ini = "0.18"
serde_json = { version="1.0", optional=true }
sha1 = { version="0.10", optional=true }
stack-graphs = { version="0.11", path="../stack-graphs" }
stack-graphs = { version=">=0.11, <=0.12", path="../stack-graphs" }
thiserror = "1.0"
time = { version = "0.3", optional = true }
tokio = { version = "1.26", optional = true, features = ["io-std", "rt", "rt-multi-thread"] }
2 changes: 1 addition & 1 deletion tree-sitter-stack-graphs/src/cli/visualize.rs
@@ -45,7 +45,7 @@ impl VisualizeArgs {
let mut db = SQLiteReader::open(&db_path)?;
for source_path in &self.source_paths {
let source_path = source_path.canonicalize()?;
db.load_graph_for_file_or_directory(&source_path, cancellation_flag)?;
db.load_graphs_for_file_or_directory(&source_path, cancellation_flag)?;
}
let (graph, _, _) = db.get();
let starting_nodes = graph