| 📍 NOTE |
|---|
| RubyGems (the GitHub org, not the website) suffered a hostile takeover in September 2025. |
| Ultimately 4 maintainers were hard removed and a reason has been given for only 1 of those, while 2 others resigned in protest. |
| It is a complicated story which is difficult to parse quickly. |
| Simply put - there was active policy for adding or removing maintainers/owners of rubygems and bundler, and those policies were not followed. |
| I'm adding notes like this to gems because I don't condone theft of repositories or gems from their rightful owners. |
| If a similar theft happened with my repos/gems, I'd hope some would stand up for me. |
| Disenfranchised former-maintainers have started gem.coop. |
| Once available I will publish there exclusively; unless RubyCentral makes amends with the community. |
| The "Technology for Humans: Joel Draper" podcast episode by reinteractive is the most cogent summary I'm aware of. |
| See here, here and here for more info on what comes next. |
| What I'm doing: A (WIP) proposal for bundler/gem scopes, and a (WIP) proposal for a federated gem server. |
if ci_badges.map(&:color).detect { it != "green"} ☝️ let me know, as I may have missed the discord notification.
if ci_badges.map(&:color).all? { it == "green"} 👇️ send money so I can do more of this. FLOSS maintenance is now my full-time job.
TreeHaver is a cross-Ruby adapter for the tree-sitter and Citrus parsing libraries and other dedicated parsing tools that works seamlessly across MRI Ruby, JRuby, and TruffleRuby. It provides a unified API for parsing source code using grammars, regardless of your Ruby implementation.
If you've used Faraday, multi_json, or multi_xml, you'll feel right at home with TreeHaver. These gems share a common philosophy:
| Gem | Unified API for | Backend Examples |
|---|---|---|
| Faraday | HTTP requests | Net::HTTP, Typhoeus, Patron, Excon |
| multi_json | JSON parsing | Oj, Yajl, JSON gem |
| multi_xml | XML parsing | Nokogiri, LibXML, Ox |
| TreeHaver | Code parsing | MRI, Rust, FFI, Java, Prism, Psych, Commonmarker, Markly, Citrus (& Co.) |
Write once, run anywhere.
Learn once, write anywhere.
Just as Faraday lets you swap HTTP adapters without changing your code, TreeHaver lets you swap tree-sitter backends. Your parsing code remains the same whether you're running on MRI with native C extensions, JRuby with FFI, or TruffleRuby.
# Your code stays the same regardless of backend
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Language.from_library("/path/to/grammar.so")
tree = parser.parse(source_code)
# TreeHaver automatically picks the best backend:
# - MRI → ruby_tree_sitter (C extensions)
# - JRuby → FFI (system's libtree-sitter)
# - TruffleRuby → FFI or MRI backend

- Universal Ruby Support: Works on MRI Ruby, JRuby, and TruffleRuby
- 10 Parsing Backends - Choose the right backend for your needs:
  - Tree-sitter Backends (high-performance, incremental parsing):
    - MRI Backend: Leverages `ruby_tree_sitter` gem (C extension, fastest on MRI)
    - Rust Backend: Uses `tree_stump` gem (Rust with precompiled binaries)
      - Note: `tree_stump` currently requires unreleased fixes in the `main` branch.
    - FFI Backend: Pure Ruby FFI bindings to `libtree-sitter` (ideal for JRuby, TruffleRuby)
    - Java Backend: Native Java integration for JRuby with `java-tree-sitter` / `jtreesitter` grammar JARs
  - Language-Specific Backends (native parser integration):
    - Prism Backend: Ruby's official parser (Prism, stdlib in Ruby 3.4+)
    - Psych Backend: Ruby's YAML parser (Psych, stdlib)
    - Commonmarker Backend: Fast Markdown parser (Commonmarker, comrak Rust)
    - Markly Backend: GitHub Flavored Markdown (Markly, cmark-gfm C)
  - Pure Ruby Fallback:
    - Citrus Backend: Pure Ruby parsing via `citrus` (no native dependencies)
- Automatic Backend Selection: Intelligently selects the best backend for your Ruby implementation
- Language Agnostic: Parse any language - Ruby, Markdown, YAML, JSON, Bash, TOML, JavaScript, etc.
- Grammar Discovery: Built-in `GrammarFinder` utility for platform-aware grammar library discovery
- Unified Position API: Consistent `start_line`, `end_line`, `source_position` across all backends
- Thread-Safe: Built-in language registry with thread-safe caching
- Minimal API Surface: Simple, focused API that covers the most common use cases
TreeHaver has minimal dependencies and automatically selects the best backend for your Ruby implementation. Each backend has specific version requirements:
Requires ruby_tree_sitter v2.0+
In ruby_tree_sitter v2.0, all TreeSitter exceptions were changed to inherit from Exception (not StandardError). This was an intentional breaking change made for thread-safety and signal handling reasons.
Exception Mapping: TreeHaver catches TreeSitter::TreeSitterError and its subclasses, converting them to TreeHaver::NotAvailable while preserving the original error message. This provides a consistent exception API across all backends:
| ruby_tree_sitter Exception | TreeHaver Exception | When It Occurs |
|---|---|---|
| `TreeSitter::ParserNotFoundError` | `TreeHaver::NotAvailable` | Parser library file cannot be loaded |
| `TreeSitter::LanguageLoadError` | `TreeHaver::NotAvailable` | Language symbol loads but returns nothing |
| `TreeSitter::SymbolNotFoundError` | `TreeHaver::NotAvailable` | Symbol not found in library |
| `TreeSitter::ParserVersionError` | `TreeHaver::NotAvailable` | Parser version incompatible with tree-sitter |
| `TreeSitter::QueryCreationError` | `TreeHaver::NotAvailable` | Query creation fails |
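The mapping in the table above can be pictured with a small, self-contained sketch. It does not use the real `TreeSitter` or `TreeHaver` constants: `MappingSketch`, `BackendError`, `SymbolNotFound`, and `translate` are hypothetical stand-ins, and the real conversion code may look quite different.

```ruby
# Hypothetical sketch of exception mapping; the module and class names below
# are stand-ins, not the real TreeSitter or TreeHaver constants.
module MappingSketch
  # Stand-in for a backend error family that inherits from Exception.
  class BackendError < Exception; end
  class SymbolNotFound < BackendError; end

  # Stand-in for the single error type exposed to callers.
  class NotAvailable < Exception; end

  # Runs the block and re-raises any backend error as NotAvailable,
  # keeping the original message.
  def self.translate
    yield
  rescue BackendError => e
    raise NotAvailable, e.message
  end
end

begin
  MappingSketch.translate { raise MappingSketch::SymbolNotFound, "symbol missing" }
rescue MappingSketch::NotAvailable => e
  puts "#{e.class}: #{e.message}"
end
```

Callers then only need to rescue one error class, whichever backend raised the underlying error.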
# Add to your Gemfile for MRI backend
gem "ruby_tree_sitter", "~> 2.0"

NOTE: tree_stump currently requires unreleased fixes in the main branch.

# Add to your Gemfile for Rust backend
gem "tree_stump", github: "joker1007/tree_stump", branch: "main"

Requires the ffi gem and a system installation of libtree-sitter:

# Add to your Gemfile for FFI backend
gem "ffi", ">= 1.15", "< 2.0"

# Install libtree-sitter on your system:
# macOS
brew install tree-sitter
# Ubuntu/Debian
apt-get install libtree-sitter0 libtree-sitter-dev
# Fedora
dnf install tree-sitter tree-sitter-devel

Pure Ruby parser with no native dependencies:

# Add to your Gemfile for Citrus backend
gem "citrus", "~> 3.0"

The Java backend requires no additional dependencies beyond grammar JARs built for java-tree-sitter / jtreesitter.
tree-sitter is a powerful parser generator that creates incremental parsers for many programming languages. However, integrating it into Ruby applications can be challenging:
- MRI-based C extensions don't work on JRuby
- FFI-based solutions may not be optimal for MRI
- Managing different backends for different Ruby implementations is cumbersome
TreeHaver solves these problems by providing a unified API that automatically selects the appropriate backend for your Ruby implementation, allowing you to write code once and run it anywhere.
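As a rough illustration of the selection idea, the sketch below maps a Ruby engine name to an ordered list of candidate backends and picks the first one reported as available. The module name `BackendSketch`, the candidate table, and the `available:` argument are hypothetical; they are not TreeHaver's internals, and the orderings are assumptions based on the defaults described in this README.

```ruby
# Hypothetical sketch, not TreeHaver's actual implementation.
module BackendSketch
  # Candidate backends per Ruby engine, in preference order (assumed values).
  CANDIDATES = {
    "ruby" => [:mri, :rust, :ffi, :citrus],
    "jruby" => [:ffi, :java, :citrus],
    "truffleruby" => [:ffi, :citrus],
  }.freeze

  # Returns the first candidate for `engine` that appears in `available`,
  # or nil when none of them can be used.
  def self.pick(engine: RUBY_ENGINE, available: [])
    CANDIDATES.fetch(engine, [:citrus]).find { |name| available.include?(name) }
  end
end

# On MRI with only FFI and Citrus installed, FFI wins over Citrus.
puts BackendSketch.pick(engine: "ruby", available: [:ffi, :citrus])
# On JRuby with only Citrus installed, Citrus is the fallback.
puts BackendSketch.pick(engine: "jruby", available: [:citrus])
```

Keeping the preference order in data rather than in branching code makes it easy to see, and to test, what each Ruby implementation would choose.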
The *-merge gem family provides intelligent, AST-based merging for various file formats. At the foundation is tree_haver, which provides a unified cross-Ruby parsing API that works seamlessly across MRI, JRuby, and TruffleRuby.
| Gem | Format | Parser Backend(s) | Description |
|---|---|---|---|
| tree_haver | Multi | MRI C, Rust, FFI, Java, Prism, Psych, Commonmarker, Markly, Citrus | Foundation: Cross-Ruby adapter for parsing libraries (like Faraday for HTTP) |
| ast-merge | Text | internal | Infrastructure: Shared base classes and merge logic for all *-merge gems |
| prism-merge | Ruby | Prism | Smart merge for Ruby source files |
| psych-merge | YAML | Psych | Smart merge for YAML files |
| json-merge | JSON | tree-sitter-json (via tree_haver) | Smart merge for JSON files |
| jsonc-merge | JSONC | tree-sitter-jsonc (via tree_haver) | |
| bash-merge | Bash | tree-sitter-bash (via tree_haver) | Smart merge for Bash scripts |
| rbs-merge | RBS | RBS | Smart merge for Ruby type signatures |
| dotenv-merge | Dotenv | internal | Smart merge for .env files |
| toml-merge | TOML | Citrus + toml-rb (default, via tree_haver), tree-sitter-toml (via tree_haver) | Smart merge for TOML files |
| markdown-merge | Markdown | Commonmarker / Markly (via tree_haver) | Foundation: Shared base for Markdown mergers with inner code block merging |
| markly-merge | Markdown | Markly (via tree_haver) | Smart merge for Markdown (CommonMark via cmark-gfm C) |
| commonmarker-merge | Markdown | Commonmarker (via tree_haver) | Smart merge for Markdown (CommonMark via comrak Rust) |
Example implementations for the gem templating use case:
| Gem | Purpose | Description |
|---|---|---|
| kettle-dev | Gem Development | Gem templating tool using *-merge gems |
| kettle-jem | Gem Templating | Gem template library with smart merge support |
| Feature | tree_haver (this gem) | ruby_tree_sitter | tree_stump | citrus |
|---|---|---|---|---|
| MRI Ruby | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| JRuby | ✅ Yes (FFI, Java, or Citrus backend) | ❌ No | ❌ No | ✅ Yes |
| TruffleRuby | ✅ Yes (FFI or Citrus) | ❌ No | ❓ Unknown | ✅ Yes |
| Backend | Multi (MRI C, Rust, FFI, Java, Citrus) | C extension only | Rust extension | Pure Ruby |
| Incremental Parsing | ✅ Via MRI C/Rust/Java backend | ✅ Yes | ✅ Yes | ❌ No |
| Query API | ⚡ Via MRI/Rust/Java backend | ✅ Yes | ✅ Yes | ❌ No |
| Grammar Discovery | ✅ Built-in GrammarFinder | ❌ Manual | ❌ Manual | ❌ Manual |
| Security Validations | ✅ PathValidator | ❌ No | ❌ No | ❌ No |
| Language Registration | ✅ Thread-safe registry | ❌ No | ❌ No | ❌ No |
| Native Performance | ⚡ Backend-dependent | ✅ Native C | ✅ Native Rust | ❌ Pure Ruby |
| Precompiled Binaries | ⚡ Via Rust backend | ✅ Yes | ✅ Yes | ✅ Pure Ruby |
| Zero Native Deps | ⚡ Via Citrus backend | ❌ No | ❌ No | ✅ Yes |
| Minimum Ruby | 3.2+ | 3.0+ | 3.1+ | 0+ |
Note: Java backend works with grammar JARs built specifically for java-tree-sitter / jtreesitter, or grammar .so files that statically link tree-sitter. This is why FFI is recommended for JRuby & TruffleRuby.
Note: TreeHaver can use ruby_tree_sitter (MRI), tree_stump (MRI, JRuby?), or java-tree-sitter (JRuby) as backends, or FFI on any Ruby implementation, giving you TreeHaver's unified API, grammar discovery, and security features, plus full access to incremental parsing when using those backends.
Note: tree_stump currently requires unreleased fixes in the main branch.
Choose TreeHaver when:
- You need JRuby or TruffleRuby support
- You're building a library that should work across Ruby implementations
- You want automatic grammar discovery and security validations
- You want flexibility to switch backends without code changes
- You need incremental parsing with a unified API
Choose ruby_tree_sitter directly when:
- You only target MRI Ruby
- You need the full Query API without abstraction
- You want the most battle-tested C bindings
- You don't need TreeHaver's grammar discovery
Choose tree_stump directly when:
- You only target MRI Ruby
- You prefer Rust-based native extensions
- You want precompiled binaries without system dependencies
- You don't need TreeHaver's grammar discovery
- Note: `tree_stump` currently requires unreleased fixes in the `main` branch.
Choose citrus directly when:
- You need zero native dependencies (pure Ruby)
- You're using a Citrus grammar (not tree-sitter grammars)
- Performance is less critical than portability
- You don't need TreeHaver's unified API
Compatible with MRI Ruby 3.2.0+, and concordant releases of JRuby, and TruffleRuby.
| 🚚 Amazing test matrix was brought to you by | 🔎 appraisal2 🔎 and the color 💚 green 💚 |
|---|---|
| 👟 Check it out! | ✨ github.com/appraisal-rb/appraisal2 ✨ |
Find this repo on federated forges (Coming soon!)
| Federated DVCS Repository | Status | Issues | PRs | Wiki | CI | Discussions |
|---|---|---|---|---|---|---|
| 🧪 kettle-rb/tree_haver on GitLab | The Truth | 💚 | 💚 | 💚 | 🐭 Tiny Matrix | ➖ |
| 🧊 kettle-rb/tree_haver on CodeBerg | An Ethical Mirror (Donate) | 💚 | 💚 | ➖ | ⭕️ No Matrix | ➖ |
| 🐙 kettle-rb/tree_haver on GitHub | Another Mirror | 💚 | 💚 | 💚 | 💯 Full Matrix | 💚 |
| 🎮️ Discord Server | Let's | talk | about | this | library! |
Available as part of the Tidelift Subscription.
Need enterprise-level guarantees?
The maintainers of this and thousands of other packages are working with Tidelift to deliver commercial support and maintenance for the open source packages you use to build your applications. Save time, reduce risk, and improve code health, while paying the maintainers of the exact packages you use.
- 💡Subscribe for support guarantees covering all your FLOSS dependencies
- 💡Tidelift is part of Sonar
- 💡Tidelift pays maintainers to maintain the software you depend on!
📊@Pointy Haired Boss: An enterprise support subscription is "never gonna let you down", and supports open source maintainers
Install the gem and add to the application's Gemfile by executing:

bundle add tree_haver

If bundler is not being used to manage dependencies, install the gem by executing:

gem install tree_haver

For Medium or High Security Installations

This gem is cryptographically signed, and has verifiable SHA-256 and SHA-512 checksums by stone_checksums. Be sure the gem you install hasn’t been tampered with by following the instructions below.

Add my public key (if you haven’t already, expires 2045-04-29) as a trusted certificate:

gem cert --add <(curl -Ls https://raw.github.com/galtzo-floss/certs/main/pboling.pem)

You only need to do that once. Then proceed to install with:

gem install tree_haver -P HighSecurity

The HighSecurity trust profile will verify signed gems, and not allow the installation of unsigned dependencies.

If you want to up your security game full-time:

bundle config set --global trust-policy MediumSecurity

MediumSecurity instead of HighSecurity is necessary if not all the gems you use are signed.
NOTE: Be prepared to track down certs for signed gems and add them the same way you added mine.
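If you also want to check a downloaded `.gem` file against a published SHA-256 checksum, a minimal sketch using only Ruby's standard library might look like the following. The helper name `sha256_matches?` is hypothetical, and where you obtain the expected checksum is an assumption left to you; the temporary file only stands in for a real gem file.

```ruby
require "digest"
require "tempfile"

# Returns true when the SHA-256 of the file at `path` equals `expected_hex`.
# `expected_hex` is assumed to come from the project's published checksums.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end

# Demonstration with a temporary file standing in for a downloaded .gem file.
Tempfile.create("example.gem") do |file|
  file.write("example contents")
  file.flush
  expected = Digest::SHA256.hexdigest("example contents")
  puts sha256_matches?(file.path, expected)
end
```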
TreeHaver supports 10 parsing backends, each with different trade-offs. The auto backend automatically selects the best available option.
| Backend | Description | Performance | Portability | Examples |
|---|---|---|---|---|
| Auto | Auto-selects best backend | Varies | ✅ Universal | JSON · JSONC · Bash · TOML |
| MRI | C extension via ruby_tree_sitter | ⚡ Fastest | MRI only | JSON · JSONC · |
| Rust | Precompiled via tree_stump | ⚡ Very Fast | ✅ Good | JSON · JSONC · |
| FFI | Dynamic linking via FFI | 🔵 Fast | ✅ Universal | JSON · JSONC · Bash · TOML |
| Java | JNI bindings | ⚡ Very Fast | JRuby only | JSON · JSONC · Bash · TOML |
| Backend | Description | Performance | Portability | Examples |
|---|---|---|---|---|
| Prism | Ruby's official parser | ⚡ Very Fast | ✅ Universal | Ruby |
| Psych | Ruby's YAML parser (stdlib) | ⚡ Very Fast | ✅ Universal | YAML |
| Commonmarker | Markdown via comrak (Rust) | ⚡ Very Fast | ✅ Good | Markdown · Merge |
| Markly | GFM via cmark-gfm (C) | ⚡ Very Fast | ✅ Good | Markdown · Merge |
| Citrus | Pure Ruby parsing | 🟡 Slower | ✅ Universal | TOML · Finitio · Dhall |
Selection Priority (Auto mode): MRI → Rust → FFI → Java → Prism → Psych → Commonmarker → Markly → Citrus
Known Issues:
- *MRI + Bash: ABI incompatibility (use FFI instead)
- *Rust + Bash: Version mismatch (use FFI instead)
Backend Requirements:
# Tree-sitter backends
gem "ruby_tree_sitter", "~> 2.0" # MRI backend
gem "tree_stump" # Rust backend
gem "ffi", ">= 1.15", "< 2.0" # FFI backend
# Java backend: no gem required (uses JRuby's built-in JNI)
# Language-specific backends
gem "prism", "~> 1.0" # Ruby parsing (stdlib in Ruby 3.4+)
# Psych: no gem required (Ruby stdlib)
gem "commonmarker", ">= 0.23" # Markdown parsing (comrak)
gem "markly", "~> 0.11" # GFM parsing (cmark-gfm)
# Pure Ruby fallback
gem "citrus", "~> 3.0" # Citrus backend
# Plus grammar gems: toml-rb, dhall, finitio, etc.

Force Specific Backend:
# Tree-sitter backends
TreeHaver.backend = :mri # Force MRI backend (ruby_tree_sitter)
TreeHaver.backend = :rust # Force Rust backend (tree_stump)
TreeHaver.backend = :ffi # Force FFI backend
TreeHaver.backend = :java # Force Java backend (JRuby only)
# Language-specific backends
TreeHaver.backend = :prism # Force Prism (Ruby parsing)
TreeHaver.backend = :psych # Force Psych (YAML parsing)
TreeHaver.backend = :commonmarker # Force Commonmarker (Markdown)
TreeHaver.backend = :markly # Force Markly (GFM Markdown)
# Pure Ruby fallback
TreeHaver.backend = :citrus # Force Citrus backend
# Auto-selection (default)
TreeHaver.backend = :auto # Let TreeHaver choose

Block-based Backend Switching:

Use with_backend to temporarily switch backends for a specific block of code.
This is thread-safe and supports nesting: the previous backend is automatically
restored when the block exits (even if an exception is raised).

# Temporarily use a specific backend
TreeHaver.with_backend(:mri) do
  parser = TreeHaver::Parser.new
  tree = parser.parse(source)
  # All operations in this block use the MRI backend
end
# Backend is restored to its previous value here

# Nested blocks work correctly
TreeHaver.with_backend(:rust) do
  # Uses :rust
  TreeHaver.with_backend(:citrus) do
    # Uses :citrus
    parser = TreeHaver::Parser.new
  end
  # Back to :rust
end
# Back to original backend

This is particularly useful for:
- Testing: Test the same code with different backends
- Performance comparison: Benchmark different backends
- Fallback scenarios: Try one backend, fall back to another
- Thread isolation: Each thread can use a different backend safely
# Example: Testing with multiple backends
[:mri, :rust, :citrus].each do |backend_name|
  TreeHaver.with_backend(backend_name) do
    parser = TreeHaver::Parser.new
    result = parser.parse(source)
    puts "#{backend_name}: #{result.root_node.type}"
  end
end

Check Backend Capabilities:

TreeHaver.backend # => :ffi
TreeHaver.backend_module # => TreeHaver::Backends::FFI
TreeHaver.capabilities # => { backend: :ffi, parse: true, query: false, ... }

See examples/ directory for 26 complete working examples demonstrating all 10 backends with multiple languages (JSON, JSONC, Bash, TOML, Ruby, YAML, Markdown) plus markdown-merge integration examples.
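The block-based switching described above (per-thread, nestable, restored even on error) can be sketched with a thread-local variable and `ensure`. `SwitchSketch` is a hypothetical stand-in written for illustration; TreeHaver's real `with_backend` may be implemented differently.

```ruby
# Hypothetical sketch of block-scoped, thread-local switching; TreeHaver's
# real implementation may differ.
module SwitchSketch
  KEY = :switch_sketch_backend

  # Current backend for this thread, defaulting to :auto.
  def self.backend
    Thread.current.thread_variable_get(KEY) || :auto
  end

  # Sets the backend for the duration of the block, then restores the
  # previous value even if the block raises.
  def self.with_backend(name)
    previous = Thread.current.thread_variable_get(KEY)
    Thread.current.thread_variable_set(KEY, name)
    yield
  ensure
    Thread.current.thread_variable_set(KEY, previous)
  end
end

SwitchSketch.with_backend(:rust) do
  SwitchSketch.with_backend(:citrus) { puts SwitchSketch.backend } # inner block
  puts SwitchSketch.backend                                        # outer block again
end
puts SwitchSketch.backend                                          # default restored
```

Because each call remembers the value it replaced, nesting works without an explicit stack, and other threads never see the temporary override.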
TreeHaver provides defense-in-depth validations, but you should understand the risks:
TreeHaver's PathValidator module protects against:
- Path traversal: Paths containing `/../` or `/./` are rejected
- Null byte injection: Paths containing null bytes are rejected
- Non-absolute paths: Relative paths are rejected to prevent CWD-based attacks
- Invalid extensions: Only `.so`, `.dylib`, and `.dll` files are accepted
- Malicious filenames: Filenames must match a safe pattern (alphanumeric, hyphens, underscores)
- Invalid language names: Language names must be lowercase alphanumeric with underscores
- Invalid symbol names: Symbol names must be valid C identifiers
# Standard usage - paths from ENV are validated
finder = TreeHaver::GrammarFinder.new(:toml)
path = finder.find_library_path # Validates ENV path before returning
# Maximum security - only trusted system directories
path = finder.find_library_path_safe # Ignores ENV, only /usr/lib etc.
# Manual validation
if TreeHaver::PathValidator.safe_library_path?(user_provided_path)
  language = TreeHaver::Language.from_library(user_provided_path)
end
# Get validation errors for debugging
errors = TreeHaver::PathValidator.validation_errors(path)
# => ["Path is not absolute", "Path contains traversal sequence"]

The find_library_path_safe method only returns paths in trusted directories.

Default trusted directories:

- /usr/lib, /usr/lib64
- /usr/lib/x86_64-linux-gnu, /usr/lib/aarch64-linux-gnu
- /usr/local/lib
- /opt/homebrew/lib, /opt/local/lib
Adding custom trusted directories:
For non-standard installations (Homebrew on Linux, luarocks, mise, asdf, etc.), register additional trusted directories:
# Programmatically at application startup
TreeHaver::PathValidator.add_trusted_directory("/home/linuxbrew/.linuxbrew/Cellar")
TreeHaver::PathValidator.add_trusted_directory("~/.local/share/mise/installs/lua")
# Or via environment variable (comma-separated, in your shell profile)
export TREE_HAVER_TRUSTED_DIRS="/home/linuxbrew/.linuxbrew/Cellar,~/.local/share/mise/installs/lua"

Example: Fedora Silverblue with Homebrew and luarocks
# In ~/.bashrc or ~/.zshrc
export TREE_HAVER_TRUSTED_DIRS="/home/linuxbrew/.linuxbrew/Cellar,~/.local/share/mise/installs/lua"
# tree-sitter runtime library
export TREE_SITTER_RUNTIME_LIB=/home/linuxbrew/.linuxbrew/Cellar/tree-sitter/0.26.3/lib/libtree-sitter.so
# Language grammar (luarocks-installed)
export TREE_SITTER_TOML_PATH=~/.local/share/mise/installs/lua/5.4.8/luarocks/lib/luarocks/rocks-5.4/tree-sitter-toml/0.0.31-1/parser/toml.so

- Production: Consider using `find_library_path_safe` to ignore ENV overrides
- Development: Standard `find_library_path` is convenient for testing
- User Input: Always validate paths before passing to `Language.from_library`
- CI/CD: Be cautious of ENV vars that could be set by untrusted sources
- Custom installs: Register trusted directories via `TREE_HAVER_TRUSTED_DIRS` or `add_trusted_directory`
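To make the validation rules listed earlier in this section more concrete, here is a minimal sketch of that style of check. It is not `TreeHaver::PathValidator`: the module name `PathCheckSketch`, the exact patterns, the POSIX-only absolute-path test, and the error messages are all assumptions made for illustration.

```ruby
# Hypothetical sketch of the kinds of checks described above; this is not
# TreeHaver::PathValidator, and the exact rules and messages are assumptions.
module PathCheckSketch
  ALLOWED_EXTENSIONS = %w[.so .dylib .dll].freeze
  SAFE_FILENAME = /\A[a-zA-Z0-9_.-]+\z/

  # Returns an array of human-readable problems; empty means the path passed.
  def self.errors_for(path)
    # File helpers raise on null bytes, so report that problem on its own.
    return ["Path contains null byte"] if path.include?("\0")

    errors = []
    errors << "Path is not absolute" unless path.start_with?("/")
    errors << "Path contains traversal sequence" if path.include?("/../") || path.include?("/./")
    errors << "Extension not allowed" unless ALLOWED_EXTENSIONS.include?(File.extname(path))
    errors << "Unsafe filename" unless File.basename(path).match?(SAFE_FILENAME)
    errors
  end

  # True when `path` sits inside one of the trusted directories.
  def self.trusted?(path, trusted_dirs)
    trusted_dirs.any? { |dir| path.start_with?(File.join(dir, "")) }
  end
end

p PathCheckSketch.errors_for("/usr/lib/libtree-sitter-toml.so")
p PathCheckSketch.errors_for("lib/../evil.txt")
p PathCheckSketch.trusted?("/usr/lib/libtree-sitter-toml.so", ["/usr/lib"])
```

Comparing against the directory name with a trailing separator avoids treating a sibling such as `/usr/libevil` as if it were inside `/usr/lib`.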
TreeHaver automatically selects the best backend for your Ruby implementation, but you can override this behavior:
# Automatic backend selection (default)
TreeHaver.backend = :auto
# Force a specific backend
TreeHaver.backend = :mri # Use ruby_tree_sitter (MRI only, C extension)
TreeHaver.backend = :rust # Use tree_stump (MRI, Rust extension with precompiled binaries)
# Note: `tree_stump` currently requires unreleased fixes in the `main` branch.
# See: https://github.com/joker1007/tree_stump
TreeHaver.backend = :ffi # Use FFI bindings (works on MRI and JRuby)
TreeHaver.backend = :java # Use Java bindings (JRuby only, coming soon)
TreeHaver.backend = :citrus # Use Citrus pure Ruby parser
# NOTE: Portable, all Ruby implementations
# CAVEAT: few major language grammars, but many esoteric grammars

Auto-selection priority on MRI: MRI → Rust → FFI → Citrus

You can also set the backend via environment variable:

export TREE_HAVER_BACKEND=rust

TreeHaver recognizes several environment variables for configuration:

Note: All path-based environment variables are validated before use. Invalid paths are ignored.

- `TREE_HAVER_TRUSTED_DIRS`: Comma-separated list of additional trusted directories for grammar libraries

  # For Homebrew on Linux and luarocks
  export TREE_HAVER_TRUSTED_DIRS="/home/linuxbrew/.linuxbrew/Cellar,~/.local/share/mise/installs/lua"

  Tilde (`~`) is expanded to the user's home directory. Directories listed here are considered safe for `find_library_path_safe`.

- `TREE_SITTER_RUNTIME_LIB`: Absolute path to the core `libtree-sitter` shared library

  export TREE_SITTER_RUNTIME_LIB=/usr/local/lib/libtree-sitter.so

If not set, TreeHaver tries these names in order:

- `tree-sitter`
- `libtree-sitter.so.0`
- `libtree-sitter.so`
- `libtree-sitter.dylib`
- `libtree-sitter.dll`

When loading a language grammar, if you don't specify the symbol: parameter, TreeHaver resolves it in this precedence:

- `TREE_SITTER_LANG_SYMBOL`: Explicit symbol override
- Guessed from filename (e.g., `libtree-sitter-toml.so` → `tree_sitter_toml`)
- Default fallback (`tree_sitter_toml`)

export TREE_SITTER_LANG_SYMBOL=tree_sitter_toml

For specific languages, you can set environment variables to point to grammar libraries:

export TREE_SITTER_TOML_PATH=/usr/local/lib/libtree-sitter-toml.so
export TREE_SITTER_JSON_PATH=/usr/local/lib/libtree-sitter-json.so

For the Java backend on JRuby:

export TREE_SITTER_JAVA_JARS_DIR=/path/to/java-tree-sitter/jars

For more see docs, maven, and source.
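The symbol resolution precedence described earlier in this section (explicit override, then a guess from the filename, then a default) can be sketched as a small function. `resolve_symbol` and its `env:` and `default:` parameters are hypothetical names used for illustration; the filename pattern is an assumption and TreeHaver's own guessing logic may differ.

```ruby
# Hypothetical sketch of the precedence described above; not TreeHaver's code.
# `env` is injectable so the example does not depend on the real environment.
def resolve_symbol(library_path, env: ENV, default: "tree_sitter_toml")
  # 1. Explicit override wins.
  override = env["TREE_SITTER_LANG_SYMBOL"]
  return override if override && !override.empty?

  # 2. Guess from the filename: "libtree-sitter-toml.so" -> "tree_sitter_toml".
  base = File.basename(library_path).sub(/\..*\z/, "")
  match = base.match(/\A(?:lib)?tree-sitter-([a-z0-9_-]+)\z/)
  return "tree_sitter_#{match[1].tr("-", "_")}" if match

  # 3. Fall back to the default.
  default
end

puts resolve_symbol("/usr/local/lib/libtree-sitter-json.so", env: {})
puts resolve_symbol("/opt/grammars/toml.so", env: {})
puts resolve_symbol("/x/y.so", env: {"TREE_SITTER_LANG_SYMBOL" => "tree_sitter_bash"})
```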
Register languages once at application startup for convenient access:
# Register a TOML grammar
TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml", # optional, will be inferred if omitted
)
# Now you can use the convenient helper
language = TreeHaver::Language.toml
# Or still override path/symbol per-call
language = TreeHaver::Language.toml(
  path: "/custom/path/libtree-sitter-toml.so",
)

For libraries that need to automatically locate tree-sitter grammars (like the *-merge family of gems), TreeHaver provides the GrammarFinder utility class. It handles platform-aware grammar discovery without requiring language-specific code in TreeHaver itself.

# Create a finder for any language
finder = TreeHaver::GrammarFinder.new(:toml)
# Check if the grammar is available
if finder.available?
  puts "TOML grammar found at: #{finder.find_library_path}"
else
  puts finder.not_found_message
  # => "tree-sitter toml grammar not found. Searched: /usr/lib/libtree-sitter-toml.so, ..."
end
# Register the language if available
finder.register! if finder.available?
# Now use the registered language
language = TreeHaver::Language.toml

Given just the language name, GrammarFinder automatically derives:
| Property | Derived Value (for :toml) |
|---|---|
| ENV var | TREE_SITTER_TOML_PATH |
| Library filename | libtree-sitter-toml.so (Linux) or .dylib (macOS) |
| Symbol name | tree_sitter_toml |
GrammarFinder searches for grammars in this order:
- Environment variable: `TREE_SITTER_<LANG>_PATH` (highest priority)
- Extra paths: Custom paths provided at initialization
- System paths: Common installation directories (`/usr/lib`, `/usr/local/lib`, `/opt/homebrew/lib`, etc.)
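The derivation and search order above can be approximated in a few lines. The `FinderSketch` class below is a hypothetical stand-in, not `TreeHaver::GrammarFinder`; its directory list is a shortened assumption and the `env:` argument exists only so the example does not depend on your real environment.

```ruby
require "rbconfig"

# Hypothetical sketch of deriving names and candidate paths from a language
# name; it mirrors the table above but is not TreeHaver::GrammarFinder.
class FinderSketch
  SYSTEM_DIRS = %w[/usr/lib /usr/local/lib /opt/homebrew/lib].freeze

  def initialize(language, extra_paths: [], env: ENV)
    @language = language.to_s
    @extra_paths = extra_paths
    @env = env
  end

  def env_var
    "TREE_SITTER_#{@language.upcase}_PATH"
  end

  def symbol
    "tree_sitter_#{@language}"
  end

  def library_filename
    ext = RbConfig::CONFIG["host_os"].match?(/darwin/) ? "dylib" : "so"
    "libtree-sitter-#{@language}.#{ext}"
  end

  # Candidates in priority order: ENV override, extra paths, system paths.
  def candidates
    from_env = [@env[env_var]].compact
    from_dirs = (@extra_paths + SYSTEM_DIRS).map { |dir| File.join(dir, library_filename) }
    from_env + from_dirs
  end

  # First candidate that exists on disk, or nil.
  def find
    candidates.find { |path| File.file?(path) }
  end
end

finder = FinderSketch.new(:toml, extra_paths: ["/opt/custom/lib"], env: {})
puts finder.env_var
puts finder.symbol
puts finder.candidates.first
```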
The GrammarFinder pattern enables clean integration in language-specific merge gems:
# In toml-merge
finder = TreeHaver::GrammarFinder.new(:toml)
finder.register! if finder.available?
# In json-merge
finder = TreeHaver::GrammarFinder.new(:json)
finder.register! if finder.available?
# In bash-merge
finder = TreeHaver::GrammarFinder.new(:bash)
finder.register! if finder.available?

Each gem uses the same API; only the language name changes.

For non-standard installations, provide extra search paths:

finder = TreeHaver::GrammarFinder.new(:toml, extra_paths: [
  "/opt/custom/lib",
  "/home/user/.local/lib",
])

Get detailed information about the grammar search:
finder = TreeHaver::GrammarFinder.new(:toml)
puts finder.search_info
# => {
# language: :toml,
# env_var: "TREE_SITTER_TOML_PATH",
# env_value: nil,
# symbol: "tree_sitter_toml",
# library_filename: "libtree-sitter-toml.so",
# search_paths: ["/usr/lib/libtree-sitter-toml.so", ...],
# found_path: "/usr/lib/libtree-sitter-toml.so",
# available: true
# }

Different backends may support different features:
TreeHaver.capabilities
# => { backend: :mri, query: true, bytes_field: true }
# or
# => { backend: :ffi, parse: true, query: false, bytes_field: true }
# or
# => { backend: :citrus, parse: true, query: false, bytes_field: false }

For codebases migrating from ruby_tree_sitter, TreeHaver provides a compatibility shim:

require "tree_haver/compat"
# Now TreeSitter constants map to TreeHaver
parser = TreeSitter::Parser.new # Actually creates TreeHaver::Parser

This is safe and idempotent: if the real TreeSitter module is already loaded, the shim does nothing.
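One common way to get this kind of idempotent behavior is to define the alias only when the constant is not already present. The sketch below uses hypothetical names (`RealThing`, `ShimmedThing`, `install_shim`) and is only an illustration of the idea, not the contents of `tree_haver/compat`.

```ruby
# Hypothetical sketch of an idempotent shim; module names are stand-ins.
module RealThing
  class Parser; end
end

# Defines `alias_name` as an alias for `target` only when no constant with
# that name exists yet. Returns true if the alias was created.
def install_shim(alias_name, target)
  return false if Object.const_defined?(alias_name)

  Object.const_set(alias_name, target)
  true
end

puts install_shim(:ShimmedThing, RealThing) # alias created on the first call
puts install_shim(:ShimmedThing, RealThing) # no-op on the second call
puts ShimmedThing::Parser.equal?(RealThing::Parser)
```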
Both ruby_tree_sitter v2+ and TreeHaver exceptions inherit from Exception (not StandardError).
This design decision follows ruby_tree_sitter's lead for thread-safety and signal handling reasons. See ruby_tree_sitter PR #83 for the rationale.
What this means for exception handling:
# ⚠️ This will NOT catch TreeHaver errors
begin
  TreeHaver::Language.from_library("/nonexistent.so")
rescue => e
  puts "Caught!" # Never reached - TreeHaver::Error inherits Exception
end
# ✅ Explicit rescue is required
begin
  TreeHaver::Language.from_library("/nonexistent.so")
rescue TreeHaver::Error => e
  puts "Caught!" # This works
end
# ✅ Or rescue specific exceptions
begin
  TreeHaver::Language.from_library("/nonexistent.so")
rescue TreeHaver::NotAvailable => e
  puts "Grammar not available: #{e.message}"
end

TreeHaver Exception Hierarchy:
Exception
└── TreeHaver::Error # Base error class
├── TreeHaver::NotAvailable # Backend/grammar not available
└── TreeHaver::BackendConflict # Backend incompatibility detected
Compatibility Mode Behavior:
The compat mode (require "tree_haver/compat") creates aliases but does not change the exception hierarchy:
require "tree_haver/compat"
# TreeSitter constants are now aliases to TreeHaver
TreeSitter::Error # => TreeHaver::Error (still inherits Exception)
TreeSitter::Parser # => TreeHaver::Parser
TreeSitter::Language # => TreeHaver::Language
# Exception handling remains the same
begin
  TreeSitter::Language.load("missing", "/nonexistent.so")
rescue TreeSitter::Error => e # Still requires explicit rescue
  puts "Error: #{e.message}"
end

Best Practices:

- Always use explicit rescue for TreeHaver errors:

  begin
    finder = TreeHaver::GrammarFinder.new(:toml)
    finder.register! if finder.available?
    language = TreeHaver::Language.toml
  rescue TreeHaver::NotAvailable => e
    warn("TOML grammar not available: #{e.message}")
    # Fallback to another backend or fail gracefully
  end

- Never rely on `rescue => e` to catch TreeHaver errors (it won't work)
Why inherit from Exception?
Following ruby_tree_sitter's reasoning:
- Thread safety: Prevents accidental catching in thread cleanup code
- Signal handling: Ensures parsing errors don't interfere with SIGTERM/SIGINT
- Intentional handling: Forces developers to explicitly handle parsing errors
See lib/tree_haver/compat.rb for compatibility layer documentation.
The simplest way to parse code is with TreeHaver.parser_for, which handles all the complexity of language loading, grammar discovery, and backend selection:
require "tree_haver"
# Parse TOML - auto-discovers grammar and falls back to Citrus if needed
parser = TreeHaver.parser_for(:toml)
tree = parser.parse("[package]\nname = \"my-app\"")
# Parse JSON
parser = TreeHaver.parser_for(:json)
tree = parser.parse('{"key": "value"}')
# Parse Bash
parser = TreeHaver.parser_for(:bash)
tree = parser.parse("#!/bin/bash\necho hello")
# With explicit library path
parser = TreeHaver.parser_for(:toml, library_path: "/custom/path/libtree-sitter-toml.so")
# With Citrus fallback configuration
parser = TreeHaver.parser_for(
  :toml,
  citrus_config: {gem_name: "toml-rb", grammar_const: "TomlRB::Document"},
)

TreeHaver.parser_for handles:

- Checking if the language is already registered
- Auto-discovering tree-sitter grammar via `GrammarFinder`
- Falling back to Citrus grammar if tree-sitter is unavailable
- Creating and configuring the parser
- Raising `NotAvailable` with a helpful message if nothing works
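The steps above amount to trying a series of strategies in order and raising only when all of them fail. A minimal sketch of that pattern follows; `first_available_parser`, `SketchNotAvailable`, and the lambdas are hypothetical and only imitate the flow, they are not how `parser_for` is written. Note that the stand-in error here inherits from StandardError for simplicity, unlike TreeHaver's own errors.

```ruby
# Hypothetical sketch of a "try each strategy in order" lookup; the names
# here are illustrative and not part of TreeHaver.
class SketchNotAvailable < StandardError; end

# `strategies` is an ordered hash of label => callable. Each callable returns
# a parser-like object, or nil when it cannot provide one.
def first_available_parser(language, strategies)
  tried = []
  strategies.each do |label, strategy|
    parser = strategy.call(language)
    return parser if parser

    tried << label
  end
  raise SketchNotAvailable, "No parser for #{language}; tried: #{tried.join(", ")}"
end

strategies = {
  "registered language" => ->(_lang) { nil },
  "tree-sitter grammar" => ->(_lang) { nil },
  "Citrus grammar" => ->(lang) { "citrus parser for #{lang}" },
}
puts first_available_parser(:toml, strategies)
```

Collecting the labels of the strategies that failed is what makes a helpful error message possible when nothing works.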
For more control, you can create parsers manually:
TreeHaver works with any language through its 10 backends. Here are examples for different parsing needs:
require "tree_haver"
# Load a tree-sitter grammar (works with MRI, Rust, FFI, or Java backend)
language = TreeHaver::Language.from_library(
  "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",
)
# Create a parser
parser = TreeHaver::Parser.new
parser.language = language
# Parse source code
source = <<~TOML
[package]
name = "my-app"
version = "1.0.0"
TOML
tree = parser.parse(source)
# Access the unified Position API (works across all backends)
root = tree.root_node
puts "Root type: #{root.type}" # => "document"
puts "Start line: #{root.start_line}" # => 1 (1-based)
puts "End line: #{root.end_line}" # => 3
puts "Position: #{root.source_position}" # => {start_line: 1, end_line: 3, ...}
# Traverse the tree
root.each do |child|
  puts "Child: #{child.type} at line #{child.start_line}"
end

require "tree_haver"
TreeHaver.backend = :prism
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Backends::Prism::Language.ruby
source = <<~RUBY
class Example
def hello
puts "Hello, world!"
end
end
RUBY
tree = parser.parse(source)
root = tree.root_node
# Find all method definitions
def find_methods(node, results = [])
results << node if node.type == "def_node"
node.children.each { |child| find_methods(child, results) }
results
end
methods = find_methods(root)
methods.each do |method_node|
pos = method_node.source_position
puts "Method at lines #{pos[:start_line]}-#{pos[:end_line]}"
end
require "tree_haver"
TreeHaver.backend = :psych
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Backends::Psych::Language.yaml
source = <<~YAML
database:
host: localhost
port: 5432
YAML
tree = parser.parse(source)
root = tree.root_node
# Navigate YAML structure
def show_structure(node, indent = 0)
prefix = " " * indent
puts "#{prefix}#{node.type} (line #{node.start_line})"
node.children.each { |child| show_structure(child, indent + 1) }
end
show_structure(root)
require "tree_haver"
# Choose your backend
TreeHaver.backend = :commonmarker # or :markly for GFM
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Backends::Commonmarker::Language.markdown
source = <<~MARKDOWN
# My Document
## Section
- Item 1
- Item 2
MARKDOWN
tree = parser.parse(source)
root = tree.root_node
# Find all headings
def find_headings(node, results = [])
results << node if node.type == "heading"
node.children.each { |child| find_headings(child, results) }
results
end
headings = find_headings(root)
headings.each do |heading|
level = heading.header_level
text = heading.children.map(&:text).join
puts "H#{level}: #{text} (line #{heading.start_line})"
end
For cleaner code, register languages at startup:
# At application initialization
TreeHaver.register_language(
:toml,
path: "/usr/local/lib/libtree-sitter-toml.so",
)
TreeHaver.register_language(
:json,
path: "/usr/local/lib/libtree-sitter-json.so",
)
# Later in your code
toml_language = TreeHaver::Language.toml
json_language = TreeHaver::Language.json
parser = TreeHaver::Parser.new
parser.language = toml_language
tree = parser.parse(toml_source)
The name parameter in register_language is an arbitrary identifier you choose; it doesn't
need to match the actual language name. The actual grammar identity comes from the path
and symbol parameters (for tree-sitter) or grammar_module (for Citrus).
This flexibility is useful for:
- Aliasing: Register the same grammar under multiple names
- Versioning: Register different grammar versions (e.g., :ruby_2, :ruby_3)
- Testing: Use unique names to avoid collisions between tests
- Context-specific naming: Use names that make sense for your application
# Register the same TOML grammar under different names for different purposes
TreeHaver.register_language(
:config_parser, # Custom name for your app
path: "/usr/local/lib/libtree-sitter-toml.so",
symbol: "tree_sitter_toml",
)
TreeHaver.register_language(
:toml_v1, # Version-specific name
path: "/usr/local/lib/libtree-sitter-toml.so",
symbol: "tree_sitter_toml",
)
# Use your custom names
config_lang = TreeHaver::Language.config_parser
versioned_lang = TreeHaver::Language.toml_v1
TreeHaver works with any tree-sitter grammar:
# Parse Ruby code
ruby_lang = TreeHaver::Language.from_library(
"/path/to/libtree-sitter-ruby.so",
)
parser = TreeHaver::Parser.new
parser.language = ruby_lang
tree = parser.parse("class Foo; end")
# Parse JavaScript
js_lang = TreeHaver::Language.from_library(
"/path/to/libtree-sitter-javascript.so",
)
parser.language = js_lang # Reuse the same parser
tree = parser.parse("const x = 42;")
TreeHaver provides simple node traversal:
tree = parser.parse(source)
root = tree.root_node
# Recursive tree walk
def walk_tree(node, depth = 0)
puts "#{" " * depth}#{node.type}"
node.each { |child| walk_tree(child, depth + 1) }
end
walk_tree(root)
TreeHaver supports incremental parsing when using the MRI, Rust, or Java backends. This is a major performance optimization for editors and IDEs that need to re-parse on every keystroke.
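An incremental re-parse has to be told which byte range changed. As a rough sketch in plain Ruby, that range can be derived by trimming the common prefix and suffix of the old and new source. The helpers `point_at` and `edit_range` are hypothetical and not part of TreeHaver; the hash keys simply mirror the keyword names that `tree.edit` takes in this README.

```ruby
# Row/column for a byte offset. Columns are counted in bytes, not characters.
def point_at(bytes, byte_offset)
  prefix = bytes.b[0, byte_offset]
  last_newline = prefix.rindex("\n")
  column = last_newline ? byte_offset - last_newline - 1 : byte_offset
  {row: prefix.count("\n"), column: column}
end

# Smallest single edit that turns old_source into new_source.
# Caveat: the boundaries may fall inside a multibyte character; a production
# helper would probably align them to character boundaries.
def edit_range(old_source, new_source)
  old_bytes = old_source.b
  new_bytes = new_source.b
  limit = [old_bytes.bytesize, new_bytes.bytesize].min

  # Length of the shared prefix.
  prefix = 0
  while prefix < limit && old_bytes.getbyte(prefix) == new_bytes.getbyte(prefix)
    prefix += 1
  end

  # Length of the shared suffix, never overlapping the prefix.
  suffix = 0
  while suffix < limit - prefix &&
      old_bytes.getbyte(-1 - suffix) == new_bytes.getbyte(-1 - suffix)
    suffix += 1
  end

  old_end = old_bytes.bytesize - suffix
  new_end = new_bytes.bytesize - suffix
  {
    start_byte: prefix,
    old_end_byte: old_end,
    new_end_byte: new_end,
    start_point: point_at(old_bytes, prefix),
    old_end_point: point_at(old_bytes, old_end),
    new_end_point: point_at(new_bytes, new_end),
  }
end

p edit_range("x = 1", "x = 42")
```

Editors usually know the edit range already from the keystroke event, so a diff like this is mainly useful when only the before and after text is available.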
# Check if current backend supports incremental parsing
if TreeHaver.capabilities[:incremental]
puts "Incremental parsing is available!"
end
# Initial parse
parser = TreeHaver::Parser.new
parser.language = language
tree = parser.parse_string(nil, "x = 1")
# User edits the source: "x = 1" -> "x = 42"
# Mark the tree as edited (tell tree-sitter what changed)
tree.edit(
start_byte: 4, # edit starts at byte 4
old_end_byte: 5, # old text "1" ended at byte 5
new_end_byte: 6, # new text "42" ends at byte 6
start_point: {row: 0, column: 4},
old_end_point: {row: 0, column: 5},
new_end_point: {row: 0, column: 6},
)
# Re-parse incrementally - tree-sitter reuses unchanged nodes
new_tree = parser.parse_string(tree, "x = 42")
Note: Incremental parsing requires the MRI (ruby_tree_sitter), Rust (tree_stump), or Java (java-tree-sitter / jtreesitter) backend. The FFI and Citrus backends do not currently support incremental parsing.
Note: tree_stump currently requires unreleased fixes in the main branch.
You can check support with:
tree.supports_editing? # => true if edit() is available
begin
language = TreeHaver::Language.from_library("/path/to/grammar.so")
rescue TreeHaver::NotAvailable => e
puts "Failed to load grammar: #{e.message}"
end
# Check if a backend is available
if TreeHaver.backend_module.nil?
puts "No TreeHaver backend is available!"
puts "Install ruby_tree_sitter (MRI), ffi gem with libtree-sitter, or citrus gem"
end
On MRI, TreeHaver uses ruby_tree_sitter by default:
# Gemfile
gem "tree_haver"
gem "ruby_tree_sitter" # MRI backend
# Code - no changes needed, TreeHaver auto-selects MRI backend
parser = TreeHaver::Parser.new
On JRuby, TreeHaver can use the FFI backend, Java backend, or Citrus backend:
Option 1: FFI Backend (recommended for tree-sitter grammars)
# Gemfile
gem "tree_haver"
gem "ffi" # Required for FFI backend
# Ensure libtree-sitter is installed on your system
# On macOS with Homebrew:
# brew install tree-sitter
# On Ubuntu/Debian:
# sudo apt-get install libtree-sitter0 libtree-sitter-dev
# Code - TreeHaver auto-selects FFI backend on JRuby
parser = TreeHaver::Parser.new
Option 2: Java Backend (native JVM performance)
# 1. Download java-tree-sitter JAR from Maven Central
mkdir -p vendor/jars
curl -fSL -o vendor/jars/jtreesitter-0.23.2.jar \
"https://repo1.maven.org/maven2/io/github/tree-sitter/jtreesitter/0.23.2/jtreesitter-0.23.2.jar"
# 2. Set environment variables
export CLASSPATH="$(pwd)/vendor/jars:$CLASSPATH"
export LD_LIBRARY_PATH="/path/to/libtree-sitter/lib:$LD_LIBRARY_PATH"
# 3. Run with JRuby (requires Java 22+ for Foreign Function API)
JAVA_OPTS="--enable-native-access=ALL-UNNAMED" jruby your_script.rb
# Force Java backend
TreeHaver.backend = :java
# Check if Java backend is available
if TreeHaver::Backends::Java.available?
puts "Java backend is ready!"
puts TreeHaver.capabilities
# => { backend: :java, parse: true, query: true, bytes_field: true, incremental: true }
end
The Java backend uses Java's Foreign Function & Memory (FFM) API, which loads libraries in isolation. Unlike the system's dynamic linker (dlopen), FFM's SymbolLookup.or() chains symbol lookups but doesn't resolve dynamic library dependencies.
This means grammar .so files with unresolved references to libtree-sitter.so symbols won't load correctly. Most grammars from luarocks, npm, or other sources have these dependencies.
Recommended approach for JRuby: Use the FFI backend:
# On JRuby, use FFI backend (recommended)
TreeHaver.backend = :ffi
The FFI backend uses Ruby's FFI gem, which relies on the system's dynamic linker and so correctly resolves symbol dependencies between libtree-sitter.so and grammar libraries.
The Java backend will work with:
- Grammar JARs built specifically for java-tree-sitter / jtreesitter (self-contained, docs, maven, source)
- Grammar .so files that statically link tree-sitter
Option 3: Citrus Backend (pure Ruby, portable)
# Gemfile
gem "tree_haver"
gem "citrus" # Pure Ruby parser, zero native dependencies
# Code - Force Citrus backend for maximum portability
TreeHaver.backend = :citrus
# Check if Citrus backend is available
if TreeHaver::Backends::Citrus.available?
puts "Citrus backend is ready!"
puts TreeHaver.capabilities
# => { backend: :citrus, parse: true, query: false, bytes_field: false }
end
- Uses Citrus grammars (not tree-sitter grammars)
- No incremental parsing support
- No query API
- Pure Ruby performance (slower than native backends)
- Best for: prototyping, environments without native extension support, teaching
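Because backends differ in what they support, calling code may want to branch on the capabilities hash rather than on the backend name. The sketch below is a hypothetical helper (`plan_for` is not a TreeHaver method) that takes a hash shaped like the `TreeHaver.capabilities` output shown in this README and picks a strategy from it.

```ruby
# Decide how to re-parse and how to search, given a capabilities hash.
# Missing keys are treated as "not supported".
def plan_for(capabilities)
  {
    reparse: capabilities[:incremental] ? :incremental : :full,
    search: capabilities[:query] ? :query_api : :manual_walk,
  }
end

# Hash literals copied from the capability examples in this README.
java_caps = {backend: :java, parse: true, query: true, bytes_field: true, incremental: true}
citrus_caps = {backend: :citrus, parse: true, query: false, bytes_field: false}

p plan_for(java_caps)
p plan_for(citrus_caps)
```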
TruffleRuby can use the MRI, FFI, or Citrus backend:
# Use FFI backend (recommended for tree-sitter grammars)
TreeHaver.backend = :ffi
# Or try MRI backend if ruby_tree_sitter compiles on your TruffleRuby version
TreeHaver.backend = :mri
# Or use Citrus backend for zero native dependencies
TreeHaver.backend = :citrus
TreeHaver provides with_backend for thread-safe, temporary backend switching. This is
essential for testing, benchmarking, and applications that need different backends in
different contexts.
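For readers curious how block-scoped, per-thread switching can work in general, here is a minimal sketch of the pattern. It is not TreeHaver's implementation; `BackendScope`, its `:auto` default, and the storage key are all hypothetical.

```ruby
module BackendScope
  KEY = :backend_scope_stack # hypothetical storage key

  # Innermost override for the current thread, or :auto when there is none.
  def self.current
    stack.last || :auto
  end

  # Push an override for the duration of the block, then always restore.
  def self.with_backend(name)
    stack.push(name)
    begin
      yield
    ensure
      stack.pop
    end
  end

  # Thread.current[] is fiber-local in Ruby, so every thread (and fiber)
  # gets its own independent stack.
  def self.stack
    Thread.current[KEY] ||= []
  end
  private_class_method :stack
end

BackendScope.with_backend(:rust) do
  puts BackendScope.current
  BackendScope.with_backend(:citrus) { puts BackendScope.current }
  puts BackendScope.current
end
puts BackendScope.current
```

A stack plus `ensure` gives nesting and restoration even when the block raises, and keeping the stack in per-thread storage is what lets other threads ignore the override.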
Test the same code path with different backends using with_backend:
# In your test setup
RSpec.describe("MyParser") do
# Test with each available backend
[:mri, :rust, :citrus].each do |backend_name|
context "with #{backend_name} backend" do
it "parses correctly" do
TreeHaver.with_backend(backend_name) do
parser = TreeHaver::Parser.new
result = parser.parse("x = 42")
expect(result.root_node.type).to(eq("document"))
end
# Backend automatically restored after block
end
end
end
end
Each thread can use a different backend safely; with_backend uses thread-local storage:
threads = []
threads << Thread.new do
TreeHaver.with_backend(:mri) do
# This thread uses MRI backend
parser = TreeHaver::Parser.new
100.times { parser.parse("x = 1") }
end
end
threads << Thread.new do
TreeHaver.with_backend(:citrus) do
# This thread uses Citrus backend simultaneously
parser = TreeHaver::Parser.new
100.times { parser.parse("x = 1") }
end
end
threads.each(&:join)
with_backend supports nesting; inner blocks override outer blocks:
TreeHaver.with_backend(:rust) do
puts TreeHaver.effective_backend # => :rust
TreeHaver.with_backend(:citrus) do
puts TreeHaver.effective_backend # => :citrus
end
puts TreeHaver.effective_backend # => :rust (restored)
end
Try one backend, fall back to another on failure:
def parse_with_fallback(source)
TreeHaver.with_backend(:mri) do
TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
end
rescue TreeHaver::NotAvailable
# Fall back to Citrus if MRI backend unavailable
TreeHaver.with_backend(:citrus) do
TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
end
end
Here's a practical example that extracts package names from a TOML file:
require "tree_haver"
# Setup
TreeHaver.register_language(
:toml,
path: "/usr/local/lib/libtree-sitter-toml.so",
)
def extract_package_name(toml_content)
# Create parser
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Language.toml
# Parse
tree = parser.parse(toml_content)
root = tree.root_node
# Find [package] table
root.each do |child|
next unless child.type == "table"
child.each do |table_elem|
if table_elem.type == "pair"
# Look for name = "..." pair
key = table_elem.each.first&.type
# In a real implementation, you'd extract the text value
# This is simplified for demonstration
end
end
end
end
# Usage
toml = <<~TOML
[package]
name = "awesome-app"
version = "2.0.0"
TOML
package_name = extract_package_name(toml)
While kettle-rb tools are free software and will always be, the project would benefit immensely from some funding. Raising a monthly budget of... "dollars" would make the project more sustainable.
We welcome both individual and corporate sponsors! We also offer a wide array of funding channels to account for your preferences (although currently Open Collective is our preferred funding platform).
If you're working in a company that's making significant use of kettle-rb tools we'd appreciate it if you suggest to your company to become a kettle-rb sponsor.
You can support the development of kettle-rb tools via GitHub Sponsors, Liberapay, PayPal, Open Collective and Tidelift.
| 📍 NOTE |
|---|
| If doing a sponsorship in the form of donation is problematic for your company from an accounting standpoint, we'd recommend the use of Tidelift, where you can get a support-like subscription instead. |
Support us with a monthly donation and help us continue our activities. [Become a backer]
NOTE: kettle-readme-backers updates this list every day, automatically.
No backers yet. Be the first!
Become a sponsor and get your logo on our README on GitHub with a link to your site. [Become a sponsor]
NOTE: kettle-readme-backers updates this list every day, automatically.
No sponsors yet. Be the first!
I’m driven by a passion to foster a thriving open-source community – a space where people can tackle complex problems, no matter how small. Revitalizing libraries that have fallen into disrepair, and building new libraries focused on solving real-world challenges, are my passions. I was recently affected by layoffs, and the tech jobs market is unwelcoming. I’m reaching out here because your support would significantly aid my efforts to provide for my family, and my farm (11 🐔 chickens, 2 🐶 dogs, 3 🐰 rabbits, 8 🐈 cats).
If you work at a company that uses my work, please encourage them to support me as a corporate sponsor. My work on gems you use might show up in bundle fund.
I’m developing a new library, floss_funding, designed to empower open-source developers like myself to get paid for the work we do, in a sustainable way. Please give it a look.
Floss-Funding.dev: 👉️ No network calls. 👉️ No tracking. 👉️ No oversight. 👉️ Minimal crypto hashing. 💡 Easily disabled nags
See SECURITY.md.
If you need some ideas of where to help, you could work on adding more code coverage, or if it is already 💯 (see below) check reek, issues, or PRs, or use the gem and think about how it could be better.
We keep a changelog, so if you make changes, remember to update it.
See CONTRIBUTING.md for more detailed instructions.
See CONTRIBUTING.md.
Everyone interacting with this project's codebases, issue trackers,
chat rooms and mailing lists agrees to follow the code of conduct.
Made with contributors-img.
Also see GitLab Contributors: https://gitlab.com/kettle-rb/tree_haver/-/graphs/main
This library adheres to Semantic Versioning.
Violations of this scheme should be reported as bugs.
Specifically, if a minor or patch version is released that breaks backward compatibility,
a new version should be immediately released that restores compatibility.
Breaking changes to the public API will only be introduced with new major versions.
dropping support for a platform is both obviously and objectively a breaking change
—Jordan Harband (@ljharb, maintainer of SemVer) in SemVer issue 716
I understand that policy doesn't work universally ("exceptions to every rule!"), but it is the policy here. As such, in many cases it is good to specify a dependency on this library using the Pessimistic Version Constraint with two digits of precision.
For example:
spec.add_dependency("tree_haver", "~> 1.0")
📌 Is "Platform Support" part of the public API? More details inside.
SemVer should, IMO, but doesn't explicitly, say that dropping support for specific Platforms is a breaking change to an API, and for that reason the bike shedding is endless.
To get a better understanding of how SemVer is intended to work over a project's lifetime, read this article from the creator of SemVer:
See CHANGELOG.md for a list of releases.
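To see which releases a two-digit pessimistic constraint such as `"~> 1.0"` admits, you can ask RubyGems directly. The sketch below uses `Gem::Requirement` and `Gem::Version`, which ship with Ruby; the `allowed?` helper and the version numbers are made up for illustration and are not actual tree_haver releases.

```ruby
require "rubygems"

# True when the given version string satisfies the constraint.
def allowed?(version, constraint = "~> 1.0")
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

%w[0.9.9 1.0.0 1.4.2 2.0.0].each do |version|
  puts "#{version}: #{allowed?(version)}"
end
```

With `"~> 1.0"`, any 1.x release is accepted and 2.0.0 is rejected, which matches the policy that breaking changes arrive only with a new major version.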
The gem is available as open source under the terms of the MIT License.
See LICENSE.txt for the official Copyright Notice.
Copyright (c) 2025 Peter H. Boling, of Galtzo.com, and tree_haver contributors.
Maintainers have teeth and need to pay their dentists. After getting laid off in an RIF in March, and encountering difficulty finding a new job, I began spending most of my time building open source tools. I'm hoping to be able to pay for my kids' health insurance this month, so if you value the work I am doing, I need your support. Please consider sponsoring me or the project.
To join the community or get help 👇️ Join the Discord.
To say "thanks!" ☝️ Join the Discord or 👇️ send money.
Thanks for RTFM.