Incomplete Rust Bindings for llama.cpp Tool Calling Support #864

@JMLX42

Description

llama.cpp has comprehensive native tool calling support (merged in PR #9639, January 2025), but the Rust bindings in llama-cpp-2
v0.1.125 don't expose the necessary APIs to use it. When attempting to implement tool calling manually, we encounter crashes due to
missing or incomplete grammar sampler bindings.

Background: What We're Trying to Achieve

We're building an AI agent with tool calling capabilities (function calling) where the LLM can:

  1. Receive tool definitions (functions it can call)
  2. Decide when to call tools vs. respond naturally
  3. Generate structured JSON output for tool invocations
  4. Have output validated via grammar constraints

What llama.cpp Provides (C++ Level)

llama.cpp includes comprehensive tool calling infrastructure (PR #9639):

  1. JSON Schema to Grammar Conversion

  // From common/json-schema-to-grammar.h
  std::string json_schema_to_grammar(const nlohmann::ordered_json & schema);

  2. Chat Templates with Tools

  // Templates accept tool definitions and format them into prompts
  template.apply({
      .messages = messages,
      .tools = tool_definitions,        // ← Not exposed in Rust
      .add_generation_prompt = true
  });

  // Returns: { prompt: "...", grammar: "..." }  // ← Grammar auto-generated!

  3. Grammar Builder Utilities

  struct common_grammar_builder {
      std::string add_schema(const std::string& name, const nlohmann::ordered_json& schema);
  };

  4. Tool-Aware Template System

The minja template engine formats tools into model-specific prompts and generates matching GBNF grammars automatically.

What's Missing in llama-cpp-2 Rust Bindings

❌ No json_schema_to_grammar Binding

  // DOES NOT EXIST in llama-cpp-2
  pub fn json_schema_to_grammar(schema: &serde_json::Value) -> String;

Impact: Must manually write GBNF grammars for tool schemas.

❌ No Tools Parameter in Chat Templates

  // Current API (simplified)
  pub fn apply_chat_template(
      &self,
      tmpl: &LlamaChatTemplate,
      chat: &[LlamaChatMessage],
      add_ass: bool
  ) -> Result<String>;  // ← Only returns prompt, no grammar!

  // Missing: tools parameter and grammar output

Impact: Can't pass tool definitions to templates, no auto-generated grammar.

❌ No Grammar Builder API

No Rust equivalent of common_grammar_builder for constructing GBNF from JSON schemas.

❌ Grammar Samplers Crash (Secondary Issue)

When we tried to work around missing bindings by manually writing grammars, we discovered the grammar samplers themselves crash.

Our Workaround Attempt (Failed)

Since the proper APIs are missing, we attempted to:

  1. ✅ Manual Tool Formatting: Inject tool definitions into system message (works)
  2. ✅ Chat Template Usage: Use apply_chat_template() for prompt formatting (works)
  3. ❌ Manual GBNF Grammar: Write tool call JSON grammar by hand (crashes)
  4. ❌ Grammar Sampling: Use LlamaSampler::grammar() to enforce structure (crashes)

Manual Grammar Approach

  // We tried writing the grammar manually
  const TOOL_CALL_GRAMMAR: &str = r#"
  root ::= "{" ws "\"tool_calls\"" ws ":" ws "[" ws tool-call ws "]" ws "}"
  tool-call ::= "{" ws "\"id\"" ws ":" ws string ws "," ws "\"function\"" ws ":" ws function ws "}"
  // ... etc
  "#;

  let sampler = LlamaSampler::grammar(&model, TOOL_CALL_GRAMMAR, "root")?;
  // CRASHES with: GGML_ASSERT(cur_p.selected >= 0) failed

Even using the official json.gbnf included in llama-cpp-2 crashes:

  // From llama-cpp-2/src/grammar/json.gbnf
  const JSON_GRAMMAR: &str = include_str!("json.gbnf");
  let sampler = LlamaSampler::grammar(&model, JSON_GRAMMAR, "root")?;
  // CRASHES with: Unexpected empty grammar stack

What We Need: Proper Tool Calling Bindings

Option 1: Expose Existing C++ Functions (Ideal)

Add bindings to the C++ common library:

  pub fn json_schema_to_grammar(schema: &serde_json::Value) -> Result<String>;

  pub struct ChatTemplateResult {
      pub prompt: String,
      pub grammar: Option<String>,  // Auto-generated!
  }

  pub fn apply_chat_template_with_tools(
      &self,
      tmpl: &LlamaChatTemplate,
      messages: &[LlamaChatMessage],
      tools: Option<&[ToolDefinition]>,
      add_generation_prompt: bool
  ) -> Result<ChatTemplateResult>;

Option 2: Reimplement in Rust (Alternative)

Port the JSON schema → GBNF converter to pure Rust (~500 LOC from json-schema-to-grammar.cpp).
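To give a sense of scope, here is a minimal, dependency-free sketch of what such a port could look like. All names are hypothetical and it covers only a trivial subset of JSON Schema (strings, numbers, flat objects with all properties required); the real json-schema-to-grammar.cpp also handles enums, arrays, refs, optional properties, and more.

```rust
// Hypothetical sketch of a pure-Rust JSON-schema -> GBNF converter.
// A tiny internal schema representation stands in for serde_json::Value.

#[derive(Debug)]
enum Schema {
    Str,
    Num,
    Object(Vec<(String, Schema)>), // (property name, schema); all required
}

// Emit the GBNF expression matching one schema node.
fn rule_body(s: &Schema) -> String {
    match s {
        Schema::Str => "string".to_string(),
        Schema::Num => "number".to_string(),
        Schema::Object(props) => {
            let mut body = String::from("\"{\" ws ");
            for (i, (name, sub)) in props.iter().enumerate() {
                if i > 0 {
                    body.push_str("\",\" ws ");
                }
                // Each property: quoted key, colon, then its value rule.
                body.push_str(&format!(
                    "\"\\\"{}\\\"\" ws \":\" ws {} ws ",
                    name,
                    rule_body(sub)
                ));
            }
            body.push_str("\"}\"");
            body
        }
    }
}

// Wrap the root rule together with shared terminals, following the
// conventions of llama.cpp's json.gbnf.
fn schema_to_gbnf(root: &Schema) -> String {
    format!(
        "root ::= {}\nstring ::= \"\\\"\" [^\"]* \"\\\"\"\nnumber ::= [0-9]+ (\".\" [0-9]+)?\nws ::= [ \\t\\n]*\n",
        rule_body(root)
    )
}

fn main() {
    // Grammar for the execute_python parameters: { "code": <string> }
    let params = Schema::Object(vec![("code".to_string(), Schema::Str)]);
    println!("{}", schema_to_gbnf(&params));
}
```

A production port would operate on serde_json::Value directly and track the schema features the C++ converter supports, but the shape of the recursion is the same.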

Option 3: Fix Grammar Samplers (Minimum)

At minimum, make grammar samplers actually work so manual grammar writing is viable.

Use Case: Our Tool Calling Implementation

  // What we want to do:
  let tools = vec![
      ToolDefinition {
          name: "execute_python",
          description: "Execute Python code",
          parameters: json!({
              "type": "object",
              "properties": {
                  "code": {"type": "string"}
              }
          })
      }
  ];

  // With proper bindings, this would work:
  let result = model.apply_chat_template_with_tools(
      &template,
      &messages,
      Some(&tools),  // ← Pass tools
      true
  )?;

  // Returns:
  // - result.prompt = formatted prompt with tool descriptions
  // - result.grammar = auto-generated GBNF for tool call JSON

  let sampler = LlamaSampler::grammar(&model, &result.grammar, "root")?;
  // Should enforce valid tool call structure

Current Workarounds and Limitations

Workaround 1: Use llama-server via HTTP ✅

  • Run llama-server as subprocess
  • Use /v1/chat/completions endpoint
  • Full PR #9639 support with tools parameter
  • Downside: HTTP overhead, process management complexity
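For reference, a request of this shape against llama-server's /v1/chat/completions endpoint exercises the PR #9639 tool calling path (model name illustrative):

```json
{
  "model": "Qwen3-4B-Q8_0",
  "messages": [
    {"role": "user", "content": "What's the weather in Paris?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {"location": {"type": "string"}},
          "required": ["location"]
        }
      }
    }
  ]
}
```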

Workaround 2: Use Alternative Backends ✅

  • Switch to mistral.rs (works perfectly)
  • Or use other inference engines
  • Downside: Defeats purpose of llama.cpp integration

Workaround 3: Prompt Engineering Without Grammar ⚠

  • Rely on model behavior without constraints
  • Hope for valid JSON output
  • Downside: Unreliable, may generate malformed JSON
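When relying on this workaround, the raw reply still has to be salvaged by hand. A minimal, dependency-free sketch (assuming the tool call is the first JSON object in the reply; a real implementation would then validate the extracted text against the tool schema):

```rust
// Std-only sketch: scan unconstrained model output for the first balanced
// {...} span, respecting string literals and escape sequences.
fn extract_json_object(text: &str) -> Option<&str> {
    let start = text.find('{')?;
    let mut depth = 0usize;
    let mut in_str = false;
    let mut escaped = false;
    for (i, &b) in text.as_bytes().iter().enumerate().skip(start) {
        if in_str {
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_str = false;
            }
            continue;
        }
        match b {
            b'"' => in_str = true,
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    // Slice bounds are ASCII braces, so always char boundaries.
                    return Some(&text[start..=i]);
                }
            }
            _ => {}
        }
    }
    None // model never closed the object
}

fn main() {
    let reply = "Sure! Here is the call: {\"tool\": \"get_weather\", \
                 \"args\": {\"location\": \"Paris\"}} Hope that helps.";
    println!("{:?}", extract_json_object(reply));
}
```

This is exactly the kind of fragile post-processing that grammar-constrained sampling is meant to make unnecessary.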

Why This Matters

Tool calling is a fundamental LLM capability for:

  • AI agents (our use case)
  • Function calling applications
  • Structured data extraction
  • Agentic workflows

llama.cpp has excellent tool calling support, but it's inaccessible from Rust due to missing bindings.

Environment

  • llama-cpp-2: v0.1.125
  • llama.cpp commit: d7395115baf395b75a73a17b0b796e746e468da9 (Oct 30, 2025)
    • Includes PR #9639 (tool calling, merged Jan 30, 2025)
    • Does NOT include PR #16932 (XML tool calling, merged Nov 18, 2025)
  • OS: Linux 6.17.7
  • Backend: Vulkan
  • Model: Qwen3-4B-Q8_0.gguf

Questions

  1. Are there plans to expose tool calling APIs?
    - json_schema_to_grammar binding
    - Tools parameter in chat templates
    - Grammar result from template application
  2. Is there a workaround we're missing?
    - Are these functions accessible another way?
    - Is there an undocumented API?
  3. Would you accept a PR?
    - We're willing to contribute bindings
    - Need guidance on approach (FFI to C++ or pure Rust port)
  4. Are grammar samplers supposed to work?
    - They crash even with official grammars
    - Is this a known issue or are we misusing the API?

Desired API (Proposal)

  // Tool definition structure
  #[derive(Serialize)]
  pub struct ToolDefinition {
      pub r#type: String,  // "function"
      pub function: FunctionDefinition,
  }

  #[derive(Serialize)]
  pub struct FunctionDefinition {
      pub name: String,
      pub description: String,
      pub parameters: serde_json::Value,  // JSON schema
  }

  // Extended chat template API
  pub struct ChatTemplateOptions {
      pub messages: Vec<LlamaChatMessage>,
      pub tools: Option<Vec<ToolDefinition>>,
      pub add_generation_prompt: bool,
  }

  pub struct ChatTemplateResult {
      pub prompt: String,
      pub grammar: Option<String>,
  }

  impl LlamaModel {
      pub fn apply_chat_template_extended(
          &self,
          tmpl: &LlamaChatTemplate,
          options: ChatTemplateOptions,
      ) -> Result<ChatTemplateResult>;
  }

  // Grammar generation from JSON schema
  pub fn json_schema_to_gbnf(schema: &serde_json::Value) -> Result<String>;

Minimal Example of What We're Trying to Do

  // Define tool
  let tool = json!({
      "type": "function",
      "function": {
          "name": "get_weather",
          "description": "Get weather for a location",
          "parameters": {
              "type": "object",
              "properties": {
                  "location": {"type": "string"}
              },
              "required": ["location"]
          }
      }
  });

  // We want: Auto-generate grammar from schema
  let grammar = json_schema_to_gbnf(&tool["function"]["parameters"])?;

  // We want: Templates that accept tools
  let result = model.apply_chat_template_with_tools(&tmpl, &messages, Some(&[tool]), true)?;

  // Result should include both prompt and grammar
  println!("Prompt: {}", result.prompt);
  println!("Grammar: {}", result.grammar.unwrap());

Impact

Without proper bindings:

  • ❌ Can't implement reliable tool calling in Rust
  • ❌ Must use workarounds (llama-server HTTP, other backends)
  • ❌ Can't leverage llama.cpp's excellent tool support
  • ❌ Rust ecosystem falls behind Python/C++ for agent development

Request

Could you please:

  1. Confirm if tool calling support is planned for Rust bindings
  2. Provide guidance on the best approach to add it
  3. Review our proposed API design
  4. Accept contributions if we implement this ourselves

We're happy to contribute code, write tests, and help maintain these features.


Related: the grammar sampler crashes are a secondary issue; we only discovered them while trying to work around the missing tool calling APIs. The primary ask is proper tool calling bindings.
