Incomplete Rust Bindings for llama.cpp Tool Calling Support #864

@JMLX42

Description

llama.cpp has comprehensive native tool calling support (merged in PR #9639, January 2025), but the Rust bindings in llama-cpp-2
v0.1.125 don't expose the necessary APIs to use it. When attempting to implement tool calling manually, we encounter crashes due to
missing or incomplete grammar sampler bindings.

Background: What We're Trying to Achieve

We're building an AI agent with tool calling capabilities (function calling) where the LLM can:

  1. Receive tool definitions (functions it can call)
  2. Decide when to call tools vs. respond naturally
  3. Generate structured JSON output for tool invocations
  4. Have output validated via grammar constraints

What llama.cpp Provides (C++ Level)

llama.cpp includes comprehensive tool calling infrastructure (PR #9639):

  1. JSON Schema to Grammar Conversion

  // From common/json-schema-to-grammar.h
  std::string json_schema_to_grammar(const nlohmann::ordered_json & schema);

  2. Chat Templates with Tools

  // Templates accept tool definitions and format them into prompts
  template.apply({
      .messages = messages,
      .tools = tool_definitions,        // ← Not exposed in Rust
      .add_generation_prompt = true
  });

  // Returns: { prompt: "...", grammar: "..." }  // ← Grammar auto-generated!

  3. Grammar Builder Utilities

  struct common_grammar_builder {
      std::string add_schema(const std::string& name, const nlohmann::ordered_json& schema);
  };

  4. Tool-Aware Template System

The minja template engine formats tools into model-specific prompts and generates matching GBNF grammars automatically.

What's Missing in llama-cpp-2 Rust Bindings

❌ No json_schema_to_grammar Binding

  // DOES NOT EXIST in llama-cpp-2
  pub fn json_schema_to_grammar(schema: &serde_json::Value) -> String;

Impact: Must manually write GBNF grammars for tool schemas.

❌ No Tools Parameter in Chat Templates

  // Current API (simplified)
  pub fn apply_chat_template(
      &self,
      tmpl: &LlamaChatTemplate,
      chat: &[LlamaChatMessage],
      add_ass: bool
  ) -> Result<String>;  // ← Only returns prompt, no grammar!

  // Missing: tools parameter and grammar output

Impact: Can't pass tool definitions to templates, no auto-generated grammar.

❌ No Grammar Builder API

No Rust equivalent of common_grammar_builder for constructing GBNF from JSON schemas.

❌ Grammar Samplers Crash (Secondary Issue)

When we tried to work around missing bindings by manually writing grammars, we discovered the grammar samplers themselves crash.

Our Workaround Attempt (Failed)

Since the proper APIs are missing, we attempted to:

  1. ✅ Manual Tool Formatting: Inject tool definitions into system message (works)
  2. ✅ Chat Template Usage: Use apply_chat_template() for prompt formatting (works)
  3. ❌ Manual GBNF Grammar: Write tool call JSON grammar by hand (crashes)
  4. ❌ Grammar Sampling: Use LlamaSampler::grammar() to enforce structure (crashes)

Manual Grammar Approach

  // We tried writing the grammar manually
  const TOOL_CALL_GRAMMAR: &str = r#"
  root ::= "{" ws "\"tool_calls\"" ws ":" ws "[" ws tool-call ws "]" ws "}"
  tool-call ::= "{" ws "\"id\"" ws ":" ws string ws "," ws "\"function\"" ws ":" ws function ws "}"
  // ... etc
  "#;

  let sampler = LlamaSampler::grammar(&model, TOOL_CALL_GRAMMAR, "root")?;
  // CRASHES with: GGML_ASSERT(cur_p.selected >= 0) failed

Even using the official json.gbnf included in llama-cpp-2 crashes:

  // From llama-cpp-2/src/grammar/json.gbnf
  const JSON_GRAMMAR: &str = include_str!("json.gbnf");
  let sampler = LlamaSampler::grammar(&model, JSON_GRAMMAR, "root")?;
  // CRASHES with: Unexpected empty grammar stack

What We Need: Proper Tool Calling Bindings

Option 1: Expose Existing C++ Functions (Ideal)

Add bindings to the C++ common library:

  pub fn json_schema_to_grammar(schema: &serde_json::Value) -> Result<String>;

  pub struct ChatTemplateResult {
      pub prompt: String,
      pub grammar: Option<String>,  // Auto-generated!
  }

  pub fn apply_chat_template_with_tools(
      &self,
      tmpl: &LlamaChatTemplate,
      messages: &[LlamaChatMessage],
      tools: Option<&[ToolDefinition]>,
      add_generation_prompt: bool
  ) -> Result<ChatTemplateResult>;

Option 2: Reimplement in Rust (Alternative)

Port the JSON schema → GBNF converter to pure Rust (~500 LOC from json-schema-to-grammar.cpp).
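To give a sense of scope, here is a minimal, dependency-free sketch of what such a port could look like. All names are hypothetical and it covers only a trivial subset of JSON Schema (strings, numbers, flat objects with all properties required); the real json-schema-to-grammar.cpp also handles enums, arrays, refs, optional properties, and more.

```rust
// Hypothetical sketch of a pure-Rust JSON-schema -> GBNF converter.
// A tiny internal schema representation stands in for serde_json::Value.

#[derive(Debug)]
enum Schema {
    Str,
    Num,
    Object(Vec<(String, Schema)>), // (property name, schema); all required
}

// Emit the GBNF expression matching one schema node.
fn rule_body(s: &Schema) -> String {
    match s {
        Schema::Str => "string".to_string(),
        Schema::Num => "number".to_string(),
        Schema::Object(props) => {
            let mut body = String::from("\"{\" ws ");
            for (i, (name, sub)) in props.iter().enumerate() {
                if i > 0 {
                    body.push_str("\",\" ws ");
                }
                // Each property: quoted key, colon, then its value rule.
                body.push_str(&format!(
                    "\"\\\"{}\\\"\" ws \":\" ws {} ws ",
                    name,
                    rule_body(sub)
                ));
            }
            body.push_str("\"}\"");
            body
        }
    }
}

// Wrap the root rule together with shared terminals, following the
// conventions of llama.cpp's json.gbnf.
fn schema_to_gbnf(root: &Schema) -> String {
    format!(
        "root ::= {}\nstring ::= \"\\\"\" [^\"]* \"\\\"\"\nnumber ::= [0-9]+ (\".\" [0-9]+)?\nws ::= [ \\t\\n]*\n",
        rule_body(root)
    )
}

fn main() {
    // Grammar for the execute_python parameters: { "code": <string> }
    let params = Schema::Object(vec![("code".to_string(), Schema::Str)]);
    println!("{}", schema_to_gbnf(&params));
}
```

A production port would operate on serde_json::Value directly and track the schema features the C++ converter supports, but the shape of the recursion is the same.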

Option 3: Fix Grammar Samplers (Minimum)

At minimum, make grammar samplers actually work so manual grammar writing is viable.

Use Case: Our Tool Calling Implementation

  // What we want to do:
  let tools = vec![
      ToolDefinition {
          name: "execute_python",
          description: "Execute Python code",
          parameters: json!({
              "type": "object",
              "properties": {
                  "code": {"type": "string"}
              }
          })
      }
  ];

  // With proper bindings, this would work:
  let result = model.apply_chat_template_with_tools(
      &template,
      &messages,
      Some(&tools),  // ← Pass tools
      true
  )?;

  // Returns:
  // - result.prompt = formatted prompt with tool descriptions
  // - result.grammar = auto-generated GBNF for tool call JSON

  let sampler = LlamaSampler::grammar(&model, &result.grammar, "root")?;
  // Should enforce valid tool call structure

Current Workarounds and Limitations

Workaround 1: Use llama-server via HTTP ✅

  • Run llama-server as subprocess
  • Use /v1/chat/completions endpoint
  • Full PR #9639 support with tools parameter
  • Downside: HTTP overhead, process management complexity
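For reference, a request of this shape against llama-server's /v1/chat/completions endpoint exercises the PR #9639 tool calling path (model name illustrative):

```json
{
  "model": "Qwen3-4B-Q8_0",
  "messages": [
    {"role": "user", "content": "What's the weather in Paris?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {"location": {"type": "string"}},
          "required": ["location"]
        }
      }
    }
  ]
}
```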

Workaround 2: Use Alternative Backends ✅

  • Switch to mistral.rs (works perfectly)
  • Or use other inference engines
  • Downside: Defeats purpose of llama.cpp integration

Workaround 3: Prompt Engineering Without Grammar ⚠

  • Rely on model behavior without constraints
  • Hope for valid JSON output
  • Downside: Unreliable, may generate malformed JSON
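When relying on this workaround, the raw reply still has to be salvaged by hand. A minimal, dependency-free sketch (assuming the tool call is the first JSON object in the reply; a real implementation would then validate the extracted text against the tool schema):

```rust
// Std-only sketch: scan unconstrained model output for the first balanced
// {...} span, respecting string literals and escape sequences.
fn extract_json_object(text: &str) -> Option<&str> {
    let start = text.find('{')?;
    let mut depth = 0usize;
    let mut in_str = false;
    let mut escaped = false;
    for (i, &b) in text.as_bytes().iter().enumerate().skip(start) {
        if in_str {
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_str = false;
            }
            continue;
        }
        match b {
            b'"' => in_str = true,
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    // Slice bounds are ASCII braces, so always char boundaries.
                    return Some(&text[start..=i]);
                }
            }
            _ => {}
        }
    }
    None // model never closed the object
}

fn main() {
    let reply = "Sure! Here is the call: {\"tool\": \"get_weather\", \
                 \"args\": {\"location\": \"Paris\"}} Hope that helps.";
    println!("{:?}", extract_json_object(reply));
}
```

This is exactly the kind of fragile post-processing that grammar-constrained sampling is meant to make unnecessary.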

Why This Matters

Tool calling is a fundamental LLM capability for:

  • AI agents (our use case)
  • Function calling applications
  • Structured data extraction
  • Agentic workflows

llama.cpp has excellent tool calling support, but it's inaccessible from Rust due to missing bindings.

Environment

  • llama-cpp-2: v0.1.125
  • llama.cpp commit: d7395115baf395b75a73a17b0b796e746e468da9 (Oct 30, 2025)
    • Includes PR #9639 (tool calling, merged Jan 30, 2025)
    • Does NOT include PR #16932 (XML tool calling, merged Nov 18, 2025)
  • OS: Linux 6.17.7
  • Backend: Vulkan
  • Model: Qwen3-4B-Q8_0.gguf

Questions

  1. Are there plans to expose tool calling APIs?
    - json_schema_to_grammar binding
    - Tools parameter in chat templates
    - Grammar result from template application
  2. Is there a workaround we're missing?
    - Are these functions accessible another way?
    - Is there an undocumented API?
  3. Would you accept a PR?
    - We're willing to contribute bindings
    - Need guidance on approach (FFI to C++ or pure Rust port)
  4. Are grammar samplers supposed to work?
    - They crash even with official grammars
    - Is this a known issue or are we misusing the API?

Desired API (Proposal)

  // Tool definition structure
  #[derive(Serialize)]
  pub struct ToolDefinition {
      pub r#type: String,  // "function"
      pub function: FunctionDefinition,
  }

  #[derive(Serialize)]
  pub struct FunctionDefinition {
      pub name: String,
      pub description: String,
      pub parameters: serde_json::Value,  // JSON schema
  }

  // Extended chat template API
  pub struct ChatTemplateOptions {
      pub messages: Vec<LlamaChatMessage>,
      pub tools: Option<Vec<ToolDefinition>>,
      pub add_generation_prompt: bool,
  }

  pub struct ChatTemplateResult {
      pub prompt: String,
      pub grammar: Option<String>,
  }

  impl LlamaModel {
      pub fn apply_chat_template_extended(
          &self,
          tmpl: &LlamaChatTemplate,
          options: ChatTemplateOptions,
      ) -> Result<ChatTemplateResult>;
  }

  // Grammar generation from JSON schema
  pub fn json_schema_to_gbnf(schema: &serde_json::Value) -> Result<String>;

Minimal Example of What We're Trying to Do

  // Define tool
  let tool = json!({
      "type": "function",
      "function": {
          "name": "get_weather",
          "description": "Get weather for a location",
          "parameters": {
              "type": "object",
              "properties": {
                  "location": {"type": "string"}
              },
              "required": ["location"]
          }
      }
  });

  // We want: Auto-generate grammar from schema
  let grammar = json_schema_to_gbnf(&tool["function"]["parameters"])?;

  // We want: Templates that accept tools
  let result = model.apply_chat_template_with_tools(&tmpl, &messages, Some(&[tool]), true)?;

  // Result should include both prompt and grammar
  println!("Prompt: {}", result.prompt);
  println!("Grammar: {}", result.grammar.unwrap());

Impact

Without proper bindings:

  • ❌ Can't implement reliable tool calling in Rust
  • ❌ Must use workarounds (llama-server HTTP, other backends)
  • ❌ Can't leverage llama.cpp's excellent tool support
  • ❌ Rust ecosystem falls behind Python/C++ for agent development

Request

Could you please:

  1. Confirm if tool calling support is planned for Rust bindings
  2. Provide guidance on the best approach to add it
  3. Review our proposed API design
  4. Accept contributions if we implement this ourselves

We're happy to contribute code, write tests, and help maintain these features.


Related: the grammar sampler crashes are a secondary issue; we only discovered them while trying to work around the missing tool calling APIs. The primary ask is proper tool calling bindings.
