# llm-cost-tracker


A comprehensive platform combining LLM cost tracking with quantum-inspired task planning. This self-hostable solution captures token, latency, and cost data from LangChain and LiteLLM while providing advanced task scheduling using quantum computing concepts like superposition, entanglement, and quantum annealing.

## 🌟 Dual-Purpose Platform

### 📊 LLM Cost Tracking

- OpenTelemetry collector and rules engine for LLM operations
- Real-time cost monitoring with Postgres storage and Grafana visualization
- Budget-aware model switching using Vellum's price catalog

βš›οΈ Quantum Task Planning

  • Quantum-inspired scheduling with superposition, entanglement, and interference patterns
  • High-performance optimization using quantum annealing algorithms
  • Enterprise-grade scalability with auto-scaling and load balancing

## ✨ Key Features

### 📊 LLM Cost Tracking Features

| Feature | Details |
| --- | --- |
| Real-time Metering | Asynchronous Python middleware hooks into LangChain's `AsyncIteratorCallbackHandler` to capture token usage, latency, prompts, and the specific model used. |
| Budget Rules Engine | YAML rules (`monthly_budget`, `swap_threshold`) trigger automatic model routing via the LiteLLM router or Vellum API, based on up-to-date model prices. |
| Dashboards | Ships with a pre-built Grafana dashboard at `/dashboards/llm-cost-dashboard.json` (UID: `llm-cost-dashboard`) to visualize costs by application, model, and user. |
| Alerting | Integrates with Prometheus to send alerts to Slack or OpsGenie whenever predefined cost thresholds are exceeded. |
| Pluggable Storage | Defaults to Postgres, with adapters available for ClickHouse and BigQuery for flexibility in data storage. |
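To make the budget rules concrete, a rules file might look like the following sketch. The project's actual schema is not shown here, so these key names (`monthly_budget_usd`, `swap_threshold`, `fallback_model`, `notify_at`) are illustrative assumptions only:

```yaml
# Hypothetical budget-rules file; key names are illustrative,
# not the project's documented schema.
budget:
  monthly_budget_usd: 500        # hard monthly spend cap
  swap_threshold: 0.8            # switch models once 80% of budget is consumed
routing:
  primary_model: gpt-4o
  fallback_model: gpt-4o-mini    # cheaper model used after the threshold trips
alerts:
  channel: slack
  notify_at: [0.5, 0.8, 1.0]     # fractions of budget that trigger notifications
```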

βš›οΈ Quantum Task Planning Features

| Feature | Details |
| --- | --- |
| Quantum Scheduling | Tasks exist in superposition states, allowing probabilistic execution planning and optimal resource allocation via quantum annealing algorithms. |
| Task Entanglement | Related tasks can be quantum-entangled, ensuring coordinated execution and maintaining dependencies through quantum interference patterns. |
| Performance Optimization | High-performance caching with LRU eviction, load balancing with circuit breakers, and auto-scaling based on queue utilization and resource metrics. |
| Global Compliance | Built-in GDPR/CCPA compliance with PII detection, data anonymization, consent management, and data subject rights (access, deletion, portability). |
| Multilingual Support | Native internationalization (i18n) for six languages: English, Spanish, French, German, Japanese, and Chinese (Simplified). |
| Production Ready | Zero-downtime deployments, comprehensive Grafana/Prometheus monitoring, automated backups, security scanning, and enterprise-grade reliability. |
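As a rough intuition for annealing-based scheduling, the sketch below orders tasks by simulated annealing over a toy cost function (priority-weighted completion time). It illustrates the general technique only; the project's actual optimizer, cost model, and quantum-inspired state handling are not shown here:

```python
import math
import random

def schedule_cost(order, durations, priorities):
    """Sum of priority-weighted completion times; lower is better."""
    elapsed, cost = 0.0, 0.0
    for task in order:
        elapsed += durations[task]
        cost += priorities[task] * elapsed
    return cost

def anneal(tasks, durations, priorities, steps=5000, temp=10.0, cooling=0.999):
    """Simulated annealing over task orderings: swap two tasks, accept worse
    moves with probability exp(-delta / temp) to escape local minima."""
    rng = random.Random(42)
    current = list(tasks)
    cur_cost = schedule_cost(current, durations, priorities)
    best, best_cost = list(current), cur_cost
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = schedule_cost(current, durations, priorities)
        if new_cost < cur_cost or rng.random() < math.exp((cur_cost - new_cost) / max(temp, 1e-9)):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        else:
            current[i], current[j] = current[j], current[i]  # revert rejected swap
        temp *= cooling
    return best
```

With this cost function, high-priority short tasks tend to move to the front of the schedule, which matches the intuition behind priority-weighted scheduling.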

πŸ—οΈ Reference Architecture

LLM Cost Tracking Flow

LangChain ↔ Cost-Middleware β†’ OpenTelemetry SDK β†’ OTLP Collector β†’ Postgres β†’ Grafana
                                    β†˜ Prometheus/Alertmanager

### Quantum Task Planning Flow

```
Tasks → Quantum Planner → Annealing Optimizer → Load Balancer → Execution
   ↓         ↓                    ↓                   ↓            ↓
Cache    Monitoring        Auto-Scaler       Circuit Breakers   Results
```

### Integrated System Architecture

```
                    ┌─── LLM Cost Tracker ────┐
                    │                         │
    LangChain ──────┼── OpenTelemetry ────────┼──── Postgres
                    │       │                 │       │
                    │   Prometheus ───────────┼────── Grafana
                    │                         │
    Tasks ──────────┼── Quantum Planner ──────┼──── Execution
                    │       │                 │       │
                    │   Monitoring ───────────┼────── Results
                    │                         │
                    └─────────────────────────┘
```

## ⚡ Quick Start

### 🐳 Production Deployment (Recommended)

```bash
# Clone repository
git clone https://github.com/terragon-labs/llm-cost-tracker
cd llm-cost-tracker

# Configure production environment
cp .env.production.example .env.production
# Edit .env.production with your settings

# Deploy with zero downtime
chmod +x scripts/deploy.sh
./scripts/deploy.sh deploy

# Access services
# API:               https://api.your-domain.com
# Grafana:           https://grafana.your-domain.com
# Quantum Dashboard: https://api.your-domain.com/api/v1/quantum/system/state
```

### 🔬 Development Setup

```bash
# Clone and set up
git clone https://github.com/terragon-labs/llm-cost-tracker
cd llm-cost-tracker

# Start services
docker compose up -d

# Install dependencies
poetry install

# Run LLM cost tracking demo
python examples/streamlit_demo.py

# Test quantum task planner
curl -X GET http://localhost:8000/api/v1/quantum/demo

# Access Grafana and import the dashboard
# http://localhost:3000 (admin/admin)
# Import: /dashboards/llm-cost-dashboard.json
```

πŸ” Security

This tool handles sensitive API keys. To safeguard these credentials, we follow an encrypted proxy pattern. All keys should be stored in environment variables or a secure vault. For reporting vulnerabilities, please refer to our organization's SECURITY.md file.
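Loading keys from the environment and failing fast when one is missing might look like the minimal sketch below. The variable names are illustrative assumptions, not the project's actual configuration:

```python
import os

# Illustrative variable names; not the project's documented configuration.
REQUIRED_KEYS = ["OPENAI_API_KEY", "LITELLM_API_KEY"]

def load_api_keys() -> dict:
    """Read required credentials from environment variables, raising early if any
    are absent so a misconfigured deployment fails at startup, not mid-request."""
    missing = [k for k in REQUIRED_KEYS if not os.environ.get(k)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return {k: os.environ[k] for k in REQUIRED_KEYS}
```

Failing at startup keeps secrets out of source control and surfaces configuration problems before any request handling begins.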


## 🚀 API Examples

### Quantum Task Planning

```python
import asyncio

from llm_cost_tracker import QuantumTaskPlanner, QuantumTask

# Initialize planner
planner = QuantumTaskPlanner()

# Create a task
task1 = QuantumTask(
    id="analyze_data",
    name="Data Analysis",
    priority=9.0,
    estimated_duration_minutes=30,
)

# Add to planner
planner.add_task(task1)

# Generate optimal schedule
schedule = planner.generate_schedule()
print(f"Optimal execution order: {schedule}")

# Execute tasks (the coroutine must run inside an event loop)
results = asyncio.run(planner.execute_schedule_async(schedule))
```

### REST API Usage

```bash
# Create a quantum task
curl -X POST http://localhost:8000/api/v1/quantum/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "id": "task_001",
    "name": "Machine Learning Pipeline",
    "priority": 8.5,
    "estimated_duration_minutes": 45
  }'

# Generate optimal schedule
curl -X GET http://localhost:8000/api/v1/quantum/schedule

# Monitor system state
curl -X GET http://localhost:8000/api/v1/quantum/system/state
```
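For building the same request body from Python, a small helper that validates the payload before it is sent might look like this sketch. The field names mirror the curl example above; the `0-10` priority range is an assumption, not documented behavior:

```python
import json

def build_task_payload(task_id: str, name: str, priority: float, duration_minutes: int) -> str:
    """Build the JSON body for POST /api/v1/quantum/tasks.
    The 0-10 priority bound below is an assumption for illustration."""
    if not 0.0 <= priority <= 10.0:
        raise ValueError("priority must be between 0 and 10")
    if duration_minutes <= 0:
        raise ValueError("estimated_duration_minutes must be positive")
    return json.dumps({
        "id": task_id,
        "name": name,
        "priority": priority,
        "estimated_duration_minutes": duration_minutes,
    })
```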

## 📈 Roadmap

### LLM Cost Tracker

  • v0.1.0: βœ… Core tracing, Grafana dashboard, and Prometheus alerts
  • v0.2.0: Implementation of the budget-aware model swapper with Slack alerts
  • v0.3.0: Introduction of multi-tenant RBAC and per-project budgets

### Quantum Task Planner

  • v0.1.0: βœ… Quantum-inspired scheduling with superposition and entanglement
  • v0.1.0: βœ… Performance optimization with caching and load balancing
  • v0.1.0: βœ… Global compliance (GDPR/CCPA) and i18n support
  • v0.2.0: Advanced quantum algorithms and machine learning integration
  • v0.3.0: Distributed quantum planning across multiple nodes

## 🤝 Contributing

We welcome contributions! Please see our organization-wide CONTRIBUTING.md for guidelines and our CODE_OF_CONDUCT.md. A CHANGELOG.md is maintained for version history.


πŸ“ Licenses & Attribution

This project is licensed under the Apache-2.0 License. It incorporates functionalities inspired by Helicone, which is licensed under the MIT License. A copy of relevant downstream licenses can be found in the LICENSES/ directory.

