A comprehensive platform combining LLM cost tracking with quantum-inspired task planning. This self-hostable solution captures token usage, latency, and cost data from LangChain and LiteLLM applications, while providing advanced task scheduling built on quantum computing concepts such as superposition, entanglement, and quantum annealing.
- OpenTelemetry collector and rules engine for LLM operations
- Real-time cost monitoring with Postgres storage and Grafana visualization
- Budget-aware model switching using Vellum's price catalog
- Quantum-inspired scheduling with superposition, entanglement, and interference patterns
- High-performance optimization using quantum annealing algorithms
- Enterprise-grade scalability with auto-scaling and load balancing
| Feature | Details |
|---|---|
| Real-time Metering | Asynchronous Python middleware hooks into LangChain's `AsyncIteratorCallbackHandler` to capture token usage, latency, prompts, and the specific model used. |
| Budget Rules Engine | YAML rules (`monthly_budget`, `swap_threshold`) trigger automatic model routing via the LiteLLM router or Vellum API, based on up-to-date model prices. |
| Dashboards | Ships with a pre-built Grafana dashboard at `/dashboards/llm-cost-dashboard.json` (UID: `llm-cost-dashboard`) to visualize costs by application, model, and user. |
| Alerting | Integrates with Prometheus to send alerts to Slack or OpsGenie whenever predefined cost thresholds are exceeded. |
| Pluggable Storage | Defaults to Postgres, with adapters available for ClickHouse and BigQuery for flexibility in data storage. |
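The table names two YAML rule keys but never shows a rules file. A minimal sketch of what one might look like — only `monthly_budget` and `swap_threshold` come from the table above; the file name, layout, model names, and remaining fields are illustrative assumptions, not the project's actual schema:

```yaml
# budget-rules.yml — hypothetical layout; only monthly_budget and
# swap_threshold are documented, the rest is illustrative.
rules:
  - name: default-project
    monthly_budget: 500.00        # USD cap for the billing month
    swap_threshold: 0.80          # route to the fallback at 80% of budget
    primary_model: gpt-4o         # example model names
    fallback_model: gpt-4o-mini   # cheaper model once the threshold trips
    alert_channels: [slack]
```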
| Feature | Details |
|---|---|
| Quantum Scheduling | Tasks exist in superposition states, allowing probabilistic execution planning and optimal resource allocation via quantum annealing algorithms. |
| Task Entanglement | Related tasks can be quantum-entangled, ensuring coordinated execution and maintaining dependencies through quantum interference patterns. |
| Performance Optimization | High-performance caching with LRU eviction, load balancing with circuit breakers, and auto-scaling based on queue utilization and resource metrics. |
| Global Compliance | Built-in GDPR/CCPA compliance with PII detection, data anonymization, consent management, and data subject rights (access, deletion, portability). |
| Multilingual Support | Native internationalization (i18n) for six languages: English, Spanish, French, German, Japanese, and Chinese (Simplified). |
| Production Ready | Zero-downtime deployments, comprehensive monitoring with Grafana/Prometheus, automated backups, security scanning, and enterprise-grade reliability. |
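The superposition idea above can be made concrete with a small, self-contained sketch (not the project's actual algorithm): each pending task gets a probability amplitude derived from its priority, and "measurement" collapses the set to a single next task. The softmax weighting and the field names are illustrative assumptions:

```python
import math
import random

def sample_next_task(tasks, temperature=1.0):
    """Pick the next task probabilistically: higher-priority tasks are
    more likely, but lower-priority ones keep a nonzero amplitude,
    loosely mirroring a superposition that collapses on measurement."""
    weights = [math.exp(t["priority"] / temperature) for t in tasks]
    total = sum(weights)
    return random.choices(tasks, weights=[w / total for w in weights], k=1)[0]

tasks = [
    {"id": "analyze_data", "priority": 9.0},
    {"id": "build_report", "priority": 4.0},
]
picked = sample_next_task(tasks)
print(picked["id"])  # usually "analyze_data", occasionally "build_report"
```

Raising `temperature` flattens the distribution (more exploration); lowering it approaches a deterministic priority queue.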
```
LangChain → Cost-Middleware → OpenTelemetry SDK → OTLP Collector → Postgres → Grafana
                                                        ↘ Prometheus/Alertmanager

Tasks → Quantum Planner → Annealing Optimizer → Load Balancer → Execution
  ↓           ↓                  ↓                    ↓             ↓
Cache     Monitoring        Auto-Scaler       Circuit Breakers  Results
```
```
            ┌────── LLM Cost Tracker ──────┐
            │                              │
LangChain ──┼── OpenTelemetry ─────────────┼──── Postgres
            │        │                     │
            │   Prometheus ────────────────┼──── Grafana
            │                              │
Tasks ──────┼── Quantum Planner ───────────┼──── Execution
            │        │                     │
            │   Monitoring ────────────────┼──── Results
            │                              │
            └──────────────────────────────┘
```
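The metering pipeline above can be sketched end-to-end in a few lines. This is a toy stand-in, not the project's middleware: the class name, the price table, and the list used in place of the OTLP exporter are all illustrative assumptions:

```python
import time
from dataclasses import dataclass

# Illustrative per-1K-token prices; a real deployment would pull these
# from a live catalog such as Vellum's.
PRICE_PER_1K = {"gpt-4o": 0.0050, "gpt-4o-mini": 0.0006}

@dataclass
class UsageRecord:
    model: str
    prompt_tokens: int
    completion_tokens: int
    latency_s: float

    @property
    def cost_usd(self) -> float:
        tokens = self.prompt_tokens + self.completion_tokens
        return tokens / 1000 * PRICE_PER_1K[self.model]

class CostMiddleware:
    """Toy stand-in for the async callback middleware: wraps an LLM
    call, times it, and emits a usage record to a sink (here, a plain
    list standing in for the OTLP exporter)."""
    def __init__(self, sink):
        self.sink = sink

    def track(self, model, prompt_tokens, completion_tokens, call):
        start = time.perf_counter()
        result = call()
        latency = time.perf_counter() - start
        self.sink.append(
            UsageRecord(model, prompt_tokens, completion_tokens, latency))
        return result

records = []
mw = CostMiddleware(records)
mw.track("gpt-4o", 120, 80, lambda: "fake completion")
print(f"${records[0].cost_usd:.4f}")  # 200 tokens → $0.0010
```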
```bash
# Clone repository
git clone https://github.com/terragon-labs/llm-cost-tracker
cd llm-cost-tracker

# Configure production environment
cp .env.production.example .env.production
# Edit .env.production with your settings

# Deploy with zero downtime
chmod +x scripts/deploy.sh
./scripts/deploy.sh deploy

# Access services
# API: https://api.your-domain.com
# Grafana: https://grafana.your-domain.com
# Quantum Dashboard: https://api.your-domain.com/api/v1/quantum/system/state
```
```bash
# Clone and set up
git clone https://github.com/terragon-labs/llm-cost-tracker
cd llm-cost-tracker

# Start services
docker compose up -d

# Install dependencies
poetry install

# Run LLM cost tracking demo
python examples/streamlit_demo.py

# Test quantum task planner
curl -X GET http://localhost:8000/api/v1/quantum/demo

# Access Grafana and import dashboard
# http://localhost:3000 (admin/admin)
# Import: /dashboards/llm-cost-dashboard.json
```
This tool handles sensitive API keys. To safeguard these credentials, we follow an encrypted proxy pattern, and all keys should be stored in environment variables or a secure vault. To report a vulnerability, please refer to our organization's SECURITY.md file.
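One concrete way to keep keys out of source files, assuming keys live in environment variables as recommended above (the variable name is illustrative, not one the project requires):

```python
import os

def get_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read an API key from the environment and fail fast if it is
    missing, so keys never need to appear in code or config files."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it or load it from your vault."
        )
    return key
```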
- API Reference - Complete API documentation with examples
- Quantum Architecture - Deep dive into quantum-inspired concepts
- Deployment Guide - Production deployment instructions
- Examples - Sample implementations and use cases
```python
import asyncio

from llm_cost_tracker import QuantumTaskPlanner, QuantumTask

async def main():
    # Initialize planner
    planner = QuantumTaskPlanner()

    # Create a task
    task1 = QuantumTask(
        id="analyze_data",
        name="Data Analysis",
        priority=9.0,
        estimated_duration_minutes=30,
    )

    # Add to planner
    planner.add_task(task1)

    # Generate optimal schedule
    schedule = planner.generate_schedule()
    print(f"Optimal execution order: {schedule}")

    # Execute tasks (await must run inside an async context)
    results = await planner.execute_schedule_async(schedule)
    return results

asyncio.run(main())
```
```bash
# Create a quantum task
curl -X POST http://localhost:8000/api/v1/quantum/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "id": "task_001",
    "name": "Machine Learning Pipeline",
    "priority": 8.5,
    "estimated_duration_minutes": 45
  }'

# Generate optimal schedule
curl -X GET http://localhost:8000/api/v1/quantum/schedule

# Monitor system state
curl -X GET http://localhost:8000/api/v1/quantum/system/state
```
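The same create-task call can be driven from Python with only the standard library. This mirrors the curl example above; the helper name is our own, and it builds the request without sending it (sending requires the service to be running locally):

```python
import json
import urllib.request

BASE = "http://localhost:8000/api/v1/quantum"

def build_create_task_request(task: dict) -> urllib.request.Request:
    """Build (but do not send) the POST used to create a quantum task,
    matching the curl call shown above."""
    return urllib.request.Request(
        f"{BASE}/tasks",
        data=json.dumps(task).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_create_task_request({
    "id": "task_001",
    "name": "Machine Learning Pipeline",
    "priority": 8.5,
    "estimated_duration_minutes": 45,
})
print(req.method, req.full_url)
```

To actually send it, pass `req` to `urllib.request.urlopen` once the service is up.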
- v0.1.0: ✅ Core tracing, Grafana dashboard, and Prometheus alerts
- v0.2.0: Implementation of the budget-aware model swapper with Slack alerts
- v0.3.0: Introduction of multi-tenant RBAC and per-project budgets
- v0.1.0: ✅ Quantum-inspired scheduling with superposition and entanglement
- v0.1.0: ✅ Performance optimization with caching and load balancing
- v0.1.0: ✅ Global compliance (GDPR/CCPA) and i18n support
- v0.2.0: Advanced quantum algorithms and machine learning integration
- v0.3.0: Distributed quantum planning across multiple nodes
We welcome contributions! Please see our organization-wide CONTRIBUTING.md for guidelines and our CODE_OF_CONDUCT.md. A CHANGELOG.md is maintained for version history.
- lang-observatory: Integrates these cost metrics into a unified observability stack.
- eval-genius-agent-bench: Uses this tracker to overlay cost data on performance evaluations.
This project is licensed under the Apache-2.0 License. It incorporates functionality inspired by Helicone, which is licensed under the MIT License. Copies of the relevant downstream licenses can be found in the LICENSES/ directory.
- LangChain Callbacks: AsyncIteratorCallbackHandler Docs
- Vellum LLM Cost Comparison: Vellum AI Blog
- Helicone: Official Site