A collection of tools for benchmarking, evaluating, and analyzing agent performance metrics.
- /benchmarks: Tools for running benchmarks on AI models and agents
- /metrics_core: Core utilities and shared code
- /evaluation: Tools for evaluating agent or model performance
- /metrics_cli: Command-line interface
- /metrics_service: Web service API
- /dashboard: Analytics Dashboard
Collect logs from the NEAR AI Hub, then transform, tune, and aggregate them, and export CSV tables.
# Installation
# Create and activate a Python 3.11 virtual environment
python3.11 -m venv .venv
source .venv/bin/activate
# Install each component in editable mode
cd metrics_core && pip install -e . && cd ..
cd evaluation && pip install -e . && cd ..
cd metrics_cli && pip install -e . && cd ..
cd metrics_service && pip install -e . && cd ..
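To sanity-check the installs, the core package should now be importable. A minimal check, assuming the Python package is named after its directory (metrics_core):

# Verify the editable install resolved correctly.
import metrics_core  # assumed package name, matching the directory
print(metrics_core.__file__)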
# Transform and aggregate metrics
# Tune raw logs: rename fields and convert millisecond values to seconds
metrics-cli tune /path/to/logs /path/to/tuned_logs --rename --ms-to-s
# Aggregate tuned logs, excluding local runners and slicing by agent name
metrics-cli aggregate /path/to/tuned_logs /path/to/aggr_logs --filters "runner:not_in:local" --slices "agent_name"
# Build an aggregation table, nullifying metrics that are absent in some logs
metrics-cli aggregation-table /path/to/tuned_logs /path/to/table --filters "runner:not_in:local" --absent-metrics-strategy=nullify
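The resulting table can then be loaded for further analysis. A minimal Python sketch; the exact CSV file name written under /path/to/table is an assumption, so adjust it to what the CLI actually produces:

import csv

# Read the aggregated CSV table row by row.
with open("/path/to/table/aggregation.csv", newline="") as f:  # file name is illustrative
    for row in csv.DictReader(f):
        print(row)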
Run the metrics service to query and analyze metrics data:
# Start the metrics service
metrics-service --metrics-path /path/to/tuned_logs
# Query metrics via API
curl -X POST "http://localhost:8000/api/v1/table/aggregation" \
-H "Content-Type: application/json" \
-d '{
"filters": ["runner:not_in:local"],
"column_selections": ["/metrics/performance/"]
}'
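The same query can be issued programmatically. A minimal Python sketch using requests, equivalent to the curl call above (the response shape depends on your metrics data):

import requests

# POST the same aggregation query as the curl example.
resp = requests.post(
    "http://localhost:8000/api/v1/table/aggregation",
    json={
        "filters": ["runner:not_in:local"],
        "column_selections": ["/metrics/performance/"],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())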
🐳 Run with Docker
The easiest way to run the metrics service with automated LiveBench data:
# Quick start with Docker Compose
docker compose up -d
# Access the service
open http://localhost:8000/api/v1/docs
This provides:
- ✅ Automated LiveBench leaderboard scraping on startup and daily
- ✅ Persistent data storage
- ✅ Health monitoring and auto-restart
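To confirm the containerized service came up, polling the interactive docs endpoint shown above is a quick check (a sketch; any endpoint the service exposes would do):

import requests

# Returns 200 once the service is ready to accept requests.
r = requests.get("http://localhost:8000/api/v1/docs", timeout=10)
print("service is up" if r.ok else f"unexpected status: {r.status_code}")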
The dashboard is a web application for querying and visualizing analytics metrics data, with views for time series analysis, data tables, and chronological logs.
cd dashboard
npm install
npm run build-css
npm start
This opens the dashboard at http://localhost:3000.
The dashboard can also be used as a web component in other applications:
npm install @nearai/analytics-dashboard
import { Dashboard } from '@nearai/analytics-dashboard';
// Use with configuration
<Dashboard config={{
  views: ['timeseries', 'table', 'model_comparison', 'error_logs'],
  globalFilters: ['runner:not_in:local'], // Applied to all requests
  viewConfigs: {
    timeseries: {
      view_type: 'timeseries',
      view_name: 'Time Series',
      metricSelection: 'PERFORMANCE',
      defaultParameters: {
        time_filter: '1 month',
        time_granulation: '1 day'
      },
      timeFilterRecommendations: ['last hour', 'last day', 'last week', 'last month', 'last year'],
      refreshRate: 30 // Refresh every 30 seconds
    },
    table: {
      view_type: 'table',
      view_name: 'Table',
      metricSelection: 'CUSTOM',
      showParameters: ['prune_mode'], // Show only specific parameters
      refreshRate: 30 // Refresh every 30 seconds
    },
    model_comparison: {
      view_type: 'table',
      view_name: 'Compare Models',
      metricSelection: 'COMPARE_MODELS',
      refreshRate: undefined // No auto-refresh
    },
    error_logs: { // key must match the 'error_logs' entry in views above
      view_type: 'logs',
      view_name: 'Error Logs',
      metricSelection: 'ERROR',
      timeFilterRecommendations: [] // Disable time-filter recommendations
    }
  }
}} />
Execute popular and user-owned benchmarks to generate performance metrics. Run audit evaluations on agents.
Visualize, analyze, and compare agent & model performances using the collected metrics.
cd dashboard
npm install
npm run build-css
npm start
This opens the dashboard; go to the "Compare Models" view.
- 🐳 Docker Support: One-command deployment with automated LiveBench data scraping
- Canonical Metrics Format: Standardized format for consistent metrics across all agents (see the sketch after this list)
- Flexible Aggregation: Group and aggregate metrics by various dimensions
- Powerful Filtering: Filter metrics by runner, model, time ranges, and custom criteria
- RESTful API: Easy integration with dashboards and other tools
- Performance Tracking: Monitor latency, API usage, error rates, and custom metrics
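As a rough illustration of the canonical metrics format, a single entry bundles metric values with the dimensions used for filtering and slicing. The field names below are hypothetical (the real schema lives in metrics_core); only the nested metrics/performance path is inferred from the API example above:

# Hypothetical canonical-format entry; field names are illustrative, not the real schema.
entry = {
    "agent_name": "my-agent",            # slicing dimension, as in --slices "agent_name"
    "runner": "hosted",                  # filtering dimension, as in "runner:not_in:local"
    "timestamp": "2025-01-01T00:00:00Z",
    "metrics": {
        "performance": {"latency_s": 1.42},  # matches the /metrics/performance/ column path
    },
}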
We welcome contributions! See individual component READMEs for specific development guidelines.