GitHub - pyk/bench: Fast & Accurate Microbenchmarking for Zig

Fast & Accurate Benchmarking for Zig

Features

CPU Counters: Measures CPU cycles, instructions, IPC, and cache misses directly from the kernel (Linux only).
Argument Support: Pass pre-calculated data to your functions to separate setup overhead from the benchmark loop.
Baseline Comparison: Easily compare multiple implementations against a reference function to see relative speedups or regressions.
Flexible Reporting: Access raw metric data programmatically to generate custom reports (JSON, CSV) or assert performance limits in CI.
Easy Throughput Metrics: Automatically calculates operations per second and data throughput (MB/s, GB/s) when payload size is provided.
Robust Statistics: Uses median and standard deviation to provide reliable metrics despite system noise.
Zero Dependencies: Implemented in pure Zig using only the standard library.

Installation

Fetch latest version:

zig fetch --save=bench https://github.com/pyk/bench/archive/main.tar.gz

Then add this to your build.zig:

const bench = b.dependency("bench", .{
    .target = target,
    .optimize = optimize,
});

// Use it on a module
const mod = b.createModule(.{
    .target = target,
    .optimize = optimize,
    .imports = &.{
        .{ .name = "bench", .module = bench.module("bench") },
    },
});

// Or executable
const my_bench = b.addExecutable(.{
    .name = "my-bench",
    .root_module = b.createModule(.{
        .root_source_file = b.path("bench/my-bench.zig"),
        .target = target,
        .optimize = .ReleaseFast,
        .imports = &.{
            .{ .name = "bench", .module = bench.module("bench") },
        },
    }),
});

If you are using it only for tests/benchmarks, it is recommended to mark it as lazy:

.dependencies = .{
    .bench = .{
        .url = "...",
        .hash = "...",
        .lazy = true, // here
    },
}

Usage

Basic Run

To benchmark a single function, pass the allocator, a name, and the function pointer to run.

const res = try bench.run(allocator, "My Function", myFn, .{});
try bench.report(.{ .metrics = &.{res} });

Run with Arguments

You can generate test data before the benchmark starts and pass it via a tuple. This ensures the setup cost doesn't pollute your measurements.

// Setup data outside the benchmark
const input = try generateLargeString(allocator, 10_000);

// Pass input as a tuple
const res = try bench.run(allocator, "Parser", parseFn, .{input}, .{});

Comparing Implementations

You can run multiple benchmarks and compare them against a baseline. The baseline_index determines which result is used as the reference (1.00x).

const a = try bench.run(allocator, "Implementation A", implA, .{});
const b = try bench.run(allocator, "Implementation B", implB, .{});

try bench.report(.{
    .metrics = &.{ a, b },
    // Use the first metric (Implementation A) as the baseline
    .baseline_index = 0,
});

Measuring Throughput

If your function processes data (like copying memory or parsing strings), provide bytes_per_op to get throughput metrics (MB/s or GB/s).

const size = 1024 * 1024;
const res = try bench.run(allocator, "Memcpy 1MB", copyFn, .{
    .bytes_per_op = size,
});

// Report will now show GB/s instead of just Ops/s
try bench.report({ .metrics = &.{res} });

Configuration

You can tune the benchmark behavior by modifying the Options struct.

const res = try bench.run(allocator, "Heavy Task", heavyFn, .{
    .warmup_iters = 10,     // Default: 100
    .sample_size = 50,      // Default: 1000
});

Built-in Reporter

The default bench.report prints a human-readable table to stdout. It handles units (ns, us, ms, s) and coloring automatically.

$ zig build quicksort
Benchmarking Sorting Algorithms Against Random Input (N=10000)...
Benchmark Summary: 3 benchmarks run
├─ Unsafe Quicksort (Lomuto)   358.64us    110.98MB/s   1.29x faster
│  └─ cycles: 1.6M      instructions: 1.2M      ipc: 0.75       miss: 65
├─ Unsafe Quicksort (Hoare)    383.02us    104.32MB/s   1.21x faster
│  └─ cycles: 1.7M      instructions: 1.3M      ipc: 0.76       miss: 56
└─ std.mem.sort                462.25us     86.45MB/s   [baseline]
   └─ cycles: 2.0M      instructions: 2.6M      ipc: 1.30       miss: 143

Custom Reporter

The run function returns a Metrics struct containing all raw data (min, max, median, variance, cycles, etc.). You can use this to generate JSON, CSV, or assert performance limits in CI.

const metrics = try bench.run(allocator, "MyFn", myFn, .{});

// Access raw fields directly
std.debug.print("Median: {d}ns, Max: {d}ns\n", .{
    metrics.median_ns,
    metrics.max_ns
});

Supported Metrics

The run function returns a Metrics struct containing the following data points:

Category	Metric	Description
Meta	`name`	The identifier string for the benchmark.
Time	`min_ns`	Minimum execution time per operation (nanoseconds).
Time	`max_ns`	Maximum execution time per operation (nanoseconds).
Time	`mean_ns`	Arithmetic mean execution time (nanoseconds).
Time	`median_ns`	Median execution time (nanoseconds).
Time	`std_dev_ns`	Standard deviation of the execution time.
Meta	`samples`	Total number of measurement samples collected.
Throughput	`ops_sec`	Calculated operations per second.
Throughput	`mb_sec`	Data throughput in MB/s (populated if `bytes_per_op` > 0).
Hardware*	`cycles`	Average CPU cycles per operation.
Hardware*	`instructions`	Average CPU instructions executed per operation.
Hardware*	`ipc`	Instructions Per Cycle (efficiency ratio).
Hardware*	`cache_misses`	Average cache misses per operation.

*Hardware metrics are currently available on Linux only. They will be null on other platforms or if permissions are restricted.

Notes

This library is designed to show you "what", not "why". I recommend using a proper profiling tool such as perf on linux + Firefox Profiler to answer "why".

doNotOptimizeAway is your friend. For example if you are benchmarking some scanner/tokenizer:

  while (true) {
      const token = try scanner.next();
      if (token == .end) break;
      total_ops += 1;
      std.mem.doNotOptimizeAway(token); // CRITICAL
  }

To get cycles, instructions, ipc (instructions per cycle) and cache_misses metrics on Linux, you may need to enable the kernel.perf_event_paranoid.

Prior Art

Development

Install the Zig toolchain via mise (optional):

mise trust
mise install

Run tests:

zig build test --summary all

Build library:

zig build

Enable/disable kernel.perf_event_paranoid for debugging:

# Disable
sudo sysctl -w kernel.perf_event_paranoid=2

# Enable
sudo sysctl -w kernel.perf_event_paranoid=-1

Devlog

License

MIT. Use it for whatever.

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
.github		.github
.vscode		.vscode
examples		examples
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build.zig		build.zig
build.zig.zon		build.zig.zon
mise.toml		mise.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Features

Installation

Usage

Basic Run

Run with Arguments

Comparing Implementations

Measuring Throughput

Configuration

Built-in Reporter

Custom Reporter

Supported Metrics

Notes

Prior Art

Development

Devlog

License

About

Uh oh!

Languages

License

pyk/bench

Folders and files

Latest commit

History

Repository files navigation

Features

Installation

Usage

Basic Run

Run with Arguments

Comparing Implementations

Measuring Throughput

Configuration

Built-in Reporter

Custom Reporter

Supported Metrics

Notes

Prior Art

Development

Devlog

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Languages