Dynamo 0.4.0 Release Notes
Dynamo is a high-performance, low-latency inference framework designed to serve generative AI models across any framework, architecture, or deployment scale. It is an open-source project under the Apache 2.0 license, available for installation via pip wheels and as containers from NVIDIA NGC.
As a vendor-neutral serving framework, Dynamo supports multiple large language model (LLM) inference engines to varying degrees:
- NVIDIA TensorRT-LLM
- vLLM
- SGLang
Major Features and Improvements
Increasing Framework Support
vLLM Updates
- Added E2E integration tests (#1935) and multimodal example with Llama4 Maverick (#1990)
- Prefill-aware routing for improved performance (#1895)
- Configurable namespace support for vLLM examples (#1909)
- Routing via ApproxKvIndexer with the use_kv_events flag (#1869)
- Updated all vLLM examples to the new UX (#1756)
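The ApproxKvIndexer routing above is based on KV-cache-aware routing: requests are steered toward workers likely to already hold the prompt's prefix in their KV cache. The following is a minimal, self-contained sketch of that idea; the names here (ApproxRouter, block_hashes, BLOCK_SIZE) are illustrative and are not Dynamo's actual API.

```python
# Illustrative sketch of approximate KV-cache-aware routing.
# Hypothetical names; not Dynamo's real implementation.
import hashlib

BLOCK_SIZE = 16  # tokens per KV block (assumed for this sketch)


def block_hashes(tokens):
    """Hash fixed-size token blocks so shared prefixes map to stable keys."""
    hashes = []
    usable = len(tokens) - len(tokens) % BLOCK_SIZE
    for i in range(0, usable, BLOCK_SIZE):
        block = ",".join(map(str, tokens[i:i + BLOCK_SIZE]))
        hashes.append(hashlib.sha256(block.encode()).hexdigest())
    return hashes


class ApproxRouter:
    """Route each request to the worker with the most cached prefix blocks."""

    def __init__(self, workers):
        # Track which block hashes each worker is assumed to have cached.
        self.cache = {w: set() for w in workers}

    def route(self, tokens):
        hashes = block_hashes(tokens)
        # Pick the worker whose recorded cache overlaps this request most.
        best = max(
            self.cache,
            key=lambda w: sum(h in self.cache[w] for h in hashes),
        )
        # Assume the chosen worker now caches these blocks.
        self.cache[best].update(hashes)
        return best
```

A repeated prompt (or one sharing a long prefix) is routed back to the same worker, which is the effect prefix-aware routing aims for.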
SGLang Updates
TRT-LLM Updates
- New speculative decoding example: Llama-4 + Eagle-3 (#1828)
Routing Performance
UX Updates
Migration to New Python UX
CLI and Packaging Enhancements
Kubernetes Deployment UX
Examples & Docs Overhaul
Deployment, Kubernetes, and CLI
Helm and Graph Deployments
Planner and Profiling
Performance and Observability
Structured Logging Improvements
- Enhanced structured JSONL logs with span start/close events, trace ID/span ID injection, duration formatting in microseconds, and improved context capture for distributed tracing workflows (PR #2061).
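As a concrete illustration of what such span logs can look like, here is a minimal stdlib-only sketch that emits one JSON object per line (JSONL) with trace/span IDs and a duration in microseconds. The field names (trace_id, span_id, duration_us) are assumptions for this example; Dynamo's actual log schema may differ.

```python
# Minimal sketch of structured JSONL span logging with trace/span IDs.
# Field names are illustrative, not Dynamo's actual schema.
import json
import time
import uuid


def span_event(name, event, trace_id, span_id, started=None):
    """Build one JSONL record for a span start or close event."""
    record = {
        "name": name,
        "event": event,  # "span_start" or "span_close"
        "trace_id": trace_id,
        "span_id": span_id,
    }
    if event == "span_close" and started is not None:
        # Duration reported in microseconds.
        record["duration_us"] = round((time.monotonic() - started) * 1e6)
    return json.dumps(record)  # one JSON object per line -> JSONL


trace_id = uuid.uuid4().hex
span_id = uuid.uuid4().hex[:16]
t0 = time.monotonic()
print(span_event("prefill", "span_start", trace_id, span_id))
print(span_event("prefill", "span_close", trace_id, span_id, started=t0))
```

Because every record carries the same trace_id, a downstream collector can stitch the start/close pair (and spans from other processes) into a single distributed trace.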
Tokenizer & Runtime
Metrics
Bug Fixes
- Fixed GPU resource specifications in LLM deployments (#1812)
- Corrected vLLM, SGLang, and TRTLLM deployment issues, including container builds, runtime packaging, and helm chart updates (#1942, #2062, #1825)
- Addressed port conflicts, deterministic port assignments, and health check improvements (#1937, #1996)
- Improved error handling for empty message lists and invalid configurations (#2067, #2071)
- Fixed nil pointer dereference issues in the Dynamo controller (#2299, #2335)
- Locked dependencies to avoid breaking changes (e.g., Triton 3.4.0 w/ TRT-LLM 1.0.0) (#2233)
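One common way to get the deterministic port assignment mentioned above is to derive a stable port from an instance identifier by hashing it into a fixed range, so restarts of the same instance reclaim the same port. This is a generic sketch of that idea, not Dynamo's actual scheme; PORT_BASE and PORT_RANGE are assumed values.

```python
# Generic sketch of deterministic port assignment by hashing an instance
# ID into a fixed range. Not Dynamo's actual implementation.
import hashlib

PORT_BASE = 8000    # assumed start of the port range
PORT_RANGE = 1000   # assumed size of the port range


def deterministic_port(instance_id: str) -> int:
    """Map an instance ID to a stable port in [PORT_BASE, PORT_BASE + PORT_RANGE)."""
    digest = hashlib.sha256(instance_id.encode()).digest()
    offset = int.from_bytes(digest[:4], "big") % PORT_RANGE
    return PORT_BASE + offset
```

The same instance ID always yields the same port, which avoids the races and collisions that ad hoc ephemeral-port picking can cause.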
Documentation
Guides and Examples
Docs Restructuring
Build, CI, and Test
- Added support for SGLang runtime image builds (#1770)
- Optional TRT-LLM dependency and custom build support (#2113)
- New end-to-end router tests with mockers (#2073)
- Fixed vLLM builds for Blackwell GPUs (#2020)
Release Assets
Python Wheels:
Rust Crates:
Containers:
- nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.4.0
- nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.4.0
- nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.4.0
- nvcr.io/nvidia/ai-dynamo/kubernetes-operator:0.4.0
Helm Charts:
Open Issues
- The x86 TRT-LLM container image is not compatible out of the box with B200 GPUs; the dev container still works on B200/GB200
Contributors
We welcome the following new contributors in this release:
@umang-kedia-hpe, @Ethan-ES, @messiaen, @galletas1712, @mc-nv, @zaristei, @jhaotingc, @saurabh-nvidia.
For the full list of changes, see the changelog.