Releases: BerriAI/litellm
v1.75.7-nightly
What's Changed
- [Proxy] LiteLLM mock test fix by @jugaldb in #13635
- [Proxy] Litellm add DB metrics to prometheus by @jugaldb in #13626
- [LLM Translation] Fix Realtime API endpoint for no intent by @jugaldb in #13476
- [MCP Gateway] LiteLLM Fix MCP gateway key auth by @jugaldb in #13630
- [Fix] Ensure /messages works when using `bedrock/converse/` with LiteLLM by @ishaan-jaff in #13627 (see the sketch after this list)
- UI - Fix image overflow in LiteLLM model by @ishaan-jaff in #13639
- [Bug Fix] /messages endpoint - ensure tool use arguments are returned for non-anthropic models by @ishaan-jaff in #13638
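Both /messages items above touch the proxy's Anthropic-compatible endpoint. A minimal sketch of calling it with the Anthropic Python SDK, assuming the proxy is running locally with a virtual key; `bedrock-claude` is a hypothetical model alias assumed to route through `bedrock/converse/` in your proxy config:

```python
import anthropic

# Point the Anthropic SDK at the LiteLLM proxy's /v1/messages endpoint.
# "sk-1234" is a placeholder virtual key; "bedrock-claude" is a hypothetical
# alias assumed to map to a bedrock/converse/... model in the proxy config.
client = anthropic.Anthropic(base_url="http://localhost:4000", api_key="sk-1234")

response = client.messages.create(
    model="bedrock-claude",
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(response.content[0].text)
```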
Full Changelog: v1.75.6-nightly...v1.75.7-nightly
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.75.7-nightly
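Once the container is up, the proxy serves an OpenAI-compatible API on port 4000. A minimal sketch with the OpenAI Python SDK; the API key and model name are placeholders for whatever your proxy is configured with:

```python
from openai import OpenAI

# "sk-1234" is a placeholder virtual/master key; "gpt-4o-mini" stands in for
# any model name defined on your proxy.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from the LiteLLM proxy!"}],
)
print(resp.choices[0].message.content)
```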
Don't want to maintain your internal proxy? get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 140.0 | 182.49004836898376 | 6.247441314306712 | 0.0 | 1870 | 0 | 114.26430999995318 | 2038.3160259999613 |
Aggregated | Passed ✅ | 140.0 | 182.49004836898376 | 6.247441314306712 | 0.0 | 1870 | 0 | 114.26430999995318 | 2038.3160259999613 |
v1.75.6-nightly
What's Changed
- [Bug Fix] - Allow using `reasoning_effort` for gpt-5 model family and `reasoning` for Responses API by @ishaan-jaff in #13475 (see the sketch after this list)
- [Bug Fix]: Azure OpenAI GPT-5 max_tokens + `reasoning` param support by @ishaan-jaff in #13510
- [Draft] [LLM Translation] Add model id check by @jugaldb in #13507
- [Docs] - Document how to send tags with LiteLLM Python SDK calls to the LiteLLM Proxy by @ishaan-jaff in #13517
- Fix OCI streaming by @breno-aumo in #13437
- feat: add CometAPI provider support with chat completions and streaming by @TensorNull in #13458
- Allow unsetting TPM and RPM - Teams Settings by @NANDINI-star in #13430
- [Feat] - Add key/team logging for Langfuse OTEL Logger by @ishaan-jaff in #13512
- [Feat] Add Streaming support + Docs for bedrock gpt-oss model family by @ishaan-jaff in #13346
- [Feat] GEMINI CLI Integration - Add /countTokens endpoint support by @ishaan-jaff in #13545
- Feat/sambanova embeddings by @jhpiedrahitao in #13308
- Display Error from Backend on the UI - Keys Page by @NANDINI-star in #13435
- Team Member Permissions Page - Access Column Changes by @NANDINI-star in #13145
- Fix internal users table overflow by @NANDINI-star in #12736
- Enhance chart readability with short-form notation for large numbers by @NANDINI-star in #12370
- [Bug fix] SCIM Team Memberships - handle metadata by @ishaan-jaff in #13553
- [Feat] GEMINI CLI - Add Token Counter for VertexAI Models by @ishaan-jaff in #13558
- [Feat] Add CredentialDeleteModal component and integrate with CredentialsPanel by @jugaldb in #13550
- Implement GitHub Action to auto-label issues with provider keywords by @kankute-sameer in #13537
- LiteLLM SDK <-> Proxy: support `user` param + Prisma - remove `use_prisma_migrate` flag - redundant as this is now default by @krrishdholakia in #13555
- [Fix] Streaming - consistent 'finish_reason' chunk index by @krrishdholakia in #13560
- [Fix] Hide sensitive data in /model/info - azure entra client_secret by @MajorD00m in #13577
- Fix Ollama GPT-OSS streaming with 'thinking' field by @colesmcintosh in #13375
- fix(azure): remove trailing semicolon in Content-Type header for image generation by @VerunicaM in #13584
- Remove ambiguous network response error by @NANDINI-star in #13582
- [fix] Enhance MCPServerManager with access groups and description support by @jugaldb in #13549
- [Feat] New model `vertex_ai/deepseek-ai/deepseek-r1-0528-maas` by @ishaan-jaff in #13594
- [Docs] Update build from pip docs - new prisma migrate by @ishaan-jaff in #13603
- [Feat] New provider - Azure AI Flux Image Generation by @ishaan-jaff in #13592
- [Feat] Team Member Rate Limits + Support for using with JWT Auth by @ishaan-jaff in #13601
- Fix e2e_ui_testing by @NANDINI-star in #13610
- fix(volcengine): handle thinking disabled parameter properly by @colesmcintosh in #13598
- [Feat] Add `reasoning_effort` param for hosted_vllm provider by @ishaan-jaff in #13620
- perf(main.py): new 'EXPERIMENTAL_OPENAI_BASE_LLM_HTTP_HANDLER' flag (+100 rps improvement on openai calls) by @krrishdholakia in #13625
- Add deepseek-chat-v3-0324 to OpenRouter cost map by @huangyafei in #13607
- [LLM Translation/Proxy] Fix - add safe divide by 0 for most places to prevent crash by @jugaldb in #13624
- [LLM translation] Refactor Anthropic Configurations and Add Support for `anthropic_beta` Headers by @jugaldb in #13590
- [Management/UI] Allow routes for admin viewer by @jugaldb in #13588
- [Proxy] Litellm fix mapped tests by @jugaldb in #13634
- Update mlflow logger usage span attributes by @TomeHirata in #13561
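A minimal sketch of the `reasoning_effort` change from #13475, using the LiteLLM Python SDK directly; the model name and effort value are illustrative, and an `OPENAI_API_KEY` is assumed to be set in the environment:

```python
import litellm

# Hedged sketch: gpt-5 family models can now take reasoning_effort on
# chat completions (per #13475). Assumes OPENAI_API_KEY is set.
response = litellm.completion(
    model="gpt-5-mini",
    messages=[{"role": "user", "content": "Explain the LiteLLM proxy in one sentence."}],
    reasoning_effort="minimal",
)
print(response.choices[0].message.content)
```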
New Contributors
- @TensorNull made their first contribution in #13458
- @MajorD00m made their first contribution in #13577
- @VerunicaM made their first contribution in #13584
- @huangyafei made their first contribution in #13607
- @TomeHirata made their first contribution in #13561
Full Changelog: v1.75.5.rc.1...v1.75.6-nightly
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.75.6-nightly
Don't want to maintain your internal proxy? get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 110.0 | 152.45377470174589 | 6.50695562099742 | 0.0 | 1948 | 0 | 86.13810599996441 | 2202.5806519999946 |
Aggregated | Passed ✅ | 110.0 | 152.45377470174589 | 6.50695562099742 | 0.0 | 1948 | 0 | 86.13810599996441 | 2202.5806519999946 |
litellm_v1.75.5-dev_memory_fix_2
Full Changelog: litellm_v1.75.5-dev_memory_fix...litellm_v1.75.5-dev_memory_fix_2
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-litellm_v1.75.5-dev_memory_fix_2
Don't want to maintain your internal proxy? get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 110.0 | 148.3727584527906 | 6.403502625001942 | 0.0 | 1917 | 0 | 82.2168479999732 | 1009.9184229999878 |
Aggregated | Passed ✅ | 110.0 | 148.3727584527906 | 6.403502625001942 | 0.0 | 1917 | 0 | 82.2168479999732 | 1009.9184229999878 |
litellm_v1.73.0-dev_memory_fix_2
Full Changelog: litellm_v1.73.0-dev_memory_fix_1...litellm_v1.73.0-dev_memory_fix_2
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-litellm_v1.73.0-dev_memory_fix_2
Don't want to maintain your internal proxy? get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 160.0 | 197.89495319271438 | 6.333208452490925 | 0.0 | 1894 | 0 | 120.76194100001203 | 1698.8860899999736 |
Aggregated | Passed ✅ | 160.0 | 197.89495319271438 | 6.333208452490925 | 0.0 | 1894 | 0 | 120.76194100001203 | 1698.8860899999736 |
litellm_v1.73.0-dev_memory_fix_1
Full Changelog: litellm_v1.73.0-dev_memory_fix...litellm_v1.73.0-dev_memory_fix_1
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-litellm_v1.73.0-dev_memory_fix_1
Don't want to maintain your internal proxy? get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 130.0 | 168.52018052761707 | 6.353625704455274 | 0.0 | 1901 | 0 | 102.81331199996657 | 955.3499159999888 |
Aggregated | Passed ✅ | 130.0 | 168.52018052761707 | 6.353625704455274 | 0.0 | 1901 | 0 | 102.81331199996657 | 955.3499159999888 |
litellm_v1.73.0-dev_memory_fix
Full Changelog: v1.73.0.rc.1...litellm_v1.73.0-dev_memory_fix
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-litellm_v1.73.0-dev_memory_fix
Don't want to maintain your internal proxy? get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 170.0 | 201.69670727064485 | 6.271320946816614 | 0.0 | 1877 | 0 | 124.88003300001083 | 1273.2718879999823 |
Aggregated | Passed ✅ | 170.0 | 201.69670727064485 | 6.271320946816614 | 0.0 | 1877 | 0 | 124.88003300001083 | 1273.2718879999823 |
v1.75.5.dev3
What's Changed
- [Bug Fix] - Allow using `reasoning_effort` for gpt-5 model family and `reasoning` for Responses API by @ishaan-jaff in #13475
- [Bug Fix]: Azure OpenAI GPT-5 max_tokens + `reasoning` param support by @ishaan-jaff in #13510
- [Draft] [LLM Translation] Add model id check by @jugaldb in #13507
- [Docs] - Document how to send tags with LiteLLM Python SDK calls to the LiteLLM Proxy by @ishaan-jaff in #13517
- Fix OCI streaming by @breno-aumo in #13437
- feat: add CometAPI provider support with chat completions and streaming by @TensorNull in #13458
- Allow unsetting TPM and RPM - Teams Settings by @NANDINI-star in #13430
- [Feat] - Add key/team logging for Langfuse OTEL Logger by @ishaan-jaff in #13512
- [Feat] Add Streaming support + Docs for bedrock gpt-oss model family by @ishaan-jaff in #13346
New Contributors
- @TensorNull made their first contribution in #13458
Full Changelog: v1.75.5.rc.1...v1.75.5.dev3
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.75.5.dev3
Don't want to maintain your internal proxy? get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 170.0 | 209.8803997534473 | 6.313294138003075 | 0.0 | 1886 | 0 | 126.72262299997783 | 1268.0979020000223 |
Aggregated | Passed ✅ | 170.0 | 209.8803997534473 | 6.313294138003075 | 0.0 | 1886 | 0 | 126.72262299997783 | 1268.0979020000223 |
litellm_v1.75.5-dev_memory_fix_1
What's Changed
- [Bug Fix] - Allow using `reasoning_effort` for gpt-5 model family and `reasoning` for Responses API by @ishaan-jaff in #13475
- [Bug Fix]: Azure OpenAI GPT-5 max_tokens + `reasoning` param support by @ishaan-jaff in #13510
- [Draft] [LLM Translation] Add model id check by @jugaldb in #13507
Full Changelog: v1.75.5.rc.1...litellm_v1.75.5-dev_memory_fix_1
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-litellm_v1.75.5-dev_memory_fix_1
Don't want to maintain your internal proxy? get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 180.0 | 211.1796445664915 | 6.382692480357281 | 0.0 | 1910 | 0 | 132.33785100001683 | 1892.3347159999935 |
Aggregated | Passed ✅ | 180.0 | 211.1796445664915 | 6.382692480357281 | 0.0 | 1910 | 0 | 132.33785100001683 | 1892.3347159999935 |
litellm_v1.75.5-dev_memory_fix
What's Changed
- [Bug Fix] - Allow using `reasoning_effort` for gpt-5 model family and `reasoning` for Responses API by @ishaan-jaff in #13475
- [Bug Fix]: Azure OpenAI GPT-5 max_tokens + `reasoning` param support by @ishaan-jaff in #13510
- [Draft] [LLM Translation] Add model id check by @jugaldb in #13507
Full Changelog: v1.75.5.rc.1...litellm_v1.75.5-dev_memory_fix
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-litellm_v1.75.5-dev_memory_fix
Don't want to maintain your internal proxy? get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 120.0 | 153.71541160805808 | 6.384305831136473 | 0.0 | 1911 | 0 | 80.34984599999007 | 1251.9617030000063 |
Aggregated | Passed ✅ | 120.0 | 153.71541160805808 | 6.384305831136473 | 0.0 | 1911 | 0 | 80.34984599999007 | 1251.9617030000063 |
v1.75.5.rc.1
What's Changed
- Litellm release notes 08 10 2025 by @krrishdholakia in #13479
- Litellm model cost map fixes by @krrishdholakia in #13480
- ui - build updated ui + increase max_tokens in health_check = 10 (gpt-5-nano throws error for max_token=1) by @krrishdholakia in #13482
Full Changelog: v1.75.5-stable.rc-draft...v1.75.5.rc.1
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.75.5.rc.1
Don't want to maintain your internal proxy? get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Passed ✅ | 140.0 | 177.87900723772194 | 6.396151577527124 | 0.0 | 1914 | 0 | 110.98393499997883 | 1386.0049980000042 |
Aggregated | Passed ✅ | 140.0 | 177.87900723772194 | 6.396151577527124 | 0.0 | 1914 | 0 | 110.98393499997883 | 1386.0049980000042 |