OpenVINO Model Server 2025.2.1
Version 2025.2.1 is a minor release with bug fixes and improvements, mainly in automatic model pulling and image generation.
Improvements:
- Enable passing the `chat_template_kwargs` parameter in a `chat/completions` request. It can be used to turn off model reasoning (see the example below).
- Allow setting CORS headers in HTTP responses. This can resolve connectivity problems between OpenWebUI and the model server.
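Below is a minimal request sketch for the new `chat_template_kwargs` parameter, assuming a server listening locally on port 8000 and the OpenAI-compatible `v3/chat/completions` endpoint; the model name and the `enable_thinking` key are placeholders (that key is honored by some chat templates, e.g. Qwen3, but is model-dependent):

```bash
# Disable model reasoning via chat_template_kwargs (key name is model-dependent).
curl http://localhost:8000/v3/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-8B",
        "messages": [{"role": "user", "content": "What is 2 + 2?"}],
        "chat_template_kwargs": {"enable_thinking": false}
      }'
```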
Other changes:
- Changed the NPU driver version from 1.17 to 1.19 in the docker images
- Security-related updates in dependencies
Bug fixes:
- Removed a limitation in image generation - several output images can now be requested with the `n` parameter (see the example after this list).
- The `add_to_config` and `remove_from_config` parameters accept a path to a configuration file in addition to a directory containing a `config.json` file.
- Resolved connectivity issues while pulling models from Hugging Face Hub without proxy configuration.
- Fixed handling of the `HF_ENDPOINT` environment variable with HTTP addresses, as previously an `https://` prefix was incorrectly added.
- Changed the `pull` feature environment variables `GIT_SERVER_CONNECT_TIMEOUT_MS` to `GIT_OPT_SET_SERVER_CONNECT_TIMEOUT` and `GIT_SERVER_TIMEOUT_MS` to `GIT_OPT_SET_SERVER_TIMEOUT` to unify them with the underlying libgit2 implementation.
- Fixed handling of relative paths on Windows with MediaPipe graphs/LLMs for the `config_path` parameter.
- Fixed the agentic demo not working without a proxy.
- Stopped rejecting the `response_format` field in image generation. While the parameter currently accepts only the `b64_json` value, it allows integration with Open WebUI.
- Added the missing `--response_parser` parameter when using OVMS to pull an LLM model and prepare its configuration.
- Blocked simultaneous use of the `--list_models` and `--pull` parameters, as they are mutually exclusive.
- Fixed accuracy of the Phi4-mini model response parser when using functions with lists as arguments.
- Fixed handling of `target_device` for embeddings and reranking models in the `export_model.py` script.
- The stateful text generation pipeline no longer includes usage content, as it is not supported for this pipeline type; previously an incorrect response was returned.
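As a sketch of the image generation fixes above, the request below asks for two output images as base64 JSON; it assumes an image generation model is already served and that the server exposes the OpenAI-compatible `v3/images/generations` endpoint (the endpoint path and model name are assumptions, not taken from these notes):

```bash
# Request two output images (n=2) returned as base64 JSON; model name is a placeholder.
curl http://localhost:8000/v3/images/generations \
  -H "Content-Type: application/json" \
  -d '{
        "model": "OpenVINO/stable-diffusion-v1-5-int8-ov",
        "prompt": "a watercolor fox",
        "n": 2,
        "response_format": "b64_json"
      }'
```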
Known issues and limitations
- VLM models QwenVL2, QwenVL2.5, and Phi3_VL have lower accuracy when deployed on CPU in a text generation pipeline with continuous batching. It is recommended to deploy these models in a stateful pipeline, which processes requests sequentially, as in the demo
- Using NPU for image generation endpoints is unsupported in this release.
You can use the OpenVINO Model Server public docker images based on Ubuntu via the following commands:
`docker pull openvino/model_server:2025.2.1` - CPU device support with an image based on Ubuntu 24.04
`docker pull openvino/model_server:2025.2.1-gpu` - GPU, NPU and CPU device support with an image based on Ubuntu 24.04
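For example, a minimal sketch of starting the CPU image with a locally prepared model; the model name, paths, and port are placeholders, and `--model_name`, `--model_path`, and `--rest_port` are long-standing OVMS options rather than anything introduced in this release:

```bash
# Serve a local model directory over the REST API on port 8000.
docker run -d --rm -p 8000:8000 \
  -v $(pwd)/models/my_model:/model \
  openvino/model_server:2025.2.1 \
  --model_name my_model --model_path /model --rest_port 8000
```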
Alternatively, use the provided binary packages. Only packages with the `_python_on` suffix have support for Python.
Check the instructions on how to install the binary package.
The prebuilt image is also available on the Red Hat Ecosystem Catalog.