OpenVINO Model Server 2025.2.1
Version 2025.2.1 is a minor release with bug fixes and improvements, mainly in automatic model pulling and image generation.
Improvements:
- Enable passing the `chat_template_kwargs` parameter in a `chat/completions` request. It can be used to turn off model reasoning (see the example below).
- Allow setting CORS headers in HTTP responses. This can resolve connectivity problems between OpenWebUI and the model server.
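Below is a minimal request sketch for the new `chat_template_kwargs` parameter, assuming a server listening locally on port 8000 and the OpenAI-compatible `v3/chat/completions` endpoint; the model name and the `enable_thinking` key are placeholders (that key is honored by some chat templates, e.g. Qwen3, but is model-dependent):

```bash
# Disable model reasoning via chat_template_kwargs (key name is model-dependent).
curl http://localhost:8000/v3/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-8B",
        "messages": [{"role": "user", "content": "What is 2 + 2?"}],
        "chat_template_kwargs": {"enable_thinking": false}
      }'
```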
Other changes:
- Changed the NPU driver version from 1.17 to 1.19 in the docker images
- Security-related updates in dependencies
Bug fixes:
- Removed a limitation in image generation - several output images can now be requested with the `n` parameter (see the example after this list).
- The `add_to_config` and `remove_from_config` parameters accept a path to a configuration file in addition to a directory containing a `config.json` file.
- Resolved connectivity issues while pulling models from Hugging Face Hub without proxy configuration.
- Fixed handling of the `HF_ENDPOINT` environment variable with HTTP addresses, as previously an `https://` prefix was incorrectly added.
- Changed the `pull` feature environment variables `GIT_SERVER_CONNECT_TIMEOUT_MS` to `GIT_OPT_SET_SERVER_CONNECT_TIMEOUT` and `GIT_SERVER_TIMEOUT_MS` to `GIT_OPT_SET_SERVER_TIMEOUT` to unify them with the underlying libgit2 implementation.
- Fixed handling of relative paths on Windows with MediaPipe graphs/LLMs for the `config_path` parameter.
- Fixed the agentic demo not working without a proxy.
- Stopped rejecting the `response_format` field in image generation. While the parameter currently accepts only the `b64_json` value, it allows integration with Open WebUI.
- Added the missing `--response_parser` parameter when using OVMS to pull an LLM model and prepare its configuration.
- Blocked simultaneous use of the `--list_models` and `--pull` parameters, as they are mutually exclusive.
- Fixed accuracy of the Phi4-mini model response parser when using functions with lists as arguments.
- Fixed handling of `target_device` for embeddings and reranking models in the `export_model.py` script.
- The stateful text generation pipeline no longer includes usage content, as it is not supported for this pipeline type; previously an incorrect response was returned.
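As a sketch of the image generation fixes above, the request below asks for two output images as base64 JSON; it assumes an image generation model is already served and that the server exposes the OpenAI-compatible `v3/images/generations` endpoint (the endpoint path and model name are assumptions, not taken from these notes):

```bash
# Request two output images (n=2) returned as base64 JSON; model name is a placeholder.
curl http://localhost:8000/v3/images/generations \
  -H "Content-Type: application/json" \
  -d '{
        "model": "OpenVINO/stable-diffusion-v1-5-int8-ov",
        "prompt": "a watercolor fox",
        "n": 2,
        "response_format": "b64_json"
      }'
```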
Known issues and limitations
- VLM models QwenVL2, QwenVL2.5, and Phi3_VL have lower accuracy when deployed on CPU in a text generation pipeline with continuous batching. It is recommended to deploy these models in a stateful pipeline, which processes requests sequentially, as in the demo
- Using NPU for image generation endpoints is unsupported in this release.
You can use the OpenVINO Model Server public docker images based on Ubuntu via the following commands:
`docker pull openvino/model_server:2025.2.1` - CPU device support with an image based on Ubuntu 24.04
`docker pull openvino/model_server:2025.2.1-gpu` - GPU, NPU and CPU device support with an image based on Ubuntu 24.04
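For example, a minimal sketch of starting the CPU image with a locally prepared model; the model name, paths, and port are placeholders, and `--model_name`, `--model_path`, and `--rest_port` are long-standing OVMS options rather than anything introduced in this release:

```bash
# Serve a local model directory over the REST API on port 8000.
docker run -d --rm -p 8000:8000 \
  -v $(pwd)/models/my_model:/model \
  openvino/model_server:2025.2.1 \
  --model_name my_model --model_path /model --rest_port 8000
```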
Alternatively, use the provided binary packages. Only packages with the `_python_on` suffix have support for Python.
Check the instructions on how to install the binary package.
The prebuilt image is also available on the Red Hat Ecosystem Catalog.