What's Changed
- [GHA] Replaced visual_language_chat_sample-ubuntu-minicpm_v2_6 job by @mryzhov in #1909
- [GHA] Replaced cpp-chat_sample-ubuntu pipeline by @mryzhov in #1913
- Add support of Prompt Lookup decoding to llm bench by @sbalandi in #1917
- [GHA] Introduce SDL pipeline by @mryzhov in #1924
- Switch Download OpenVINO step to aks-medium-runner by @ababushk in #1889
- Bump product version 2025.2 by @akladiev in #1920
- [GHA] Replaced cpp-continuous-batching by @mryzhov in #1910
- Update dependencies in samples by @ilya-lavrenov in #1925
- phi3_v: add universal tag by @Wovchena in #1921
- Fix image_id unary error by @rkazants in #1927
- [Docs] Image generation use case by @yatarkan in #1877
- Add perf metrics for CB VLM by @pavel-esir in #1897
- Enhance the flexibility of the c streamer by @apinge in #1941
- add Gemma3 LLM to supported models by @eaidova in #1942
- Added GPTQ/AWQ support with HF Transformers by @AlexKoff88 in #1933
- Add --static_reshape option to llm_bench, to force static reshape + compilation at pipeline creation by @RyanMetcalfeInt8 in #1851
- benchmark_image_gen: Add --reshape option, and ability to specify multiple devices by @RyanMetcalfeInt8 in #1878
- Revert perf regression changes by @dkalinowski in #1949
- Add running greedy_causal_lm for JS to the sample tests by @Retribution98 in #1930
- [Docs] Add VLM use case by @yatarkan in #1907
- Added possibility to generate base text on GPU for text evaluation. by @andreyanufr in #1945
- VLM: change infer to start_async/wait by @dkalinowski in #1948
- [WWB]: Addressed issues with validation on Windows by @AlexKoff88 in #1953
- [GHA] Remove bandit pipeline by @mryzhov in #1956
- Disable MSVC debug assertions, addressing false positives in iterator checking by @apinge in #1952
- [GHA] Replaced genai-tools pipeline by @mryzhov in #1954
- configurable delay by @eaidova in #1963
- Update cast of tensor data pointer for const tensors by @praasz in #1966
- Remove tokens after EOS for draft model for speculative decoding by @sbalandi in #1951
- Add testcase for chat_sample_c by @apinge in #1934
- Skip warm-up iteration during llm_bench results averaging by @nikita-savelyevv in #1972
- Reset pipeline cache usage statistics on each generate call by @vshampor in #1961
- [Docs] Update models, rebuild on push by @yatarkan in #1922
- Updated logic whether PA backend is explicitly required by @ilya-lavrenov in #1976
- [GHA] [MAC] Use latest_available_commit OV artifacts by @mryzhov in #1977
- [GHA] Set HF_TOKEN by @mryzhov in #1986
- [GHA] Setup ov_cache by @mryzhov in #1962
- [GHA] Changed cleanup runner by @mryzhov in #1995
- Added mutex to methods which use blocks map. by @popovaan in #1975
- Add documentation and sample on KV cache eviction by @vshampor in #1960
- StaticLLMPipeline: Simplify compile_model call logic by @smirnov-alexey in #1915
- Fix reshape in heterogeneous SD samples by @helena-intel in #1994
- Update tokenizers by @mryzhov in #2002
- docs: fix max_new_tokens option description by @tpragasa in #1987
- [Docs] Add speech recognition with whisper use case by @yatarkan in #1971
- Revert "VLM: change infer to start_async/wait" by @ilya-lavrenov in #2004
- Revert "Revert perf regression changes" by @ilya-lavrenov in #2003
- Set xfail to failing tests. by @popovaan in #2006
- [GHA] Use cpack bindings in the samples tests by @mryzhov in #1979
- [Docs]: add Phi3.5MoE to supported models by @eaidova in #2012
- add TensorArt SD3.5 models to supported list by @eaidova in #2013
- Move MiniCPM resampler to vision encoder by @popovaan in #1997
- [GHA] Fix ccache on Win/Mac by @mryzhov in #2008
- samples/python/text_generation/lora.py -> samples/python/text_generation/lora_greedy_causal_lm.py by @Wovchena in #2007
- Whisper timestamp fix by @RyanMetcalfeInt8 in #1918
- Unskip Qwen2-VL-2B-Instruct sample test by @as-suvorov in #1970
- [GHA] Use developer openvino packages by @mryzhov in #2000
- Added NNCF to export-requirements.txt by @ilya-lavrenov in #1974
- Bump py-build-cmake from 0.4.2 to 0.4.3 by @dependabot in #2016
- Use OV_CACHE for python tests by @as-suvorov in #2020
- [GHA] Disable HTTP calls to the Hugging Face Hub by @mryzhov in #2021
- Add python bindings to VLMPipeline for encrypted models by @olpipi in #1916
- Bump the npm_and_yarn group across 1 directory with 2 updates by @dependabot in #2017
- CB: auto plugin support by @ilya-lavrenov in #2034
- timeout-minutes: 90 by @Wovchena in #2039
- Bump diffusers from 0.32.2 to 0.33.1 by @dependabot in #2031
- Bump diffusers from 0.32.2 to 0.33.1 in /samples by @dependabot in #2032
- Enable cache and add cache encryption to samples by @olpipi in #1990
- Fix VLM concurrency by @mzegla in #2022
- Move Phi3 vision projection model to vision encoder by @popovaan in #2009
- Fix spelling by @Wovchena in #2025
- [Docs] Enable autogenerated samples docs by @yatarkan in #2029
- Synchronize entire embeddings calculation phase (#1967) by @mzegla in #1993
- Add missing finish reason set when finishing the sequence by @mzegla in #2036
- Bump image-size from 1.2.0 to 1.2.1 in /site in the npm_and_yarn group across 1 directory by @dependabot in #1998
- Add README for C Samples by @apinge in #2040
- Use ov_cache for test_vlm_pipeline by @as-suvorov in #2042
- increase timeouts by @Wovchena in #2041
- [GHA] Use azure runners for python tests by @mryzhov in #1991
- [WWB]: move diffusers imports closer to usage by @eaidova in #2046
- [llm bench] Move calculation of memory consumption to memory_monitor tool by @sbalandi in #1937
- [llm bench] allow loading onnx models using optimum-intel by @eaidova in #2050
- Add cache encryption to vlm sample by @olpipi in #2038
- Remove note about GPU for phi3v by @eaidova in #2053
- Update requirement according to memory_monitor needs by @sbalandi in #2064
- [CI] Freeze optimum-intel by @mryzhov in #2061
- Propose chat template fixes by @Wovchena in #2070
- Add tiny-random-internvl2 to python tests by @yatarkan in #1978
- Don't download on import by @Wovchena in #2054
- Don't mention chat templates in start_chat docstrings by @Wovchena in #2055
- Revert "Set xfail to failing tests. (#2006)" by @popovaan in #2066
- Fix perf metrics update in prompt lookup decoding pipeline by @mzegla in #2044
- Bump http-proxy-middleware from 2.0.7 to 2.0.9 in /site in the npm_and_yarn group across 1 directory by @dependabot in #2072
- GHA: pin OpenVINO by @ilya-lavrenov in #2078
- [JS] Add LLMPipeline samples by @Retribution98 in #2058
- Disable continuous batching if cannot get context by @WeldonWangwang in #2060
- add internvl3 to supported VLM by @eaidova in #2076
- Add get_vocab Method to Tokenizer by @apaniukov in #2059
- GHA: pin OpenVINO by @Wovchena in #2079
- Raise exception if input prompt exceeds its configured max size on NPU by @AsyaPronina in #1996
- Optimize get_inputs_embeds() for Qwen2VL. by @popovaan in #2037
- Revert "GHA: pin OpenVINO" by @ilya-lavrenov in #2088
- Initial GGUF support by @ilya-lavrenov in #2081
- Revert "GHA: pin OpenVINO" by @ilya-lavrenov in #2087
- Bump diffusers and relax test_image_model_genai by @Wovchena in #2084
- add sentencepiece to requirements.txt by @isanghao in #2089
- Revert "Add get_vocab Method to Tokenizer (#2059)" by @Wovchena in #2086
- Revert optimum-intel freeze by @Wovchena in #2083
- [C] Add ov::Property as arguments to the ov_genai_llm_pipeline_create function by @apinge in #2071
- Use reordered images grid in create_position_ids method for Qwen2VL by @yatarkan in #2093
- Allow new Pillow's license by @Wovchena in #2077
- Disable /sdl for gguf-tools by @Wovchena in #2100
- Fix VLM CB metrics. by @popovaan in #2073
- GGUF: fixed GGUF tests by @ilya-lavrenov in #2090
- Fixed whisper tests by @ilya-lavrenov in #2105
- fix llm_bench and wwb parameters for new transformers by @eaidova in #2098
- [llm bench] Avoid crash of memory monitor when framework/pipeline change by exception by @sbalandi in #2106
- GGUF support Qwen2.5 with type of Q4_K Q6_K by @TianmengChen in #2095
- fix whisper optimum run via llm_bench by @eaidova in #2108
- [Docs] Add installation, guides & concepts pages by @yatarkan in #2075
- GGUF WA for GPU by @sammysun0711 in #2110
- llava: add universal tag by @Wovchena in #2091
- Prompt lookup: store encoder stats by @esmirno in #2104
- support 4bit cache copy by @zhangYiIntel in #1980
- Fix of filling of pixel_values tensor in llava_image_embed_make_with_bytes_slice() by @popovaan in #2111
- Update type hints in genai: dict by @Wovchena in #2112
- Print speculative decoding perf metrics in Debug mode by @sbalandi in #2065
- Fix license filter by @Wovchena in #2116
- Increase max_retries by @Wovchena in #2115
- [VLM] Clear inputs embedder cache when chat is finished. by @popovaan in #2117
- InternVL2, LLaVA-NeXT: add universal tag by @Wovchena in #2114
- docs: fix path to kv-cache-areas-diagram.svg by @Wovchena in #2101
- Update llm_bench requirements.txt to contain sentencepiece by @skuros in #2121
- Bring Back get_vocab by @apaniukov in #2107
- GGUF support load split files for Qwen2.5 by @TianmengChen in #2120
- samples: Adds optional device selection to some samples by @apram0d in #2028
- [GHA] Fixed dependabot trigger for github actions by @mryzhov in #2123
- Bump actions/download-artifact from 4.1.8 to 4.3.0 by @dependabot in #2133
- Add remove adapters for LLMPipeline by @wenyi5608 in #1852
- Coverity: exclude C++ and Python Tokenizers by @Wovchena in #2124
- Bump actions/setup-node from 4.0.2 to 4.4.0 by @dependabot in #2132
- GGUF Q6K WA for GPU by @TianmengChen in #2135
- [llm bench]: fix hook for beam search for optimum by @eaidova in #2128
- Update type hints in genai by @Wovchena in #2134
- Copy tags to docs by @Wovchena in #2127
- Bump actions/checkout from 4.1.6 to 4.2.2 by @dependabot in #2136
- Bump actions/setup-python from 5.4.0 to 5.6.0 by @dependabot in #2131
- Bump aquasecurity/trivy-action from 0.29.0 to 0.30.0 by @dependabot in #2137
- Removed KVCacheConfig from internal API by @ilya-lavrenov in #2138
- Alias CLIPTextModelWithProjection as CLIPTextModel by @ilya-lavrenov in #1809
- [GGUF] Optimize Load GGUF with Threading by @sammysun0711 in #2139
- Bump actions/upload-artifact from 4.4.3 to 4.6.2 by @dependabot in #2143
- Continuous batching minor improvements by @Wovchena in #2144
- Remove extra call by @Wovchena in #2148
- Group source files for smart CI by @ilya-lavrenov in #2146
- Log failed output by @Wovchena in #2152
- Add a sample of LLM ReAct Agent by @JamieVC in #1926
- Tokenizer: patch simplified_chat_template by @Wovchena in #2145
- [llm_bench] Fix batch size processing while image gen benchmark by @apram0d in #2125
- [VLM] Add Qwen2.5-VL model support by @yatarkan in #2140
- Use full float for hash by @Wovchena in #2149
- Replace non existing models extend chat template mapping by @Wovchena in #2153
- [RAG] Add text embedding pipeline by @as-suvorov in #2057
- Bump json5 from 0.10.0 to 0.12.0 in /samples by @dependabot in #2154
- [C] Implement type conversion for the property values of MAX_PROMPT_LEN and MIN_RESPONSE_LEN by @apinge in #2142
- Added smart CI for Linux workflow by @ilya-lavrenov in #2158
- [GHA] Unique ov_cache by @mryzhov in #2160
- Zero out other half of int64 for hash by @Wovchena in #2157
- Added smart CI for Windows and macOS by @ilya-lavrenov in #2164
- Enables PA for arm64 by @ilya-lavrenov in #2165
- Fix whisper pipeline beam search decoding by @as-suvorov in #2166
- add phi4 reasoning to supported by @eaidova in #2161
- CVS-167152: fixed CLIPTextModelWithProjection creation in Python by @ilya-lavrenov in #2169
- Adjusted smart CI / labeler configs by @ilya-lavrenov in #2168
- Add Text Embedding pipeline samples by @as-suvorov in #2167
- [GHA] Fix component pattern by @akladiev in #2181
- Add backoff to requirements_conversion.txt by @skuros in #2170
- Bump actions/dependency-review-action from 4.6.0 to 4.7.0 by @dependabot in #2185
- Regenerate Windows cache by @Wovchena in #2188
- Fix race cond. Move get_awaiting_requests method to base class by @olpipi in #2174
- Fix overflow. Fix coverity. by @olpipi in #2179
- Add SD3 LoRA Adapter Support by @sammysun0711 in #2187
- [llm_bench] fix vlm processing without image and add more supported models by @eaidova in #2182
- [Coverity] Fix null pointer dereferences by @popovaan in #2184
- [llm_bench] fix overwriting bos token by @michal-miotk in #2199
- Fix Whisper tests by @as-suvorov in #2203
- [JS] Upgrade the js package versions to the upcoming releases by @Retribution98 in #2045
- Revert "Regenerate Windows cache (#2188)" by @Wovchena in #2196
- Added info to Scheduler docstring, optimized calculation of hash during prefix caching. by @popovaan in #2189
- Explain MODE_STATIC vs MODE_FUSE by @Wovchena in #2198
- Bump actions/dependency-review-action from 4.7.0 to 4.7.1 by @dependabot in #2207
- Add paired input into genai::Tokenizer by @pavel-esir in #2080
- Whisper static pipeline: fix for fp8 models by @eshiryae in #2201
- LoRA scaling fix by @likholat in #2210
- Benchmark add empty lora test by @wenyi5608 in #2183
- Bump pybind11-stubgen from 2.5.3 to 2.5.4 by @dependabot in #2208
- LLM: release plugin once pipeline is removed and WA for GPU by @sbalandi in #2102
- Bump onnx from 1.17.0 to 1.18.0 in /tests/python_tests by @dependabot in #2202
- [llm_bench] first_token_time should not be scaled by batch_size by @pavel-esir in #2217
- [Coverity] Removed dead code in preprocess_clip_image_llava() by @popovaan in #2230
- Fix Coverity issues by @olpipi in #2222
- Fix CI problems: update optimum-intel, use higher memory runner, disable whisper tests by @rkazants in #2229
- [TTS] Introduce Text-to-speech pipeline API and support SpeechT5 TTS by @rkazants in #2209
- Remove commented code by @Wovchena in #2220
- Bump undici from 6.21.1 to 6.21.3 in /site in the npm_and_yarn group across 1 directory by @dependabot in #2219
- [llm_bench] Include #egg=optimum-intel to avoid issues when freezing … by @wkobielx in #2237
- Dont include debug_utils.hpp by @Wovchena in #2223
- Switch VLM to ContinuousBatching by default. by @popovaan in #2129
- add qwen3 chat template to mapping by @eaidova in #2228
- [JS] Add an interrupt option for LLMPipeline by @Retribution98 in #2235
- Increased VLM tests timeout. by @popovaan in #2238
- Fix Coverity issues by @olpipi in #2232
- [StatefulLLMPipeline] Remove GenAI slicing in stateful pipeline for NPU by @smirnov-alexey in #2246
- Add simplified chat template for falcon-7b-instruct by @eaidova in #2252
- [NPUW] Re-fixed issue with long prompt for NPU by @AsyaPronina in #2242
- Enable chat template by default during WWB evaluation of text models by @nikita-savelyevv in #2051
- Remove LoRA scaling fix by @likholat in #2277
- Add Phi-4-multimodal-instruct by @Wovchena in #2221
- Image generation multiconcurrency by @dkalinowski in #2190
- Implement SnapKV (#2067) - release branch PR by @vshampor in #2278
- [GGUF] Support GGUF format for tokenizers and detokenizers by @rkazants in #2272
- Switch to SDPA for VLMs by @yatarkan in #2296
- add new chat template for qwen3 release by @eaidova in #2298
- Revert switch to CB changes. by @popovaan in #2304
- Fix Phi3-vision prompt by @yatarkan in #2306
- Phi4-mm: fix prompt processing, patch position ids and separator inserter by @yatarkan in #2293
New Contributors
- @ababushk made their first contribution in #1889
- @tpragasa made their first contribution in #1987
- @WeldonWangwang made their first contribution in #2060
- @apram0d made their first contribution in #2028
- @JamieVC made their first contribution in #1926
- @michal-miotk made their first contribution in #2199
Full Changelog: 2025.1.0.0...2025.2.0.0