
Conversation

ywang96 (Member) commented Sep 12, 2025

Purpose

This PR adds model support for the upcoming Qwen3-VL models, including both dense and MoE variants.

Originally authored by @wulipc @JJJYmmm - many thanks to the Qwen team for upstreaming the model support!

Reference HF implementation - huggingface/transformers#40795

To run the model, transformers 4.57.0+ (or the latest development version) is required:

uv pip install git+https://github.com/huggingface/transformers.git
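If you want to check the installed version programmatically before running the model, here is a minimal stdlib sketch; the helper name `meets_min_version` is mine, not part of vLLM or transformers, and the comparison is approximate (use `packaging.version` for exact PEP 440 semantics):

```python
def meets_min_version(version: str, minimum: str = "4.57.0") -> bool:
    """Return True if a version string (e.g. transformers.__version__)
    compares at least equal to `minimum`, using only the leading numeric
    release segments."""
    def parse(v: str) -> tuple:
        parts = []
        for segment in v.split("."):
            if not segment.isdigit():
                break  # stop at the first non-numeric segment, e.g. "dev0"
            parts.append(int(segment))
        return tuple(parts)
    return parse(version) >= parse(minimum)

# Usage: meets_min_version(transformers.__version__)
```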

Co-authored by @Isotr0py

Follow-ups:

Test Plan

To be added after model release

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

ywang96 and others added 2 commits September 12, 2025 05:57
Signed-off-by: Roger Wang <[email protected]>
Co-authored-by: Huang Jie <[email protected]>
Co-authored-by: 松灵 <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
@mergify mergify bot added the documentation, new-model, and qwen labels Sep 12, 2025
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment:
Code Review

This pull request adds support for the Qwen3-VL model series. The changes include adding new model definitions, updating registries, and modifying rotary embedding logic to support multimodal inputs. My review focuses on correctness and maintainability. I've identified a few critical issues related to potential bugs in the implementation and code duplication that should be addressed. Specifically, there's a potential UnboundLocalError in an example file, a typo in a model name in the test registry, a likely bug in the interleaved RoPE implementation, and an in-place list modification that could cause side effects.
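The in-place list modification the bot flags is a common Python pitfall: a helper that mutates a caller-supplied list leaks a side effect back to the caller. A generic sketch of the problem and the copy-first fix; the function names are illustrative, not the actual PR code:

```python
def pad_inplace(token_ids: list[int], pad_id: int = 0) -> list[int]:
    token_ids.append(pad_id)  # mutates the caller's list: a hidden side effect
    return token_ids

def pad_copy(token_ids: list[int], pad_id: int = 0) -> list[int]:
    out = list(token_ids)     # shallow copy, so the caller's list is untouched
    out.append(pad_id)
    return out

ids = [1, 2, 3]
pad_inplace(ids)   # ids is now [1, 2, 3, 0]
safe = [1, 2, 3]
pad_copy(safe)     # safe is still [1, 2, 3]
```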

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Roger Wang <[email protected]>

 # For profile run
-_MAX_FRAMES_PER_VIDEO = 16
+_MAX_FRAMES_PER_VIDEO = 600
A Member commented:

Maybe we should allow overriding this for each model architecture

ywang96 (Member, Author) replied:

Good point - let me confirm with @wulipc on this

A Contributor replied:

Qwen-VL models impose no limit on the maximum number of video frames; only the maximum sequence length constrains it. We've set a sufficiently large value, and you can also remove the max_frames_per_video restriction entirely.

A Member replied:

Bump regarding this
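One way to "allow overriding this for each model architecture", as suggested above, is to turn the module-level constant into a class attribute that each model's processing-info class can shadow. The class names below are a hypothetical sketch under that assumption, not vLLM's actual API:

```python
class BaseProcessingInfo:
    # Default frame cap used during the memory-profiling run.
    max_frames_per_video: int = 16

    def get_max_frames_per_video(self) -> int:
        return self.max_frames_per_video


class Qwen3VLProcessingInfo(BaseProcessingInfo):
    # Qwen3-VL imposes no hard frame limit (only sequence length does),
    # so raise the profiling cap for this architecture.
    max_frames_per_video = 600
```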

@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Sep 12, 2025
ywang96 and others added 2 commits September 12, 2025 16:02
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
@Isotr0py Isotr0py (Member) left a comment:

The processor test can pass locally now. I think the only thing left is checking example's availability. (I have to go to bed now. Will check it tomorrow)

Running 3 items in this shard: tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-0.3-/home/mozf/LLM/Qwen3-VL-4B-Instruct/], tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-0.5-/home/mozf/LLM/Qwen3-VL-4B-Instruct/], tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-1.0-/home/mozf/LLM/Qwen3-VL-4B-Instruct/]

tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-0.3-/home/mozf/LLM/Qwen3-VL-4B-Instruct/] INFO 09-13 02:39:12 [__init__.py:742] Resolved architecture: Qwen3VLForConditionalGeneration
`torch_dtype` is deprecated! Use `dtype` instead!
INFO 09-13 02:39:12 [__init__.py:1815] Using max model len 32768
PASSED
tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-0.5-/home/mozf/LLM/Qwen3-VL-4B-Instruct/] INFO 09-13 02:39:19 [__init__.py:742] Resolved architecture: Qwen3VLForConditionalGeneration
INFO 09-13 02:39:19 [__init__.py:1815] Using max model len 32768
PASSED
tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-1.0-/home/mozf/LLM/Qwen3-VL-4B-Instruct/] INFO 09-13 02:39:22 [__init__.py:742] Resolved architecture: Qwen3VLForConditionalGeneration
INFO 09-13 02:39:22 [__init__.py:1815] Using max model len 32768
PASSED

================================================================================== warnings summary ===================================================================================
.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305
  /home/mozf/develop-projects/vllm/.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================================== 3 passed, 210 deselected, 1 warning in 19.46s ====================================================================

ywang96 (Member, Author) commented Sep 13, 2025

> The processor test can pass locally now. I think the only thing left is checking example's availability. (I have to go to bed now. Will check it tomorrow)

@Isotr0py I just pushed a change and tested the two examples - should be good now


mergify bot commented Sep 16, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ywang96.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 16, 2025
@mergify mergify bot removed the needs-rebase label Sep 16, 2025
ywang96 and others added 5 commits September 16, 2025 05:32
@ywang96 ywang96 enabled auto-merge (squash) September 17, 2025 00:18
@ywang96 ywang96 disabled auto-merge September 17, 2025 02:29
@ywang96 ywang96 enabled auto-merge (squash) September 17, 2025 02:29
@ywang96 ywang96 merged commit 0f7acdd into vllm-project:main Sep 17, 2025
51 checks passed
frank-wei pushed a commit to frank-wei/vllm that referenced this pull request Sep 23, 2025
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Huang Jie <[email protected]>
Co-authored-by: 松灵 <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
langc23 pushed a commit to zte-riscv/vllm that referenced this pull request Sep 23, 2025
wenbinc-Bin pushed a commit to wenbinc-Bin/vllm-fork that referenced this pull request Sep 24, 2025
Labels: documentation, multi-modality (#4194), new-model, qwen, ready
Projects: None yet
5 participants