
Conversation

ywang96 (Member) commented Sep 12, 2025

Purpose

This PR adds model support for the upcoming Qwen3-VL models, including both dense and MoE variants.

Originally authored by @wulipc @JJJYmmm - many thanks to the Qwen team for upstreaming the model support!

Reference HF implementation - huggingface/transformers#40795

To run the model, transformers 4.57.0+ (or the latest development version) is required:

uv pip install git+https://github.com/huggingface/transformers.git
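If you want to check the installed version programmatically before running the model, here is a minimal stdlib sketch; the helper name `meets_min_version` is mine, not part of vLLM or transformers, and the comparison is approximate (use `packaging.version` for exact PEP 440 semantics):

```python
def meets_min_version(version: str, minimum: str = "4.57.0") -> bool:
    """Return True if a version string (e.g. transformers.__version__)
    compares at least equal to `minimum`, using only the leading numeric
    release segments."""
    def parse(v: str) -> tuple:
        parts = []
        for segment in v.split("."):
            if not segment.isdigit():
                break  # stop at the first non-numeric segment, e.g. "dev0"
            parts.append(int(segment))
        return tuple(parts)
    return parse(version) >= parse(minimum)

# Usage: meets_min_version(transformers.__version__)
```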

Co-authored by @Isotr0py

Follow-ups:

Test Plan

To be added after model release

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

ywang96 and others added 2 commits September 12, 2025 05:57
Signed-off-by: Roger Wang <[email protected]>
Co-authored-by: Huang Jie <[email protected]>
Co-authored-by: 松灵 <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
@mergify mergify bot added the documentation, new-model, and qwen labels Sep 12, 2025
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment:
Code Review

This pull request adds support for the Qwen3-VL model series. The changes include adding new model definitions, updating registries, and modifying rotary embedding logic to support multimodal inputs. My review focuses on correctness and maintainability. I've identified a few critical issues related to potential bugs in the implementation and code duplication that should be addressed. Specifically, there's a potential UnboundLocalError in an example file, a typo in a model name in the test registry, a likely bug in the interleaved RoPE implementation, and an in-place list modification that could cause side effects.
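The in-place list modification the bot flags is a common Python pitfall: a helper that mutates a caller-supplied list leaks a side effect back to the caller. A generic sketch of the problem and the copy-first fix; the function names are illustrative, not the actual PR code:

```python
def pad_inplace(token_ids: list[int], pad_id: int = 0) -> list[int]:
    token_ids.append(pad_id)  # mutates the caller's list: a hidden side effect
    return token_ids

def pad_copy(token_ids: list[int], pad_id: int = 0) -> list[int]:
    out = list(token_ids)     # shallow copy, so the caller's list is untouched
    out.append(pad_id)
    return out

ids = [1, 2, 3]
pad_inplace(ids)   # ids is now [1, 2, 3, 0]
safe = [1, 2, 3]
pad_copy(safe)     # safe is still [1, 2, 3]
```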

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Roger Wang <[email protected]>

 # For profile run
-_MAX_FRAMES_PER_VIDEO = 16
+_MAX_FRAMES_PER_VIDEO = 600
A Member commented:

Maybe we should allow overriding this for each model architecture

ywang96 (Member, Author) replied:

Good point - let me confirm with @wulipc on this

A Contributor replied:

Qwen-VL models impose no limit on the maximum number of video frames; only the maximum sequence length constrains it. We've set a sufficiently large value, and you can also remove the max_frames_per_video restriction entirely.

A Member replied:

Bump regarding this
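One way to "allow overriding this for each model architecture", as suggested above, is to turn the module-level constant into a class attribute that each model's processing-info class can shadow. The class names below are a hypothetical sketch under that assumption, not vLLM's actual API:

```python
class BaseProcessingInfo:
    # Default frame cap used during the memory-profiling run.
    max_frames_per_video: int = 16

    def get_max_frames_per_video(self) -> int:
        return self.max_frames_per_video


class Qwen3VLProcessingInfo(BaseProcessingInfo):
    # Qwen3-VL imposes no hard frame limit (only sequence length does),
    # so raise the profiling cap for this architecture.
    max_frames_per_video = 600
```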

@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Sep 12, 2025
ywang96 and others added 2 commits September 12, 2025 16:02
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
@Isotr0py Isotr0py (Member) left a comment:

The processor test can pass locally now. I think the only thing left is checking example's availability. (I have to go to bed now. Will check it tomorrow)

Running 3 items in this shard: tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-0.3-/home/mozf/LLM/Qwen3-VL-4B-Instruct/], tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-0.5-/home/mozf/LLM/Qwen3-VL-4B-Instruct/], tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-1.0-/home/mozf/LLM/Qwen3-VL-4B-Instruct/]

tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-0.3-/home/mozf/LLM/Qwen3-VL-4B-Instruct/] INFO 09-13 02:39:12 [__init__.py:742] Resolved architecture: Qwen3VLForConditionalGeneration
`torch_dtype` is deprecated! Use `dtype` instead!
INFO 09-13 02:39:12 [__init__.py:1815] Using max model len 32768
PASSED
tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-0.5-/home/mozf/LLM/Qwen3-VL-4B-Instruct/] INFO 09-13 02:39:19 [__init__.py:742] Resolved architecture: Qwen3VLForConditionalGeneration
INFO 09-13 02:39:19 [__init__.py:1815] Using max model len 32768
PASSED
tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-1.0-/home/mozf/LLM/Qwen3-VL-4B-Instruct/] INFO 09-13 02:39:22 [__init__.py:742] Resolved architecture: Qwen3VLForConditionalGeneration
INFO 09-13 02:39:22 [__init__.py:1815] Using max model len 32768
PASSED

================================================================================== warnings summary ===================================================================================
.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305
  /home/mozf/develop-projects/vllm/.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================================== 3 passed, 210 deselected, 1 warning in 19.46s ====================================================================

ywang96 (Member, Author) commented Sep 13, 2025

> The processor test can pass locally now. I think the only thing left is checking example's availability. (I have to go to bed now. Will check it tomorrow)

@Isotr0py I just pushed a change and tested the two examples - should be good now


mergify bot commented Sep 16, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ywang96.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 16, 2025
@mergify mergify bot removed the needs-rebase label Sep 16, 2025
ywang96 and others added 5 commits September 16, 2025 05:32
@ywang96 ywang96 enabled auto-merge (squash) September 17, 2025 00:18
@ywang96 ywang96 disabled auto-merge September 17, 2025 02:29
@ywang96 ywang96 enabled auto-merge (squash) September 17, 2025 02:29
@ywang96 ywang96 merged commit 0f7acdd into vllm-project:main Sep 17, 2025
51 checks passed
frank-wei pushed a commit to frank-wei/vllm that referenced this pull request Sep 23, 2025
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Huang Jie <[email protected]>
Co-authored-by: 松灵 <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
langc23 pushed a commit to zte-riscv/vllm that referenced this pull request Sep 23, 2025
wenbinc-Bin pushed a commit to wenbinc-Bin/vllm-fork that referenced this pull request Sep 24, 2025
Labels: documentation, multi-modality (#4194), new-model, qwen, ready
Projects: None yet
5 participants