[Model] Support Qwen3-VL Model Series #24727
Conversation
Signed-off-by: Roger Wang <[email protected]>
Co-authored-by: Huang Jie <[email protected]> Co-authored-by: 松灵 <[email protected]> Signed-off-by: Roger Wang <[email protected]>
Code Review
This pull request adds support for the Qwen3-VL model series. The changes include adding new model definitions, updating registries, and modifying rotary embedding logic to support multimodal inputs. My review focuses on correctness and maintainability. I've identified a few critical issues related to potential bugs in the implementation and code duplication that should be addressed. Specifically, there's a potential UnboundLocalError in an example file, a typo in a model name in the test registry, a likely bug in the interleaved RoPE implementation, and an in-place list modification that could cause side effects.
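For readers skimming the review, the in-place list point is the easiest to miss. Here is a minimal, hypothetical illustration of the pattern and a side-effect-free alternative; the names and values are made up and this is not the PR's code:

```python
# Hypothetical illustration of the flagged in-place mutation pattern;
# names and values are made up, this is not the PR's code.
def scale_frame_counts(frame_counts: list[int], factor: int) -> list[int]:
    # Risky variant: writing back into the argument mutates the caller's list.
    # for i, n in enumerate(frame_counts):
    #     frame_counts[i] = n * factor

    # Safer variant: build a new list and leave the input untouched.
    return [n * factor for n in frame_counts]


counts = [4, 8, 16]
scaled = scale_frame_counts(counts, 2)
assert counts == [4, 8, 16]   # caller's data is unchanged
assert scaled == [8, 16, 32]
```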
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Roger Wang <[email protected]>
# For profile run
-_MAX_FRAMES_PER_VIDEO = 16
+_MAX_FRAMES_PER_VIDEO = 600
Maybe we should allow overriding this for each model architecture
Good point - let me confirm with @wulipc on this
Qwen-VL models impose no limit on the maximum number of video frames; only max_seq_length constrains it. We've set a sufficiently large value, and you can also remove the max_frames_per_video restriction entirely.
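If the hard cap is dropped as suggested, one way to keep the profile run bounded is to derive the frame budget from the sequence length instead. A minimal sketch with hypothetical names and token counts, not the actual vLLM profiling code:

```python
# Minimal sketch, not vLLM code: derive a frame cap from the sequence budget
# instead of a hard-coded _MAX_FRAMES_PER_VIDEO. All names/values are assumptions.
def max_video_frames(max_model_len: int, tokens_per_frame: int,
                     reserved_text_tokens: int = 1024) -> int:
    """Largest frame count whose vision tokens still fit in the context window."""
    budget = max_model_len - reserved_text_tokens
    return max(budget // tokens_per_frame, 1)


# A 32k context with ~64 vision tokens per frame leaves room for ~496 frames.
print(max_video_frames(max_model_len=32768, tokens_per_frame=64))
```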
Bump regarding this
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
The processor test passes locally now. I think the only thing left is checking that the examples work. (I have to go to bed now. Will check it tomorrow.)
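For reference, the log below looks like the output of a filtered pytest run against a local model checkout, roughly along these lines; the path and filter are specific to the reviewer's machine:

```python
# Rough reproduction of the run below; "Qwen3-VL-4B-Instruct" here matches the
# reviewer's local checkout path, so point it at your own download.
import pytest

pytest.main([
    "tests/models/multimodal/processing/test_common.py",
    "-k", "Qwen3-VL-4B-Instruct",
    "-v",
])
```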
Running 3 items in this shard: tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-0.3-/home/mozf/LLM/Qwen3-VL-4B-Instruct/], tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-0.5-/home/mozf/LLM/Qwen3-VL-4B-Instruct/], tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-1.0-/home/mozf/LLM/Qwen3-VL-4B-Instruct/]
tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-0.3-/home/mozf/LLM/Qwen3-VL-4B-Instruct/] INFO 09-13 02:39:12 [__init__.py:742] Resolved architecture: Qwen3VLForConditionalGeneration
`torch_dtype` is deprecated! Use `dtype` instead!
INFO 09-13 02:39:12 [__init__.py:1815] Using max model len 32768
PASSED
tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-0.5-/home/mozf/LLM/Qwen3-VL-4B-Instruct/] INFO 09-13 02:39:19 [__init__.py:742] Resolved architecture: Qwen3VLForConditionalGeneration
INFO 09-13 02:39:19 [__init__.py:1815] Using max model len 32768
PASSED
tests/models/multimodal/processing/test_common.py::test_processing_correctness[1.0-32-1.0-/home/mozf/LLM/Qwen3-VL-4B-Instruct/] INFO 09-13 02:39:22 [__init__.py:742] Resolved architecture: Qwen3VLForConditionalGeneration
INFO 09-13 02:39:22 [__init__.py:1815] Using max model len 32768
PASSED
================================================================================== warnings summary ===================================================================================
.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305
/home/mozf/develop-projects/vllm/.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
ref_error: type[Exception] = jsonschema.RefResolutionError,
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================================== 3 passed, 210 deselected, 1 warning in 19.46s ====================================================================
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
@Isotr0py I just pushed a change and tested the two examples - should be good now.
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Roger Wang <[email protected]> Signed-off-by: Isotr0py <[email protected]> Co-authored-by: Huang Jie <[email protected]> Co-authored-by: 松灵 <[email protected]> Co-authored-by: Isotr0py <[email protected]>
Purpose
This PR adds model support for the upcoming Qwen3-VL models, including both dense and MoE variants.
Originally authored by @wulipc @JJJYmmm - many thanks to the Qwen Team for upstreaming the model support!
Reference HF implementation - huggingface/transformers#40795
To run the model, transformers 4.57.0+ (or the latest version) is required.
Co-authored by @Isotr0py
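For context, a rough offline-inference sketch of how the new architecture might be exercised once checkpoints are published; the repo id, chat content, and limits below are illustrative assumptions, not taken from this PR's example scripts:

```python
# Hedged sketch only: repo id, prompt content, and limits are illustrative
# assumptions. Requires transformers >= 4.57.0 and a vLLM build containing this PR.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-VL-4B-Instruct",      # hypothetical HF repo id
    max_model_len=32768,
    limit_mm_per_prompt={"image": 1},
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image_url",
         "image_url": {"url": "https://example.com/sample.jpg"}},
        {"type": "text", "text": "Describe this image."},
    ],
}]

outputs = llm.chat(messages, SamplingParams(max_tokens=128))
print(outputs[0].outputs[0].text)
```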
Follow-ups:
- fast_pos_embed_interpolate #25337
- [Perf] Further optimization for Qwen3-VL fast_pos_embed_interpolate #25347

Test Plan
To be added after model release
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.