
Conversation

@ywang96 (Member) commented Sep 16, 2025

Purpose

Add data-parallel ViT (DP ViT) support for Qwen3-VL models. This PR should be merged only after #24727 is merged.

Part of #22743
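
As background, a minimal sketch of the idea (not vLLM's actual implementation; the function name and round-robin policy are illustrative): in "data" mode the full vision encoder is replicated on every rank and the images themselves are sharded across ranks, so encoding needs only a single gather of embeddings at the end rather than per-layer all-reduces over the interconnect.

# Illustrative sketch only: not vLLM's implementation.
from typing import Sequence, TypeVar

T = TypeVar("T")

def shard_images_round_robin(images: Sequence[T], num_ranks: int) -> list[list[T]]:
    """Assign images to ranks round-robin so encoding work is roughly
    balanced across the data-parallel ViT replicas."""
    shards: list[list[T]] = [[] for _ in range(num_ranks)]
    for i, image in enumerate(images):
        shards[i % num_ranks].append(image)
    return shards

# Each rank r then runs embeddings_r = full_vit(shards[r]), followed by
# a single all-gather of the embeddings across ranks.
print(shard_images_round_robin(list(range(10)), 4))
# -> [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]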

Test Plan

Running Qwen3-VL-30B-A3B-Instruct on 4xL40S (PCIe) with the following changes to vision_language.py:

    sampling_params = SamplingParams(
-        temperature=0.2, max_tokens=64, stop_token_ids=req_data.stop_token_ids
+        temperature=0.0, max_tokens=1, stop_token_ids=req_data.stop_token_ids # measure prefill perf
    )

    engine_args = EngineArgs(
        model=model_name,
-       max_model_len=4096,
-       max_num_seqs=5,
        mm_processor_kwargs={
            "min_pixels": 28 * 28,
            "max_pixels": 1280 * 28 * 28,
            "fps": 1,
        },
        limit_mm_per_prompt={modality: 1},
+       tensor_parallel_size=4,
+       mm_encoder_tp_mode="data", # vs "weights"
+       enable_prefix_caching=False,
+       mm_processor_cache_gb=0,
    )
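
For reference, the same configuration expressed directly against vLLM's offline LLM API would look roughly like the sketch below (the HF repo name is assumed, since the model was unreleased when this PR was opened):

from vllm import LLM

# Sketch of the benchmark configuration above;
# "Qwen/Qwen3-VL-30B-A3B-Instruct" is an assumed repo name.
llm = LLM(
    model="Qwen/Qwen3-VL-30B-A3B-Instruct",
    tensor_parallel_size=4,
    mm_encoder_tp_mode="data",    # "data" = DP ViT (this PR); "weights" = TP-sharded ViT
    enable_prefix_caching=False,  # keep repeated prompts from skipping prefill
    mm_processor_cache_gb=0,      # keep image preprocessing results from being cached
    limit_mm_per_prompt={"image": 1},
)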

Test Result

Running 500 image prompts with mm_encoder_tp_mode="weights"

python3 examples/offline_inference/vision_language.py -m qwen3_vl_moe --modality image --num-prompts 500 --seed 0
Processed prompts: 100%|█████████████████████████████████████████████████████████| 1000/1000 [02:30<00:00,  6.63it/s, est. speed input: 6491.45 toks/s, output: 6.63 toks/s]

Running 500 image prompts with mm_encoder_tp_mode="data"

python3 examples/offline_inference/vision_language.py -m qwen3_vl_moe --modality image --num-prompts 500 --seed 0
Processed prompts: 100%|██████████████████████████████████████| 1000/1000 [01:12<00:00, 13.82it/s, est. speed input: 13533.77 toks/s, output: 13.82 toks/s]

These results are somewhat biased, since one would not typically run high TP on GPUs without NVLink, but this is the hardware I have available. Even so, data mode roughly doubles prefill throughput here (est. 13,533 vs. 6,491 input tok/s).


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

ywang96 and others added 29 commits September 12, 2025
@mergify mergify bot added the documentation Improvements or additions to documentation label Sep 16, 2025
@mergify mergify bot added the qwen Related to Qwen models label Sep 16, 2025
@ywang96 ywang96 marked this pull request as ready for review September 17, 2025 06:09
@ywang96 ywang96 requested a review from sighingnow as a code owner September 17, 2025 06:09
@ywang96 ywang96 requested a review from Isotr0py September 17, 2025 07:40
@ywang96 (Member, Author) commented Sep 17, 2025

@tjtanaa Could you take a look too? Thanks

@Isotr0py (Member) left a comment

LGTM!

@Isotr0py Isotr0py added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 17, 2025
@DarkLight1337 (Member) commented

Have you run lm-eval to ensure correctness?

@ywang96 (Member, Author) commented Sep 17, 2025

> Have you run lm-eval to ensure correctness?

@DarkLight1337 Yes, that's what I'm planning to do next. We don't have official numbers to compare against, but it should be fine as long as the results from the two modes match.
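
For reference, such a check could use the lm-evaluation-harness vLLM multimodal backend, run once per encoder mode, and compare scores. A sketch, assuming the vllm-vlm model type and the chartqa task exist in the installed harness version and that extra engine kwargs such as mm_encoder_tp_mode pass through model_args:

lm_eval --model vllm-vlm \
  --model_args pretrained=Qwen/Qwen3-VL-30B-A3B-Instruct,tensor_parallel_size=4,mm_encoder_tp_mode=data \
  --tasks chartqa \
  --batch_size auto \
  --apply_chat_template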

@tjtanaa (Contributor) commented Sep 17, 2025

@ywang96 LGTM as well. Can't wait for the model release.

@ywang96 (Member, Author) commented Sep 17, 2025

I won't merge this PR until I verify the correctness tomorrow.

@ywang96 (Member, Author) commented Sep 18, 2025

Additional results on 4xH200

vllm bench serve  \
--endpoint-type openai-chat \
--model Qwen-SGlang/Qwen3-VL-30B-A3B-Instruct   \
--tokenizer /tmp-nvme/models/Qwen-SGlang/Qwen3-VL-30B-A3B-Instruct \
--endpoint /v1/chat/completions   \
--dataset-name hf   \
--dataset-path lmarena-ai/VisionArena-Chat   \
--hf-split train   \
--num-prompts 1000 --request-rate 3

TP (mm_encoder_tp_mode="weights")

============ Serving Benchmark Result ============
Successful requests:                     1000      
Request rate configured (RPS):           3.00      
Benchmark duration (s):                  334.79    
Total input tokens:                      94327     
Total generated tokens:                  121241    
Request throughput (req/s):              2.99      
Output token throughput (tok/s):         362.14    
Total Token throughput (tok/s):          643.88    
---------------Time to First Token----------------
Mean TTFT (ms):                          284.71    
Median TTFT (ms):                        203.83    
P99 TTFT (ms):                           2600.75   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          19.60     
Median TPOT (ms):                        18.02     
P99 TPOT (ms):                           47.18     
---------------Inter-token Latency----------------
Mean ITL (ms):                           19.90     
Median ITL (ms):                         13.24     
P99 ITL (ms):                            153.90    
==================================================

DP (mm_encoder_tp_mode="data")

============ Serving Benchmark Result ============
Successful requests:                     1000      
Request rate configured (RPS):           3.00      
Benchmark duration (s):                  334.76    
Total input tokens:                      94327     
Total generated tokens:                  121609    
Request throughput (req/s):              2.99      
Output token throughput (tok/s):         363.28    
Total Token throughput (tok/s):          645.05    
---------------Time to First Token----------------
Mean TTFT (ms):                          191.12    
Median TTFT (ms):                        166.74    
P99 TTFT (ms):                           1195.92   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          15.98     
Median TPOT (ms):                        15.07     
P99 TPOT (ms):                           39.88     
---------------Inter-token Latency----------------
Mean ITL (ms):                           16.23     
Median ITL (ms):                         11.37     
P99 ITL (ms):                            119.14    
==================================================

@vllm-bot vllm-bot merged commit 3127274 into vllm-project:main Sep 18, 2025
45 of 48 checks passed
845473182 pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 18, 2025
debroy-rh pushed a commit to debroy-rh/vllm that referenced this pull request Sep 19, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025