Skip to content

Conversation

jiahanc
Copy link
Contributor

@jiahanc jiahanc commented Sep 22, 2025

Purpose

Inherit from #23596

  1. Current gpt-oss eagle3 does not work on https://huggingface.co/nvidia/gpt-oss-120b-Eagle3, mainly config issues like vocab_size and lm_head
  2. The current implementation in spec-dec suppose only 1 attention groups :https://github.com/vllm-project/vllm/blob/main/vllm/v1/worker/gpu_model_runner.py#L1178-L1180.
    And https://github.com/vllm-project/vllm/blob/main/vllm/v1/spec_decode/eagle.py#L194-L199 only use the 1st attention builder.
    This causes accuracy issues because on models that have multiple attentions like gpt-oss, it will select the wrong attention group so draft model will use wrong attention as well as overwrite the KV cache of the target models

Test Plan

lm_eval --model local-completions --tasks gsm8k --model_args model=openai/gpt-oss-120b,base_url=http://0.0.0.0:30000/v1/completions,max_retries=3,tokenized_requests=False,timeout=1200,max_gen_toks=2048,max_length=8192 --batch_size 2048 --trust_remote_code --limit 0.8

Test Result

Before fix
Low AR:

(APIServer pid=21745) INFO 09-22 10:49:59 [metrics.py:96] SpecDecoding metrics: Mean acceptance length: 1.04, Accepted throughput: 77.80 tokens/s, Drafted throughput: 6284.37 tokens/s, Accepted: 778 tokens, Drafted: 62847 tokens, Per-position acceptance rate: 0.035, 0.002, 0.000, Avg Draft acceptance rate: 1.2%

Wrong Accuracy

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.0038|±  |0.0019|
|     |       |strict-match    |     5|exact_match|↑  |0.0000|±  |0.0000|

After fix
Correct AR

(APIServer pid=22270) INFO 09-22 10:55:25 [metrics.py:96] SpecDecoding metrics: Mean acceptance length: 2.81, Accepted throughput: 4559.99 tokens/s, Drafted throughput: 7571.32 tokens/s, Accepted: 45604 tokens, Drafted: 75720 tokens, Per-position acceptance rate: 0.755, 0.578, 0.474, Avg Draft acceptance rate: 60.2%

Correct Accuracy

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8362|±  |0.0114|
|     |       |strict-match    |     5|exact_match|↑  |0.5909|±  |0.0151|

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@jiahanc jiahanc marked this pull request as ready for review September 22, 2025 16:55
@mergify mergify bot added llama Related to Llama models gpt-oss Related to GPT-OSS models speculative-decoding v1 labels Sep 22, 2025
@benchislett benchislett added the bug Something isn't working label Sep 22, 2025
@jiahanc jiahanc marked this pull request as draft September 22, 2025 17:18
@jiahanc jiahanc marked this pull request as ready for review September 22, 2025 17:24
Copy link
Collaborator

@benchislett benchislett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Sep 22, 2025
@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 22, 2025
@github-project-automation github-project-automation bot moved this from Ready to In progress in gpt-oss Issues & Enhancements Sep 22, 2025
Copy link
Member

@mgoin mgoin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@github-project-automation github-project-automation bot moved this from In progress to Ready in gpt-oss Issues & Enhancements Sep 22, 2025
@mgoin mgoin enabled auto-merge (squash) September 22, 2025 21:07
auto-merge was automatically disabled September 22, 2025 23:08

Head branch was pushed to by a user without write access

@mgoin
Copy link
Member

mgoin commented Sep 23, 2025

@jiahanc The failures in CI seem related

Signed-off-by: jiahanc <[email protected]>
Signed-off-by: jiahanc <[email protected]>
Signed-off-by: jiahanc <[email protected]>
Signed-off-by: jiahanc <[email protected]>
@jiahanc jiahanc force-pushed the jiahanc/gpt-oss-eagle3-fix branch from 19d7184 to 677ebdd Compare September 23, 2025 16:26
Signed-off-by: jiahanc <[email protected]>
@jiahanc
Copy link
Contributor Author

jiahanc commented Sep 23, 2025

@jiahanc The failures in CI seem related
Fixed and eagle test passes locally

@mgoin mgoin merged commit d5944d5 into vllm-project:main Sep 23, 2025
48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working gpt-oss Related to GPT-OSS models llama Related to Llama models ready ONLY add when PR is ready to merge/full CI is needed speculative-decoding v1
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants