[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue #25406

jiahanc · 2025-09-22T16:55:16Z

Purpose

Builds on #23596.

The current gpt-oss EAGLE3 support does not work with https://huggingface.co/nvidia/gpt-oss-120b-Eagle3, mainly because of config issues such as `vocab_size` and `lm_head`.
The current speculative-decoding implementation assumes there is only one attention group: https://github.com/vllm-project/vllm/blob/main/vllm/v1/worker/gpu_model_runner.py#L1178-L1180.
In addition, https://github.com/vllm-project/vllm/blob/main/vllm/v1/spec_decode/eagle.py#L194-L199 uses only the first attention builder.
This causes accuracy issues on models with multiple attention groups, such as gpt-oss: the wrong attention group is selected, so the draft model runs with the wrong attention and also overwrites the target model's KV cache.
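A minimal, hypothetical sketch of the failure mode described above. `AttentionGroup`, `pick_group_first`, `pick_group_by_layer`, and all layer and group names are illustrative assumptions, not vLLM's actual classes or functions. It contrasts "always take the first group" with "look up the group that owns the draft layer", which is the general idea behind mapping the draft model's attention metadata correctly.

```python
from dataclasses import dataclass, field


@dataclass
class AttentionGroup:
    # Hypothetical stand-in for a set of layers that share one attention
    # spec, and therefore one metadata builder and KV cache layout.
    name: str
    layer_names: list = field(default_factory=list)


def pick_group_first(groups, layer):
    # Buggy pattern: implicitly assumes the model has exactly one group,
    # so the draft layer may get another group's metadata and KV cache.
    return groups[0]


def pick_group_by_layer(groups, layer):
    # Fixed pattern: return the group that actually contains the layer.
    for group in groups:
        if layer in group.layer_names:
            return group
    raise KeyError(f"no attention group contains layer {layer!r}")


# Hypothetical model with two attention groups; the draft layer lives in
# the second one.
groups = [
    AttentionGroup("group0", ["model.layers.0.attn"]),
    AttentionGroup("group1", ["model.layers.1.attn", "draft.layers.0.attn"]),
]
draft_layer = "draft.layers.0.attn"
print(pick_group_first(groups, draft_layer).name)     # group0 (wrong)
print(pick_group_by_layer(groups, draft_layer).name)  # group1 (correct)
```

With a single group both functions agree, which is why the single-group assumption goes unnoticed on models that have only one attention type.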

Test Plan

lm_eval --model local-completions --tasks gsm8k --model_args model=openai/gpt-oss-120b,base_url=http://0.0.0.0:30000/v1/completions,max_retries=3,tokenized_requests=False,timeout=1200,max_gen_toks=2048,max_length=8192 --batch_size 2048 --trust_remote_code --limit 0.8

Test Result

Before fix
Low acceptance rate (AR):

(APIServer pid=21745) INFO 09-22 10:49:59 [metrics.py:96] SpecDecoding metrics: Mean acceptance length: 1.04, Accepted throughput: 77.80 tokens/s, Drafted throughput: 6284.37 tokens/s, Accepted: 778 tokens, Drafted: 62847 tokens, Per-position acceptance rate: 0.035, 0.002, 0.000, Avg Draft acceptance rate: 1.2%

Wrong Accuracy

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.0038|±  |0.0019|
|     |       |strict-match    |     5|exact_match|↑  |0.0000|±  |0.0000|

After fix
Correct AR

(APIServer pid=22270) INFO 09-22 10:55:25 [metrics.py:96] SpecDecoding metrics: Mean acceptance length: 2.81, Accepted throughput: 4559.99 tokens/s, Drafted throughput: 7571.32 tokens/s, Accepted: 45604 tokens, Drafted: 75720 tokens, Per-position acceptance rate: 0.755, 0.578, 0.474, Avg Draft acceptance rate: 60.2%
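As a sanity check, the summary numbers in both metric lines can be recomputed from the raw counts. This sketch assumes each draft proposes exactly 3 tokens (three per-position rates are logged) and that each verification step emits the accepted draft tokens plus one token from the target model; under those assumptions it reproduces the logged values. It is an illustration, not vLLM's metrics code.

```python
def summarize(accepted: int, drafted: int, num_spec_tokens: int):
    """Return (mean acceptance length, average draft acceptance rate)."""
    # Assumption: every draft proposes exactly num_spec_tokens tokens.
    num_drafts = drafted / num_spec_tokens
    # Accepted draft tokens per step, plus the one token the target
    # model always contributes at each verification step.
    mean_acceptance_length = 1 + accepted / num_drafts
    # Fraction of all drafted tokens that were accepted.
    avg_acceptance_rate = accepted / drafted
    return mean_acceptance_length, avg_acceptance_rate


before = summarize(accepted=778, drafted=62847, num_spec_tokens=3)
after = summarize(accepted=45604, drafted=75720, num_spec_tokens=3)
print(f"before: {before[0]:.2f}, {before[1]:.1%}")  # before: 1.04, 1.2%
print(f"after: {after[0]:.2f}, {after[1]:.1%}")     # after: 2.81, 60.2%

# Equivalently, mean acceptance length is about 1 + sum of per-position rates.
print(round(1 + sum([0.755, 0.578, 0.474]), 2))     # 2.81
```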

Correct Accuracy

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8362|±  |0.0114|
|     |       |strict-match    |     5|exact_match|↑  |0.5909|±  |0.0151|


vllm/model_executor/models/gpt_oss.py

benchislett

LGTM

vllm/config/model.py

vllm/v1/spec_decode/eagle.py

mgoin

LGTM, thank you!

mgoin · 2025-09-23T16:08:45Z

@jiahanc The failures in CI seem related


jiahanc · 2025-09-23T17:25:37Z

> @jiahanc The failures in CI seem related

Fixed, and the EAGLE test passes locally.

jiahanc marked this pull request as ready for review September 22, 2025 16:55

jiahanc requested review from benchislett, luccafong, WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac, alexm-redhat, simon-mo, youkaichao, mgoin, tlrmchlsmth, houseroad, hmellor, yewentao256 and ProExpertProg as code owners September 22, 2025 16:55

mergify bot added llama Related to Llama models gpt-oss Related to GPT-OSS models speculative-decoding v1 labels Sep 22, 2025

github-project-automation bot added this to gpt-oss Issues & Enhancements Sep 22, 2025

github-project-automation bot moved this to To Triage in gpt-oss Issues & Enhancements Sep 22, 2025

benchislett added the bug Something isn't working label Sep 22, 2025

jiahanc marked this pull request as draft September 22, 2025 17:18

jiahanc marked this pull request as ready for review September 22, 2025 17:24

benchislett reviewed Sep 22, 2025


vllm/model_executor/models/gpt_oss.py (outdated, resolved)

benchislett approved these changes Sep 22, 2025


github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Sep 22, 2025

mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 22, 2025

mgoin requested changes Sep 22, 2025


vllm/config/model.py (resolved)

vllm/v1/spec_decode/eagle.py (outdated, resolved)

github-project-automation bot moved this from Ready to In progress in gpt-oss Issues & Enhancements Sep 22, 2025

mgoin approved these changes Sep 22, 2025


github-project-automation bot moved this from In progress to Ready in gpt-oss Issues & Enhancements Sep 22, 2025

mgoin enabled auto-merge (squash) September 22, 2025 21:07

auto-merge was automatically disabled September 22, 2025 23:08
Head branch was pushed to by a user without write access

jiahanc added 5 commits September 23, 2025 09:26

update fix

1b5d422

Signed-off-by: jiahanc <[email protected]>

fix lm head

f06c0d8

Signed-off-by: jiahanc <[email protected]>

remove not need code

84dba6c

Signed-off-by: jiahanc <[email protected]>

remove

42111d9

Signed-off-by: jiahanc <[email protected]>

fix draft attn metadata mapping

677ebdd

Signed-off-by: jiahanc <[email protected]>

jiahanc force-pushed the jiahanc/gpt-oss-eagle3-fix branch from 19d7184 to 677ebdd Compare September 23, 2025 16:26

jiahanc added 2 commits September 23, 2025 10:05

fix ealge lm_head logic

9771a4d

Signed-off-by: jiahanc <[email protected]>

fix test

d221c16

Signed-off-by: jiahanc <[email protected]>

mgoin merged commit d5944d5 into vllm-project:main Sep 23, 2025
48 checks passed

github-project-automation bot moved this from Ready to Done in gpt-oss Issues & Enhancements Sep 23, 2025
