[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue #25406
LGTM
LGTM, thank you!
Head branch was pushed to by a user without write access
@jiahanc The failures in CI seem related.
Signed-off-by: jiahanc <[email protected]>
19d7184 to 677ebdd
Purpose
Inherited from #23596.
The code at https://github.com/vllm-project/vllm/blob/main/vllm/v1/spec_decode/eagle.py#L194-L199 only uses the first attention builder.
This causes accuracy issues: on models with multiple attention types, such as gpt-oss, it selects the wrong attention group, so the draft model uses the wrong attention and also overwrites the KV cache of the target model.
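To illustrate the idea, here is a minimal sketch of choosing the attention group by the draft model's layer names instead of always taking the first one. It is not the code in this PR or vLLM's actual API: `AttentionGroup`, `select_draft_group`, the builder names, and the layer names are all hypothetical stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AttentionGroup:
    # Hypothetical stand-in: a set of layers sharing one metadata builder.
    builder_name: str
    layer_names: list[str]


def select_draft_group(
    groups: list[AttentionGroup], draft_layer_names: list[str]
) -> AttentionGroup:
    """Return the group that contains all of the draft model's attention layers."""
    draft = set(draft_layer_names)
    if not draft:
        raise ValueError("draft model has no attention layers")
    for group in groups:
        if draft <= set(group.layer_names):
            return group
    raise ValueError("no attention group contains the draft layers")


# Assumed layout: the target model has two attention groups, and the draft
# layer lives in the second one, so picking groups[0] would be wrong.
groups = [
    AttentionGroup("sliding_window", ["target.layers.0.attn"]),
    AttentionGroup("full_attention", ["target.layers.1.attn", "draft.layers.0.attn"]),
]
chosen = select_draft_group(groups, ["draft.layers.0.attn"])
print(chosen.builder_name)
```

With a single attention group both strategies agree, which is presumably why the problem only shows up on models with more than one group.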
Test Plan
lm_eval --model local-completions --tasks gsm8k --model_args model=openai/gpt-oss-120b,base_url=http://0.0.0.0:30000/v1/completions,max_retries=3,tokenized_requests=False,timeout=1200,max_gen_toks=2048,max_length=8192 --batch_size 2048 --trust_remote_code --limit 0.8
Test Result
Before fix: low AR and wrong accuracy.
After fix: correct AR and correct accuracy.