Pull requests: HabanaAI/vllm-hpu-extension

[SW-235186] Enable group indexing support
#349 opened Aug 25, 2025 by jmamzax
Fix for Llama4 models (targets main)
#341 opened Aug 19, 2025 by vidyasiv
Pass Attn Sinks to Fused SDPA
#339 opened Aug 18, 2025 by SKRohit
[SW-227615] adding quant config files to vllm
#336 opened Aug 14, 2025 by linoybu
Fix for Llama4 models
#329 opened Aug 11, 2025 by vidyasiv
Add flag pin_memory to call from hpu.py in vllm
#325 opened Aug 5, 2025 by xuechendi
Add Calibration Script for SGLang FP8
#318 opened Jul 29, 2025 by SKRohit
Add block_softmax_adjustment and block_softmax kernels
#289 opened Jul 16, 2025 by czhu15
Add pre-commit static checks
#247 opened Jun 30, 2025 by kzawora-intel
Exponential bucketing tweaks
#224 opened Jun 13, 2025 by madamczyk-intel
Add useful internal vllm test
#200 opened May 27, 2025 by nirda7 Draft
Optimized MoE on Gaudi
#159 opened Apr 18, 2025 by gyou2021 Draft
[FIX] fp8 gc compile error
#110 opened Mar 4, 2025 by maktukmak Draft