Pull requests: flashinfer-ai/flashinfer
#1242 Add trtllm-gen attention kernel with FP8 Q/K/V and FP4/FP8 output (opened Jul 11, 2025 by weireweire)
#1241 feat: Support MXFP8 x MXFP4 CUTLASS grouped GEMM (opened Jul 11, 2025 by jinyangyuan-nvidia)
#1235 feat: Restore convenience FLASHINFER_ENABLE_AOT option (opened Jul 8, 2025 by mgorny)
#1231 [Feature] Support batch prefill for POD Attention (opened Jul 8, 2025 by Edenzzzz)
#1214 Feature/sm100 low latency nvfp4 kernels [priority: high] (opened Jul 4, 2025 by azhurkevich)
#1171 Use flashinfer softmax in top_k_top_p_sampling_from_logits (opened Jun 24, 2025 by lgeiger)
#1145 Port AllGather/ReduceScatter from TensorRT-LLM (opened Jun 15, 2025 by wenscarl)
#1141 [WIP] refactor: unifying return status of different backend implementations (opened Jun 12, 2025 by yzh119)
#944 [RFC] Redesigned CMake build infrastructure for C++ API (opened Mar 14, 2025 by diptorupd)