forked from NVIDIA/cutlass
- Fork 64
Pull requests: intel/sycl-tla
Pull requests list
Gemm Universal unit tests for MainloopIntelW8A8 API
Tests
For Unit tests and Benchmark tests and general validation specific changes
#584
opened Oct 28, 2025 by
rishi-yadav
Changes for new cute apis prefetch transpose vnni
Tests
For Unit tests and Benchmark tests and general validation specific changes
#583
opened Oct 28, 2025 by
rishi-yadav
Unit tests for LOAD_2D and STORE_2D
Tests
For Unit tests and Benchmark tests and general validation specific changes
#582
opened Oct 28, 2025 by
rishi-yadav
New mma_atoms and copy_atoms in bmg_grouped_gemm_fp8
#579
opened Oct 24, 2025 by
nsingh-habana
Draft
Use newer version of copy_atom in epilogue collective
release
urgent
PR requires urgent attention (for release or blocking another PR)
#573
opened Oct 22, 2025 by
anamikac-intel
[DOCS] Clarify existing VNNI load visualization and add another
#571
opened Oct 20, 2025 by
sanchitintel
[PYTORCHDGQ-6865] Added support for RoPE on chunk prefill [WIP]
#569
opened Oct 20, 2025 by
pralay-das
Draft
Add CuTe Matrix Transpose tutorial
examples
Label for adding examples and complex kernel development using CUTLASS or CuTe APIs
information required
The PR requires more information to be reviewed properly
Add python API for flash-attn
information required
The PR requires more information to be reviewed properly
redesign required
Implementation requires a redesign
wontfix
This will not be worked on
#558
opened Oct 13, 2025 by
YangKai0616
Rewrite mma unit tests
Tests
For Unit tests and Benchmark tests and general validation specific changes
#557
opened Oct 13, 2025 by
yuanhang-dev
Skip alignment check for sourceless epilogues
bug
Something isn't working
#555
opened Oct 13, 2025 by
nsingh-habana
Draft
[CI][WIP] Fix coverity workflow
Tests
For Unit tests and Benchmark tests and general validation specific changes
First version of SDPA Fwd - No need to review
redesign required
Implementation requires a redesign
#548
opened Oct 6, 2025 by
cfgfung
Re-implement FlashAttention with new Xe atoms
enhancement
New feature or request
release
urgent
PR requires urgent attention (for release or blocking another PR)
#547
opened Oct 4, 2025 by
petercad
upload 2nd version of sdpa backward
redesign required
Implementation requires a redesign
#546
opened Oct 3, 2025 by
yuankuns
Support of FP8 Chunk Prefill kernel
redesign required
Implementation requires a redesign
#542
opened Oct 1, 2025 by
adityachatter
Support nullptr value of argument ptr_C for xe_array_epilogue
#541
opened Sep 29, 2025 by
sanchitintel
Attention sink support
redesign required
Implementation requires a redesign
#533
opened Sep 25, 2025 by
kareemshaik80
Add dimension check to prevent out-of-bounds access in example 05_bmg_gemm_with_epilogue_splitk
#529
opened Sep 23, 2025 by
ClarkChin08
[PYTORCHDGQ-7000] Added support for Rotary Embedding in flash_attention
redesign required
Implementation requires a redesign
#523
opened Sep 19, 2025 by
pralay-das
Add a new tile scheduler for varlen prefill to avoid launching empty work groups
redesign required
Implementation requires a redesign
#516
opened Sep 18, 2025 by
carsonwang
Also use column-major B matrix in the example 00_bmg_gemm.cpp
#510
opened Sep 13, 2025 by
sanchitintel
Remove redundant code from GroupGEMM implementation
#508
opened Sep 12, 2025 by
sanchitintel