Support FP32 -> BF16 conversion in epilogue of GroupedGEMM #505
base: main
Conversation
- name: Upload Sarif Artifact
  uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
  with:
    name: codeql-results-${{ matrix.language }}
    path: ./results/${{ matrix.language }}.sarif
    retention-days: 7
I didn't modify this file. Maybe it was modified by some GitHub Action
No, it wasn't. You might have made a mistake while rebasing your branch locally. Please revert this change.
using EpilogueOp =
    cutlass::epilogue::fusion::LinearCombination<float_t, float_t>;

using CollectiveEpilogue =
    typename cutlass::epilogue::collective::CollectiveBuilder<
        cutlass::arch::IntelXe, cutlass::arch::OpClassTensorOp, TileShape,
        Shape<_1, _1, _1>, cutlass::epilogue::collective::EpilogueTileAuto,
        float, float, float, LayoutC, 1, ElementOutput, LayoutC, 1,
        EpilogueDispatchPolicy, EpilogueOp>::CollectiveOp;
Apart from ElementOutput being bfloat16_t, this is the only difference between the vanilla example and this one.
@rolandschulz, #482 currently doesn't support this case (BF16 output with BF16 inputs and FP32 accumulation).
I explained this change here.
Also, please advise whether I should combine the two examples (which differ only in output dtype) into one file.
Thanks!
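For context, a minimal sketch of that one difference (my own paraphrase, not the exact example code; the header path is the standard CUTLASS one):

#include "cutlass/bfloat16.h"

// The vanilla example emits FP32:
//   using ElementOutput = float;
// This example emits BF16, while the C operand and accumulation stay FP32
// (the three float parameters passed to the CollectiveBuilder above):
using ElementOutput = cutlass::bfloat16_t;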
Yes, I think we want to combine them into one example if they are almost identical.
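If they are merged, one possible shape (a hypothetical sketch, not this PR's code; run_example and its stub body are stand-ins for the real example plumbing) is to template the example over the output dtype and instantiate both variants:

#include <cstdio>
#include "cutlass/bfloat16.h"

// Hypothetical: the real body would build EpilogueOp / CollectiveEpilogue
// with this ElementOutput, as in the quoted diff, then run and verify.
template <class ElementOutput>
int run_example() {
  std::printf("GroupedGEMM example, output element size = %zu bytes\n",
              sizeof(ElementOutput));
  return 0;
}

int main() {
  int rc = run_example<float>();             // FP32 output (vanilla example)
  rc |= run_example<cutlass::bfloat16_t>();  // BF16 output (this PR)
  return rc;
}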
Fixes #500
This PR adds support for a new configuration: A matrices are BF16, B matrices are BF16, C matrices are FP32, and D matrices are BF16. The conversion of the output from FP32 to BF16 happens in the epilogue.

Only a one-line change in include/cutlass/epilogue/collective/builders/xe_builder.inl is needed in the CUTLASS headers to enable dtype conversion in the epilogue for GroupedGEMM. However, the existing GroupedGEMM example has been copy-pasted wholesale into a new file, examples/04_bmg_grouped_gemm/04_bmg_grouped_gemm_bf16_output.cpp (please use a diff tool such as BeyondCompare to see the differences between the two files), with a few lines of changes that I mostly adapted from https://github.com/intel/cutlass-sycl/blob/e83f147263dd8ca3589b34d76ce6fbec58bac048/test/unit/gemm/device/default_gemm_group_configuration.hpp.

Ideally, we should retain one example and test different output dtypes in it. I'm open to making such a change.
Thanks!
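To make the conversion concrete, here is a self-contained sketch of the per-element FP32 -> BF16 rounding the epilogue performs when writing D. It uses CUTLASS's generic cutlass::NumericConverter to illustrate the numeric behavior only; it is not the GroupedGEMM code path that the xe_builder.inl change enables:

#include <iostream>
#include "cutlass/bfloat16.h"
#include "cutlass/numeric_conversion.h"

int main() {
  // FP32 accumulator value, as produced by the GEMM mainloop.
  float acc = 1.0f / 3.0f;

  // Per-element conversion applied before storing D;
  // the default rounding style is round-to-nearest.
  cutlass::NumericConverter<cutlass::bfloat16_t, float> to_bf16;
  cutlass::bfloat16_t d = to_bf16(acc);

  std::cout << "fp32 accumulator: " << acc
            << ", bf16 output: " << static_cast<float>(d) << "\n";
  return 0;
}

BF16 keeps FP32's 8 exponent bits but stores only 7 mantissa bits, so the conversion loses precision rather than range; that is why accumulating in FP32 and converting only in the epilogue is the usual choice.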