Support FP32 -> BF16 conversion in epilogue of GroupedGEMM #505
base: main
Conversation
- name: Upload Sarif Artifact
  uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
  with:
    name: codeql-results-${{ matrix.language }}
    path: ./results/${{ matrix.language }}.sarif
    retention-days: 7
I didn't modify this file. Maybe it was modified by some GitHub Action
No, it wasn't. You might have made a mistake while rebasing your branch locally. Please revert this change.
using EpilogueOp =
    cutlass::epilogue::fusion::LinearCombination<float_t, float_t>;

using CollectiveEpilogue =
    typename cutlass::epilogue::collective::CollectiveBuilder<
        cutlass::arch::IntelXe, cutlass::arch::OpClassTensorOp, TileShape,
        Shape<_1, _1, _1>, cutlass::epilogue::collective::EpilogueTileAuto,
        float, float, float, LayoutC, 1, ElementOutput, LayoutC, 1,
        EpilogueDispatchPolicy, EpilogueOp>::CollectiveOp;
Apart from ElementOutput being bfloat16_t, this is the only difference between the vanilla example and this one.
@rolandschulz, #482 currently doesn't support this case (BF16 output with BF16 inputs and FP32 accumulation).
I explained this change here.
Also, please advise whether I should combine the two examples (which differ only in output dtype) into one file.
Thanks!
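For context, a minimal sketch of that one difference (my own paraphrase, not the exact example code; the header path is the standard CUTLASS one):

#include "cutlass/bfloat16.h"

// The vanilla example emits FP32:
//   using ElementOutput = float;
// This example emits BF16, while the C operand and accumulation stay FP32
// (the three float parameters passed to the CollectiveBuilder above):
using ElementOutput = cutlass::bfloat16_t;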
Yes, I think we want to combine them into one example if they are almost identical.
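If they are merged, one possible shape (a hypothetical sketch, not this PR's code; run_example and its stub body are stand-ins for the real example plumbing) is to template the example over the output dtype and instantiate both variants:

#include <cstdio>
#include "cutlass/bfloat16.h"

// Hypothetical: the real body would build EpilogueOp / CollectiveEpilogue
// with this ElementOutput, as in the quoted diff, then run and verify.
template <class ElementOutput>
int run_example() {
  std::printf("GroupedGEMM example, output element size = %zu bytes\n",
              sizeof(ElementOutput));
  return 0;
}

int main() {
  int rc = run_example<float>();             // FP32 output (vanilla example)
  rc |= run_example<cutlass::bfloat16_t>();  // BF16 output (this PR)
  return rc;
}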
Fixes #500
This PR adds support for a new configuration: A matrices are BF16, B matrices are BF16, C matrices are FP32, and D matrices are BF16. The conversion of the output from FP32 to BF16 happens in the epilogue.

Only a one-line change in include/cutlass/epilogue/collective/builders/xe_builder.inl is needed in the CUTLASS headers to enable dtype conversion in the epilogue for GroupedGEMM. However, the existing GroupedGEMM example has been copy-pasted wholesale into a new file, examples/04_bmg_grouped_gemm/04_bmg_grouped_gemm_bf16_output.cpp (please use a diff tool such as BeyondCompare to see the differences between the two files), with a few lines of changes that I mostly adapted from https://github.com/intel/cutlass-sycl/blob/e83f147263dd8ca3589b34d76ce6fbec58bac048/test/unit/gemm/device/default_gemm_group_configuration.hpp.

Ideally, we should retain one example and test different output dtypes in it. I'm open to making such a change.
Thanks!
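To make the conversion concrete, here is a self-contained sketch of the per-element FP32 -> BF16 rounding the epilogue performs when writing D. It uses CUTLASS's generic cutlass::NumericConverter to illustrate the numeric behavior only; it is not the GroupedGEMM code path that the xe_builder.inl change enables:

#include <iostream>
#include "cutlass/bfloat16.h"
#include "cutlass/numeric_conversion.h"

int main() {
  // FP32 accumulator value, as produced by the GEMM mainloop.
  float acc = 1.0f / 3.0f;

  // Per-element conversion applied before storing D;
  // the default rounding style is round-to-nearest.
  cutlass::NumericConverter<cutlass::bfloat16_t, float> to_bf16;
  cutlass::bfloat16_t d = to_bf16(acc);

  std::cout << "fp32 accumulator: " << acc
            << ", bf16 output: " << static_cast<float>(d) << "\n";
  return 0;
}

BF16 keeps FP32's 8 exponent bits but stores only 7 mantissa bits, so the conversion loses precision rather than range; that is why accumulating in FP32 and converting only in the epilogue is the usual choice.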