Skip to content

Conversation

@Mousius
Copy link
Contributor

@Mousius Mousius commented Oct 7, 2025

This adds all the relevant bits and pieces to add a shgemv path as well as a future hgemm/hgemv path in a similar model to sb and b interfaces.

I've also fixed a few bits and pieces around shgemm which didn't build in a few situations.

This adds all the relevant bits and pieces to add a `shgemv` path as
well as a future `hgemm`/`hgemv` path in a similar model to `sb` and `b`
interfaces.

I've also fixed a few bits and pieces around `shgemm` which didn't build
in a few situations.
@martin-frbg
Copy link
Collaborator

Thanks (pity about the duplicate work though)

@Mousius
Copy link
Contributor Author

Mousius commented Oct 7, 2025

Thanks (pity about the duplicate work though)

Yeah, sorry about that, I was thinking about something else and got a bit carried away seeing what was missing here 🙀

@martin-frbg martin-frbg added this to the 0.3.31 milestone Oct 7, 2025
@martin-frbg martin-frbg merged commit de43ccc into OpenMathLib:develop Oct 7, 2025
83 of 88 checks passed
@Mousius
Copy link
Contributor Author

Mousius commented Oct 7, 2025

@martin-frbg I will leave hgemm/hgemm_batch/hgemm_strided and hgemv for you to make up for my over eagerness? 🙏

@Mousius Mousius deleted the shgemv-infra branch October 7, 2025 18:49
@ChipKerchner
Copy link
Contributor

#5480

@ChipKerchner
Copy link
Contributor

ChipKerchner commented Oct 7, 2025

Getting these compiler warnings....

/proj_sw/user_dev/ckerchner/OpenBLAS/interface/gemm.c:593:76: warning: macro expansion producing 'defined' has undefined behavior [-Wexpansion-to-defined]
  593 | #if defined(GEMM_GEMV_FORWARD) && !defined(GEMM3M) && !defined(COMPLEX) && HFLOAT16_GEMM_GEMV_FORWARD && BFLOAT16_GEMM_GEMV_FORWARD
      |                                                                            ^
/proj_sw/user_dev/ckerchner/OpenBLAS/interface/gemm.c:591:38: note: expanded from macro 'HFLOAT16_GEMM_GEMV_FORWARD'

@ChipKerchner
Copy link
Contributor

ChipKerchner commented Oct 7, 2025

Should probably be done like this https://stackoverflow.com/questions/42074035/how-to-deal-with-clangs-3-9-wexpansion-to-defined-warning

#if (!defined(BFLOAT16) || (!defined(BGEMM) && defined(SBGEMM_GEMV_FORWARD)) || (defined(BGEMM) && defined(BGEMM_GEMV_FORWARD)))
#define BFLOAT16_GEMM_GEMV_FORWARD 1
#else
#define BFLOAT16_GEMM_GEMV_FORWARD 0
#endif
#if (!defined(HFLOAT16) || (!defined(HGEMM) && defined(SHGEMM_GEMV_FORWARD)) || (defined(HGEMM) && defined(HGEMM_GEMV_FORWARD)))
#define HFLOAT16_GEMM_GEMV_FORWARD 1
#else
#define HFLOAT16_GEMM_GEMV_FORWARD 0
#endif

@ChipKerchner
Copy link
Contributor

ChipKerchner commented Oct 9, 2025

I see a couple of places in test/Makefile where BUILD_BFLOAT16 has been added but I don't see the same for BUILD_HFLOAT16.

It looks like we have support for SBGEMM but not SHGEMM?

@ChipKerchner
Copy link
Contributor

#5497

@Mousius Mousius mentioned this pull request Oct 10, 2025
@ChipKerchner
Copy link
Contributor

ChipKerchner commented Nov 5, 2025

Your conversion of the outputs for SBGEMM/V seems wrong since you are casting from F16 to BF32 with TO_F32

@Mousius
Copy link
Contributor Author

Mousius commented Nov 6, 2025

Your conversion of the outputs for SBGEMM/V seems wrong since you are casting from F16 to BF32 with TO_F32

Can you point me to the line @ChipKerchner ? The block is:

#if defined(BFLOAT16) && defined(BFLOAT16CONVERSION)

#ifdef BGEMM
#define TO_F32(x) (bfloat16tof32(x))
#else
#define TO_F32(x) (bfloat16tof32(x))
#endif

#elif defined(HFLOAT16)

#ifdef HGEMM
#define TO_F32(x) ((float)(x))
#else
#define TO_F32(x) ((float)(x))
#endif

#else

#define TO_F32(x) x

#endif

@martin-frbg
Copy link
Collaborator

He added review notes to the code changes, but weirdly I can only see them if I click on the corresponding notification in the gh web interface. As far as I can tell, these are indeed cases where the macro does not perform any actual conversion.

@ChipKerchner
Copy link
Contributor

I see the TO_F32 conversions here:

#ifdef BGEMM
#define ALPHA bfloat16tof32(alpha)
#define BETA bfloat16tof32(beta)
#define TO_F32(x) (bfloat16tof32(x))
#define TO_OUTPUT(x) (f32tobfloat16(x))
#else
#define ALPHA alpha
#define BETA beta
#define TO_F32(x) (bfloat16tof32(x)).     <- Convert
#define TO_OUTPUT(x) x
#endif

Bad SBGEMV conversion here (a 2nd one in gemv_t):

            y[iy] = TO_OUTPUT(ALPHA * temp + BETA * TO_F32(y[iy]));  <- Converts from BF16 -> F32?

Bad SBGEMM conversion here:

             C0[0] = TO_OUTPUT(TO_F32(C0[0])+res0);  <- Converts from BF16 -> F32?

@Mousius
Copy link
Contributor Author

Mousius commented Nov 6, 2025

Argh, guessing we need to add a FROM_OUTPUT here 🤔

@martin-frbg
Copy link
Collaborator

Interestingly, this does not lead to failures in our tests - at least not on (emulated) RISCV, but AFAICT the code changes in question originate from your earlier "fix bfloat conversion for Neoverse" PR

@Mousius
Copy link
Contributor Author

Mousius commented Nov 7, 2025

Interestingly, this does not lead to failures in our tests - at least not on (emulated) RISCV, but AFAICT the code changes in question originate from your earlier "fix bfloat conversion for Neoverse" PR

I don't think so, that didn't touch the generic kernels: https://github.com/OpenMathLib/OpenBLAS/pull/5483/files

I'd imagine only the RISC-V CI is running the generic kernels? As the targets I was working on both have their own variants.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants