Description
The `_mm256_bsrli_epi128` intrinsic in `core::arch::x86::avx2` can be used to shift right each 128-bit lane in a 256-bit vector by an `IMM8` number of bytes.
The implementation of this intrinsic is buggy when the shift argument `IMM8` is greater than 15.
In particular, it behaves differently from the Intel documentation and from the corresponding C intrinsic in clang.
The relevant Intel documentation for this intrinsic is here: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_bsrli_epi128&ig_expand=628
When the shift argument `imm8` is greater than 15, the resulting vector should contain all zeroes.
Indeed, this is what clang does: https://godbolt.org/z/6o3W96qhP
However, the Rust implementation shifts right by `IMM8 % 16` and so produces a different result.
See: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=500d1a08d16b30a6cc8547b607918854
(This example results in particularly weird behavior since shifting right by 1..15 bytes yields all 0s, but shifting right by > 16 yields a non-zero result.)
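To make the difference concrete, here is a minimal scalar sketch of the two behaviors described above. It does not use the real intrinsic. Both function names are hypothetical helpers, and the second one only models the `% 16` behavior as described in this issue; it is not the actual stdarch code.

```rust
/// Hypothetical reference model of the documented semantics: each 128-bit
/// lane (16 bytes) is shifted right by `imm8` bytes, shifting in zeros.
/// For `imm8 > 15` the whole result is zero.
fn bsrli_epi128_model(a: [u8; 32], imm8: u32) -> [u8; 32] {
    let mut out = [0u8; 32];
    if imm8 > 15 {
        return out; // documented behavior: all zeroes
    }
    let shift = imm8 as usize;
    for lane in 0..2 {
        let base = lane * 16;
        // Byte i of the result comes from byte i + shift of the same lane.
        for i in 0..(16 - shift) {
            out[base + i] = a[base + i + shift];
        }
    }
    out
}

/// Hypothetical model of the buggy behavior described in this issue:
/// the shift amount is reduced modulo 16 before shifting.
fn bsrli_epi128_mod16_model(a: [u8; 32], imm8: u32) -> [u8; 32] {
    bsrli_epi128_model(a, imm8 % 16)
}
```

Under this model, a shift of 17 gives an all-zero vector with the documented semantics, while the `% 16` variant returns the same result as a shift of 1.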
The bug is in this line of this commit:
stdarch/crates/core_arch/src/x86/avx2.rs, line 2782 in 3559569
Removing the `% 16` would make this implementation consistent with Intel's documentation and with clang.
Note: This issue was found by Aniket Mishra, an intern at Cryspen working on verifying parts of the Rust core library, specifically this challenge: model-checking/verify-rust-std#173