Bug in implementation of `_mm256_bsrli_epi128`

The `_mm256_bsrli_epi128` intrinsic in `core::arch::x86::avx2` shifts each 128-bit lane of a 256-bit vector right by `IMM8` bytes, shifting in zeroes.

The implementation of this intrinsic is buggy when the shift argument `IMM8` is greater than 15.
In particular, it behaves differently from the Intel documentation and from the corresponding C intrinsic in clang.

The relevant Intel documentation for this intrinsic is here: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_bsrli_epi128&ig_expand=628

When the shift argument `imm8` is greater than 15, the resulting vector should contain all zeroes.
Indeed, this is what clang does: https://godbolt.org/z/6o3W96qhP
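
For reference, the documented behavior can be written as a small scalar model. This is only a sketch of my reading of the Intel pseudocode (clamp the shift to 16, then shift each lane independently). The function name `bsrli_epi128_reference` and the byte-array representation are made up for illustration and are not part of `core::arch`.

```rust
/// Hypothetical scalar model of the documented semantics (not a stdarch API).
/// `a` holds the 32 bytes of the vector, with byte 0 the least significant.
fn bsrli_epi128_reference(a: [u8; 32], imm8: u8) -> [u8; 32] {
    // Any shift count above 15 is clamped to 16, which clears the whole lane.
    let shift = usize::from(imm8).min(16);
    let mut out = [0u8; 32];
    // The two 128-bit lanes are shifted independently of each other.
    for lane in 0..2 {
        let base = lane * 16;
        // A right shift by `shift` bytes moves byte `i + shift` down to byte `i`;
        // the top `shift` bytes of the lane stay zero.
        for i in 0..(16 - shift) {
            out[base + i] = a[base + i + shift];
        }
    }
    out
}
```

In this model, any `imm8` of 16 or more returns an all-zero array, and `imm8 == 0` returns the input unchanged.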

However, the Rust implementation shifts right by `IMM8 % 16` and so produces a different result.
See: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=500d1a08d16b30a6cc8547b607918854

(This example produces particularly odd behavior: shifting right by 1..15 bytes yields all zeroes, but shifting right by more than 16 bytes yields a non-zero result.)
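
To make the divergence concrete without needing AVX2 hardware, here is a scalar sketch that compares the documented clamping with a plain `% 16` reduction. It models only the reduction described above, not the actual shuffle-based code in stdarch. All three function names are hypothetical.

```rust
/// Shifts each 16-byte lane right by `shift` bytes (0..=16), filling with zeroes.
/// Hypothetical helper for illustration only.
fn shift_lanes_right(a: [u8; 32], shift: usize) -> [u8; 32] {
    assert!(shift <= 16);
    let mut out = [0u8; 32];
    for lane in 0..2 {
        let base = lane * 16;
        // Copy the surviving upper bytes of the lane down to its low end.
        out[base..base + 16 - shift].copy_from_slice(&a[base + shift..base + 16]);
    }
    out
}

/// Documented behavior: counts above 15 clear the lane.
fn documented(a: [u8; 32], imm8: u8) -> [u8; 32] {
    shift_lanes_right(a, usize::from(imm8).min(16))
}

/// Assumed model of the reported bug: the count wraps around modulo 16.
fn modulo_16(a: [u8; 32], imm8: u8) -> [u8; 32] {
    shift_lanes_right(a, usize::from(imm8 % 16))
}
```

Under these assumptions the two models agree for every `imm8` in 0..=15 and diverge from 16 onward: `documented(a, 17)` is all zeroes, while `modulo_16(a, 17)` equals `documented(a, 1)`.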

The bug is in this line of this commit: https://github.com/rust-lang/stdarch/blob/35595690d7881b20bd419875c2b64f284534d237/crates/core_arch/src/x86/avx2.rs#L2782

Removing the `% 16` would make this implementation consistent with Intel's documentation and with clang.

Note: This issue was found by Aniket Mishra, an intern at Cryspen working on verifying parts of the Rust core library, specifically this challenge: https://github.com/model-checking/verify-rust-std/issues/173

Bug in implementation of `_mm256_bsrli_epi128` #1822

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Bug in implementation of _mm256_bsrli_epi128 #1822

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Bug in implementation of `_mm256_bsrli_epi128` #1822