Bug in implementation of _mm256_bsrli_epi128 #1822

Closed
@karthikbhargavan

The _mm256_bsrli_epi128 intrinsic in core::arch::x86::avx2 shifts each 128-bit lane of a 256-bit vector right by IMM8 bytes, shifting in zeroes.

The implementation of this intrinsic is buggy when the shift argument IMM8 is greater than 15.
In particular, it behaves differently from the Intel documentation and from the corresponding C intrinsic in clang.

The relevant Intel documentation for this intrinsic is here: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_bsrli_epi128&ig_expand=628

When the shift argument imm8 is greater than 15, the resulting vector should contain all zeroes.
Indeed, this is what clang does: https://godbolt.org/z/6o3W96qhP
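To make the expected semantics concrete, here is a minimal scalar sketch of the documented behavior. The function name `bsrli_epi128_model` and the `[u8; 32]` representation (byte 0 is the least significant byte of the low lane) are assumptions made for illustration; this is not the stdarch implementation.

```rust
/// Hypothetical scalar model of the documented `_mm256_bsrli_epi128` behavior.
/// `a` holds the 256-bit vector as 32 little-endian bytes:
/// bytes 0..16 are the low 128-bit lane, bytes 16..32 the high lane.
pub fn bsrli_epi128_model(a: [u8; 32], imm8: u8) -> [u8; 32] {
    // Any shift amount above 15 empties the lane, so clamp it to 16.
    let shift = usize::from(imm8).min(16);
    let mut dst = [0u8; 32];
    for lane in 0..2 {
        let base = lane * 16;
        // A right shift by `shift` bytes moves byte i + shift down to byte i,
        // independently within each lane; the vacated top bytes stay zero.
        for i in 0..(16 - shift) {
            dst[base + i] = a[base + i + shift];
        }
    }
    dst
}
```

Under this model, `bsrli_epi128_model(a, 16)`, `bsrli_epi128_model(a, 17)` and `bsrli_epi128_model(a, 255)` all return `[0u8; 32]` for any input `a`.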

However, the Rust implementation shifts right by `IMM8 % 16` and so produces a different result.
See: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=500d1a08d16b30a6cc8547b607918854

(This example results in particularly weird behavior since shifting right by 1..15 bytes yields all 0s, but shifting right by > 16 yields a non-zero result.)

The bug is in this line of this commit:

let r: i8x32 = match IMM8 % 16 {

Removing the % 16 would make this implementation consistent with Intel's documentation and with clang.
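The effect of that one-token change can be sketched with a simplified scalar model. The names `shift_lanes`, `bsrli_with_mod` and `bsrli_without_mod` are hypothetical, and the real implementation uses one shuffle per match arm rather than a loop; only the shape of the `match` selector is meant to mirror the line quoted above.

```rust
/// Hypothetical helper: shift each 16-byte lane right by `n` bytes (0 <= n <= 15),
/// filling the vacated top bytes with zero.
fn shift_lanes(a: [u8; 32], n: usize) -> [u8; 32] {
    let mut dst = [0u8; 32];
    for lane in 0..2 {
        let base = lane * 16;
        for i in 0..(16 - n) {
            dst[base + i] = a[base + i + n];
        }
    }
    dst
}

/// Models the current selector: `IMM8 % 16` always lands in 0..=15 for
/// non-negative IMM8, so the all-zero fallback arm is never reached.
pub fn bsrli_with_mod<const IMM8: i32>(a: [u8; 32]) -> [u8; 32] {
    match IMM8 % 16 {
        n @ 0..=15 => shift_lanes(a, n as usize),
        _ => [0u8; 32],
    }
}

/// Models the proposed selector: matching on `IMM8` directly lets every
/// value above 15 fall through to the all-zero arm.
pub fn bsrli_without_mod<const IMM8: i32>(a: [u8; 32]) -> [u8; 32] {
    match IMM8 {
        n @ 0..=15 => shift_lanes(a, n as usize),
        _ => [0u8; 32],
    }
}
```

In this model, `bsrli_with_mod::<17>(a)` equals `bsrli_with_mod::<1>(a)` and `bsrli_with_mod::<16>(a)` returns `a` unchanged, whereas `bsrli_without_mod::<16>(a)` and `bsrli_without_mod::<17>(a)` both return all zeroes. The two variants agree for every IMM8 in 0..=15.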

Note: This issue was found by Aniket Mishra, an intern at Cryspen working on verifying parts of the Rust core library, specifically this challenge: model-checking/verify-rust-std#173
