From 9f7a542496fe3630fb1122261811da6699bae262 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine
Date: Wed, 26 Feb 2020 23:08:40 -0800
Subject: [PATCH 1/2] Add .bitmask instruction family

i8x16.bitmask and i32x4.bitmask directly map to SSE movemask
instructions; i16x8.bitmask can be synthesized using packs+movemask.

These instructions are important to be able to do lane-wise processing
after a vector comparison - for example, these can be used together with
ctz to find the index of the first lane with the matching values after a
comparison instruction.
---
 proposals/simd/SIMD.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/proposals/simd/SIMD.md b/proposals/simd/SIMD.md
index bfc81d5c7..56940aa22 100644
--- a/proposals/simd/SIMD.md
+++ b/proposals/simd/SIMD.md
@@ -648,6 +648,24 @@ def S.all_true(a):
     return 1
 ```
 
+## Bitmask extraction
+
+* `i8x16.bitmask(a: v128) -> i32`
+* `i16x8.bitmask(a: v128) -> i32`
+* `i32x4.bitmask(a: v128) -> i32`
+
+These operations extract the high bit for each lane in `a` and produce a scalar
+mask with all bits concatenated.
+
+```python
+def S.bitmask(a):
+    result = 0
+    for i in range(S.Lanes):
+        if a[i] < 0:
+            result = result | (1 << i)
+    return result
+```
+
 ## Comparisons
 
 The comparison operations all compare two vectors lane-wise, and produce a mask

From cd5e9a5bdbe616daed9836d77f2207da2f6e5556 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine
Date: Thu, 21 May 2020 09:31:35 -0700
Subject: [PATCH 2/2] Update opcode tables with bitmask

---
 proposals/simd/BinarySIMD.md           | 3 +++
 proposals/simd/ImplementationStatus.md | 3 +++
 proposals/simd/NewOpcodes.md           | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/proposals/simd/BinarySIMD.md b/proposals/simd/BinarySIMD.md
index ab88336bc..6c8adcdb2 100644
--- a/proposals/simd/BinarySIMD.md
+++ b/proposals/simd/BinarySIMD.md
@@ -114,6 +114,7 @@ The `v8x16.shuffle` instruction has 16 bytes after `simdop`.
 | `i8x16.neg` | `0x61`| - |
 | `i8x16.any_true` | `0x62`| - |
 | `i8x16.all_true` | `0x63`| - |
+| `i8x16.bitmask` | `0x64`| - |
 | `i8x16.narrow_i16x8_s` | `0x65`| - |
 | `i8x16.narrow_i16x8_u` | `0x66`| - |
 | `i8x16.shl` | `0x6b`| - |
@@ -134,6 +135,7 @@ The `v8x16.shuffle` instruction has 16 bytes after `simdop`.
 | `i16x8.neg` | `0x81`| - |
 | `i16x8.any_true` | `0x82`| - |
 | `i16x8.all_true` | `0x83`| - |
+| `i16x8.bitmask` | `0x84`| - |
 | `i16x8.narrow_i32x4_s` | `0x85`| - |
 | `i16x8.narrow_i32x4_u` | `0x86`| - |
 | `i16x8.widen_low_i8x16_s` | `0x87`| - |
@@ -159,6 +161,7 @@ The `v8x16.shuffle` instruction has 16 bytes after `simdop`.
 | `i32x4.neg` | `0xa1`| - |
 | `i32x4.any_true` | `0xa2`| - |
 | `i32x4.all_true` | `0xa3`| - |
+| `i32x4.bitmask` | `0xa4`| - |
 | `i32x4.widen_low_i16x8_s` | `0xa7`| - |
 | `i32x4.widen_high_i16x8_s` | `0xa8`| - |
 | `i32x4.widen_low_i16x8_u` | `0xa9`| - |
diff --git a/proposals/simd/ImplementationStatus.md b/proposals/simd/ImplementationStatus.md
index cc944436a..2353ae98f 100644
--- a/proposals/simd/ImplementationStatus.md
+++ b/proposals/simd/ImplementationStatus.md
@@ -87,6 +87,7 @@
 | `i8x16.neg` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
 | `i8x16.any_true` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
 | `i8x16.all_true` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
+| `i8x16.bitmask` | `-munimplemented-simd128` | :heavy_check_mark: | | | |
 | `i8x16.narrow_i16x8_s` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
 | `i8x16.narrow_i16x8_u` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
 | `i8x16.shl` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
@@ -107,6 +108,7 @@
 | `i16x8.neg` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
 | `i16x8.any_true` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
 | `i16x8.all_true` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
+| `i16x8.bitmask` | `-munimplemented-simd128` | :heavy_check_mark: | | | |
 | `i16x8.narrow_i32x4_s` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
 | `i16x8.narrow_i32x4_u` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
 | `i16x8.widen_low_i8x16_s` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
@@ -132,6 +134,7 @@
 | `i32x4.neg` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
 | `i32x4.any_true` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
 | `i32x4.all_true` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
+| `i32x4.bitmask` | `-munimplemented-simd128` | :heavy_check_mark: | | | |
 | `i32x4.widen_low_i16x8_s` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
 | `i32x4.widen_high_i16x8_s` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
 | `i32x4.widen_low_i16x8_u` | `-msimd128` | :heavy_check_mark: | | | :heavy_check_mark: |
diff --git a/proposals/simd/NewOpcodes.md b/proposals/simd/NewOpcodes.md
index b7aaae2c0..2752fa2a5 100644
--- a/proposals/simd/NewOpcodes.md
+++ b/proposals/simd/NewOpcodes.md
@@ -82,7 +82,7 @@
 | i8x16.neg | 0x61 | i16x8.neg | 0x81 | i32x4.neg | 0xa1 | i64x2.neg | 0xc1 |
 | i8x16.any_true | 0x62 | i16x8.any_true | 0x82 | i32x4.any_true | 0xa2 | ---- | 0xc2 |
 | i8x16.all_true | 0x63 | i16x8.all_true | 0x83 | i32x4.all_true | 0xa3 | ---- | 0xc3 |
-| ---- bitmask ---- | 0x64 | ---- bitmask ---- | 0x84 | ---- bitmask ---- | 0xa4 | ---- | 0xc4 |
+| i8x16.bitmask | 0x64 | i16x8.bitmask | 0x84 | i32x4.bitmask | 0xa4 | ---- | 0xc4 |
 | i8x16.narrow_i16x8_s | 0x65 | i16x8.narrow_i32x4_s | 0x85 | ---- narrow ---- | 0xa5 | ---- | 0xc5 |
 | i8x16.narrow_i16x8_u | 0x66 | i16x8.narrow_i32x4_u | 0x86 | ---- narrow ---- | 0xa6 | ---- | 0xc6 |
 | ---- widen ---- | 0x67 | i16x8.widen_low_i8x16_s | 0x87 | i32x4.widen_low_i16x8_s | 0xa7 | ---- | 0xc7 |
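
Review note (not part of either patch): the commit message of patch 1/2 says the bitmask result can be combined with ctz to find the index of the first matching lane after a comparison. The following is a minimal Python sketch of that idea, written in the same pseudo-code spirit as `S.bitmask` in SIMD.md. It is an illustration only. Lanes are modeled as a plain list of signed integers rather than a real `v128`, and the helper names `ctz` and `first_matching_lane` are hypothetical, chosen for this note.

```python
def bitmask(lanes):
    """Collect the high (sign) bit of every lane into a scalar, lane 0 in bit 0."""
    result = 0
    for i, lane in enumerate(lanes):
        if lane < 0:
            result |= 1 << i
    return result


def ctz(x, width=32):
    """Count trailing zero bits of x; returns width when x is 0, like i32.ctz."""
    if x == 0:
        return width
    # x & -x isolates the lowest set bit; its bit_length is its index + 1.
    return (x & -x).bit_length() - 1


def first_matching_lane(lanes, needle):
    """Index of the first lane equal to needle, or None when no lane matches.

    Hypothetical helper: models an eq comparison followed by bitmask and ctz.
    """
    # A lane-wise comparison yields all ones (-1 when read as signed) or 0.
    compared = [-1 if lane == needle else 0 for lane in lanes]
    mask = bitmask(compared)
    return ctz(mask) if mask != 0 else None
```

For example, `first_matching_lane([5, 7, 9, 7], 7)` compares to `[0, -1, 0, -1]`, the bitmask of that vector is `0b1010`, and ctz of that mask selects lane 1.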