Implement faster, unrolled ShortU16 decoder #47

cpubot · 2025-12-05T17:02:58Z

The solana-short-vec ShortU16 implementation incurs unnecessary overhead from:

- Masking the value before checking the continuation bit (wasted instructions on the theoretically hot 1-byte path)
- A loop with a dynamic i * 7 shift
- Defensive and technically unnecessary saturating_* / checked_* ops to guard against theoretical loop edge cases
- All the serde badness when paired with bincode
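
For readers who have not looked at that code, the shape being described is roughly the following. This is an illustrative sketch only, not the solana-short-vec source: the name decode_short_u16_loop and the Option return type are made up here, and only the general structure (mask first, dynamic shift, defensive ops) is meant to match the list above.

```rust
/// Hypothetical loop-style ShortU16 decoder, written only to illustrate the
/// overheads listed above. Returns (value, bytes_consumed) on success.
pub fn decode_short_u16_loop(bytes: &[u8]) -> Option<(u16, usize)> {
    let mut val: u32 = 0;
    for (i, &byte) in bytes.iter().take(3).enumerate() {
        // The value is masked before the continuation bit is inspected,
        // even on the common single-byte path.
        let elem_val = (byte & 0x7f) as u32;
        let elem_done = byte & 0x80 == 0;

        // A zero byte after the first one is a non-canonical (alias) encoding.
        if byte == 0 && i != 0 {
            return None;
        }
        // The third byte must terminate the encoding.
        if i == 2 && !elem_done {
            return None;
        }

        // Dynamic shift plus defensive arithmetic on every iteration.
        let shift = (i as u32).saturating_mul(7);
        val |= elem_val.checked_shl(shift)?;

        if elem_done {
            // Widened accumulator is narrowed back to u16 with a range check.
            return u16::try_from(val).ok().map(|v| (v, i + 1));
        }
    }
    // Ran out of input before a terminating byte was seen.
    None
}
```

The loop handles all three byte positions with the same body, which is why the shift amount and the guards end up being computed at run time.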

The compiler actually does a decent job optimizing this, but it can be improved.
https://rust.godbolt.org/z/rojK31Wf4

This implementation:

- Fully unrolled with constant shifts (7, 14)
- No unnecessary widening or defensive ops; the unrolled structure makes overflow impossible by construction
- const fn
- No serde, obviously

https://rust.godbolt.org/z/eYzq1G8oT
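
As a rough picture of what "fully unrolled with constant shifts" means, here is a minimal sketch. It is not the code in wincode/src/len.rs: the DecodeError type, the function name decode_short_u16_unrolled, and the (value, bytes_consumed) return shape are assumptions made for this example.

```rust
/// Hypothetical error type for this sketch; not the wincode API.
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    /// Input ended before the encoding terminated.
    UnexpectedEof,
    /// A trailing zero byte was used (non-canonical alias of a shorter encoding).
    Alias,
    /// Third byte has the continuation bit set or does not fit in a u16.
    Overflow,
}

/// Unrolled ShortU16 decode: returns (value, bytes_consumed).
pub const fn decode_short_u16_unrolled(bytes: &[u8]) -> Result<(u16, usize), DecodeError> {
    if bytes.is_empty() {
        return Err(DecodeError::UnexpectedEof);
    }
    let b0 = bytes[0];
    // Hot path: check the continuation bit first; no mask is needed
    // because the high bit is already known to be clear.
    if b0 & 0x80 == 0 {
        return Ok((b0 as u16, 1));
    }

    if bytes.len() < 2 {
        return Err(DecodeError::UnexpectedEof);
    }
    let b1 = bytes[1];
    if b1 == 0 {
        return Err(DecodeError::Alias);
    }
    // Constant shift of 7; at most 14 bits are set, so this fits in a u16.
    let low = ((b0 & 0x7f) as u16) | (((b1 & 0x7f) as u16) << 7);
    if b1 & 0x80 == 0 {
        return Ok((low, 2));
    }

    if bytes.len() < 3 {
        return Err(DecodeError::UnexpectedEof);
    }
    let b2 = bytes[2];
    if b2 == 0 {
        return Err(DecodeError::Alias);
    }
    // Only the two low bits of the third byte are valid; anything larger
    // either sets the continuation bit or exceeds u16::MAX.
    if b2 > 0x03 {
        return Err(DecodeError::Overflow);
    }
    // Constant shift of 14; b2 <= 3 so the result fits in a u16.
    Ok((low | ((b2 as u16) << 14), 3))
}
```

Each byte position gets its own straight-line code, so the shifts are literals and the accumulator never needs to be wider than u16. In this sketch the only range check left is the single comparison on the third byte.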

Results:

ShortU16/solana_short_vec:decode_shortu16_len/127
                        time:   [8.1727 ns 8.1882 ns 8.2045 ns]
                        thrpt:  [116.24 MiB/s 116.47 MiB/s 116.69 MiB/s]
ShortU16/wincode:decode_short_u16/127
                        time:   [1.4131 ns 1.4156 ns 1.4179 ns]
                        thrpt:  [672.57 MiB/s 673.71 MiB/s 674.90 MiB/s]

ShortU16/solana_short_vec:decode_shortu16_len/16383
                        time:   [8.0641 ns 8.0804 ns 8.0978 ns]
                        thrpt:  [235.54 MiB/s 236.05 MiB/s 236.52 MiB/s]
ShortU16/wincode:decode_short_u16/16383
                        time:   [1.5453 ns 1.5522 ns 1.5588 ns]
                        thrpt:  [1.1950 GiB/s 1.2000 GiB/s 1.2054 GiB/s]

ShortU16/solana_short_vec:decode_shortu16_len/65535
                        time:   [8.4347 ns 8.4469 ns 8.4589 ns]
                        thrpt:  [338.23 MiB/s 338.71 MiB/s 339.20 MiB/s]
ShortU16/wincode:decode_short_u16/65535
                        time:   [1.7430 ns 1.7457 ns 1.7484 ns]
                        thrpt:  [1.5980 GiB/s 1.6005 GiB/s 1.6030 GiB/s]

Results are more dramatic without black_box, since the compiler can then constant-fold the input, e.g.:

ShortU16/solana_short_vec:decode_shortu16_len/65535
                        time:   [8.1254 ns 8.1398 ns 8.1548 ns]
                        thrpt:  [350.84 MiB/s 351.48 MiB/s 352.11 MiB/s]
                 change:
                        time:   [-3.9026% -3.6384% -3.3188%] (p = 0.00 < 0.05)
                        thrpt:  [+3.4328% +3.7757% +4.0611%]
                        Performance has improved.
ShortU16/wincode:decode_short_u16/65535
                        time:   [599.87 ps 600.92 ps 602.20 ps]
                        thrpt:  [4.6396 GiB/s 4.6495 GiB/s 4.6576 GiB/s]
                 change:
                        time:   [-65.392% -65.287% -65.146%] (p = 0.00 < 0.05)
                        thrpt:  [+186.91% +188.07% +188.95%]
                        Performance has improved.
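
The role of black_box in these numbers can be sketched with the standard library alone. This is not the benchmark harness used for the results above (those are Criterion output); the helper name time_decode and its signature are hypothetical and exist only to show where black_box sits.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical timing helper: runs `decode` on `input` `iters` times and
/// returns the total elapsed wall-clock time.
pub fn time_decode<T>(input: &[u8], iters: u32, decode: impl Fn(&[u8]) -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box on the input hides the constant bytes from the optimizer,
        // so the decode cannot be folded to a constant at compile time.
        // black_box on the output keeps the call from being removed as dead code.
        black_box(decode(black_box(input)));
    }
    start.elapsed()
}
```

Dropping the inner black_box lets the optimizer see a fixed byte slice, which is the situation the second set of numbers reflects.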

t-nelson · 2025-12-05T20:13:05Z

we need to be very careful with review here. this type has been responsible for at least two bounty payouts

wincode/src/len.rs

Implement faster, unrolled ShortU16 decoder

1a1cce0

cpubot force-pushed the short-u16-decode branch from ea14390 to 1a1cce0 on December 5, 2025 17:05

cpubot requested a review from kskalski December 5, 2025 17:07

cpubot mentioned this pull request Dec 5, 2025

impl SchemaRead/SchemaWrite for ShortU16 #46

Merged

cpubot requested a review from t-nelson December 5, 2025 17:34

kskalski reviewed Dec 11, 2025

wincode/src/len.rs (outdated, resolved)

Remove redundant check

0836669
