Layout optimization for Result<&T, E>-like types

One realization is that the hot case for `Result<T, E>` is usually the `Ok()` case. And in many cases, `T` is actually some sort of NonNull pointer: a ref, a Box, etc.

The current layout for `Result<NonNull<T>, E>` is (tag, union { NonNull<T>, E }). Which means either way, the code needs to read the tag, and then read the union.

If instead, the layout was (union { tag, NonNull<T> }, E), then the common case becomes one read.

The generalization could be formulated like this: When the tag is a boolean, and the first variant is a NonNull/NonZero type, the first variant is stored in place of the tag, and the invalid zero value acts as tag for the second variant.

So `Ok(value)` would be (value, undefined), and `Err(e)` would be (0, e).

Some code to show the benefits of this optimization:
```rust
#![feature(nonzero)]
extern crate core;
use core::nonzero::NonZero;

pub struct Foo(usize, usize);

impl Foo {
    fn as_result(&self) -> Result<NonZero<usize>, usize> {
        if self.0 > 0 {
            Ok(unsafe { NonZero::new_unchecked(self.0) })
        } else {
            Err(self.1)
        }
    }
}

pub fn foo(f: &Foo) -> Option<NonZero<usize>> {
    f.as_result().ok()
}

pub fn foo_unwrap(f: &Foo) -> usize {
    f.as_result().unwrap().get()
}

pub fn bar(f: &Result<NonZero<usize>, usize>) -> Option<NonZero<usize>> {
    f.ok()
}

pub fn bar_unwrap(f: &Result<NonZero<usize>, usize>) -> usize {
    f.unwrap().get()
}
```

Compiled as the following with godbolt:
```asm
example::foo:
  push rbp
  mov rbp, rsp
  mov rax, qword ptr [rdi]
  pop rbp
  ret

example::foo_unwrap:
  mov rax, qword ptr [rdi]
  test rax, rax
  jne .LBB4_2
  mov rax, qword ptr [rdi + 8]
.LBB4_2:
  je .LBB4_3
  ret
.LBB4_3:
  push rbp
  mov rbp, rsp
  mov rdi, rax
  call core::result::unwrap_failed
  ud2

example::bar:
  push rbp
  mov rbp, rsp
  cmp qword ptr [rdi], 1
  je .LBB5_1
  mov rax, qword ptr [rdi + 8]
  pop rbp
  ret
.LBB5_1:
  xor eax, eax
  pop rbp
  ret

example::bar_unwrap:
  mov rax, qword ptr [rdi + 8]
  cmp qword ptr [rdi], 1
  je .LBB6_1
  ret
.LBB6_1:
  push rbp
  mov rbp, rsp
  mov rdi, rax
  call core::result::unwrap_failed
  ud2
```

This doesn't really remove branches in the example above, but removes the need to read memory in the common case (although, the data is probably in the same cache-line, or in the next pre-fetched one, but that's still less instructions to execute). In some cases, I've seen the compiler use cmov instead of a branch, though.

Note the compiler does a poor job with `foo_unwrap`, for some reason...  manually inlining as_result() makes it generate better code.

This could be applied to slices too, where Result<&[T], E> could become (union { tag, slice-ptr }, union { slice-size, E}), in which case this would even make the type smaller than (tag, union { (slice-ptr, slice-size), E }).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Layout optimization for Result<&T, E>-like types #48741

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Layout optimization for Result<&T, E>-like types #48741

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions