Layout optimization for Result<&T, E>-like types #48741

Closed
@glandium

The hot case for Result<T, E> is usually the Ok() case, and in many cases T is some sort of NonNull pointer: a reference, a Box, etc.

The current layout for Result<NonNull<T>, E> is (tag, union { NonNull, E }), which means that either way the code needs to read the tag and then read the union.

If the layout were instead (union { tag, NonNull }, E), the common case would become a single read.

The generalization could be formulated like this: when the tag is a boolean and the first variant is a NonNull/NonZero type, the first variant is stored in place of the tag, and the invalid zero value acts as the tag for the second variant.

So Ok(value) would be (value, undefined), and Err(e) would be (0, e).
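
To make the proposed encoding concrete, here is a minimal hand-written sketch of it on stable Rust. The type `PackedResult` and its methods are hypothetical names used only for illustration, and `std::num::NonZeroUsize` stands in for the nightly `NonZero<usize>` used elsewhere in this issue. The second word is zeroed in the Ok case purely to keep the sketch in safe code; the proposal itself would leave it undefined.

```rust
use std::num::NonZeroUsize;

/// Hypothetical manual encoding of Result<NonZeroUsize, usize> using the
/// proposed layout: the first word doubles as tag and Ok payload.
#[derive(Clone, Copy)]
pub struct PackedResult {
    first: usize,  // non-zero => Ok(first); zero => Err
    second: usize, // Err payload; only meaningful when `first` is zero
}

impl PackedResult {
    pub fn from_result(r: Result<NonZeroUsize, usize>) -> Self {
        match r {
            // Ok(value) is stored as (value, unused).
            Ok(v) => PackedResult { first: v.get(), second: 0 },
            // Err(e) is stored as (0, e).
            Err(e) => PackedResult { first: 0, second: e },
        }
    }

    /// The hot path only needs to look at the first word.
    pub fn ok(self) -> Option<NonZeroUsize> {
        NonZeroUsize::new(self.first)
    }

    pub fn into_result(self) -> Result<NonZeroUsize, usize> {
        match NonZeroUsize::new(self.first) {
            Some(v) => Ok(v),
            None => Err(self.second),
        }
    }
}
```

Reading the Ok value touches only `first`, which is the single-read property described above; `second` is consulted only on the error path.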

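Some code to show the benefits of this optimization. Foo emulates the proposed layout by hand (foo and foo_unwrap), while bar and bar_unwrap operate on a Result with the current layout: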

#![feature(nonzero)]
extern crate core;
use core::nonzero::NonZero;

pub struct Foo(usize, usize);

impl Foo {
    fn as_result(&self) -> Result<NonZero<usize>, usize> {
        if self.0 > 0 {
            Ok(unsafe { NonZero::new_unchecked(self.0) })
        } else {
            Err(self.1)
        }
    }
}

pub fn foo(f: &Foo) -> Option<NonZero<usize>> {
    f.as_result().ok()
}

pub fn foo_unwrap(f: &Foo) -> usize {
    f.as_result().unwrap().get()
}

pub fn bar(f: &Result<NonZero<usize>, usize>) -> Option<NonZero<usize>> {
    f.ok()
}

pub fn bar_unwrap(f: &Result<NonZero<usize>, usize>) -> usize {
    f.unwrap().get()
}

This compiles to the following on godbolt:

example::foo:
  push rbp
  mov rbp, rsp
  mov rax, qword ptr [rdi]
  pop rbp
  ret

example::foo_unwrap:
  mov rax, qword ptr [rdi]
  test rax, rax
  jne .LBB4_2
  mov rax, qword ptr [rdi + 8]
.LBB4_2:
  je .LBB4_3
  ret
.LBB4_3:
  push rbp
  mov rbp, rsp
  mov rdi, rax
  call core::result::unwrap_failed
  ud2

example::bar:
  push rbp
  mov rbp, rsp
  cmp qword ptr [rdi], 1
  je .LBB5_1
  mov rax, qword ptr [rdi + 8]
  pop rbp
  ret
.LBB5_1:
  xor eax, eax
  pop rbp
  ret

example::bar_unwrap:
  mov rax, qword ptr [rdi + 8]
  cmp qword ptr [rdi], 1
  je .LBB6_1
  ret
.LBB6_1:
  push rbp
  mov rbp, rsp
  mov rdi, rax
  call core::result::unwrap_failed
  ud2

This doesn't really remove branches in the example above, but it removes the need for a second memory read in the common case. The data is probably in the same cache line, or in the next prefetched one, but that is still fewer instructions to execute. In some cases, I've seen the compiler use cmov instead of a branch, though.

Note that the compiler does a poor job with foo_unwrap, for some reason; manually inlining as_result() makes it generate better code.
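
For reference, one plausible reading of "manually inlining as_result()" is sketched below. The function name `foo_unwrap_inlined` is hypothetical, `std::num::NonZeroUsize` replaces the nightly `NonZero<usize>`, and nothing here makes a claim about the exact assembly a given compiler version emits.

```rust
use std::num::NonZeroUsize;

pub struct Foo(usize, usize);

/// foo_unwrap with the body of as_result() written out at the call site.
pub fn foo_unwrap_inlined(f: &Foo) -> usize {
    // Same decision as as_result(): a non-zero first field is the Ok value,
    // otherwise the second field is the error.
    let r: Result<NonZeroUsize, usize> = match NonZeroUsize::new(f.0) {
        Some(v) => Ok(v),
        None => Err(f.1),
    };
    r.unwrap().get()
}
```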

This could be applied to slices too: Result<&[T], E> could become (union { tag, slice-ptr }, union { slice-size, E }), which would even make the type smaller than (tag, union { (slice-ptr, slice-size), E }).
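
The size difference for the slice case can be illustrated with a hand-written model. Both struct names below are hypothetical: `PackedSliceResult` models the proposed (union { tag, slice-ptr }, union { slice-size, E }) layout for `Result<&[u8], usize>`, and `NaiveSliceResult` models the tagged layout as three plain words. It relies on the fact that a slice's data pointer is never null, so null can serve as the Err marker. The sizes are those of these models only; this is not a statement about what any particular rustc version picks for the real `Result`.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::slice;

/// Model of the proposed layout: two words in total.
#[repr(C)]
pub struct PackedSliceResult<'a> {
    ptr: *const u8,    // null => Err; otherwise the slice data pointer
    len_or_err: usize, // slice length when Ok, error payload when Err
    _borrow: PhantomData<&'a [u8]>,
}

/// Model of (tag, union { (slice-ptr, slice-size), E }): three words.
#[repr(C)]
pub struct NaiveSliceResult {
    tag: usize,
    ptr_or_err: usize,
    len: usize,
}

impl<'a> PackedSliceResult<'a> {
    pub fn from_result(r: Result<&'a [u8], usize>) -> Self {
        match r {
            Ok(s) => PackedSliceResult {
                ptr: s.as_ptr(), // never null, even for an empty slice
                len_or_err: s.len(),
                _borrow: PhantomData,
            },
            Err(e) => PackedSliceResult {
                ptr: std::ptr::null(),
                len_or_err: e,
                _borrow: PhantomData,
            },
        }
    }

    pub fn into_result(self) -> Result<&'a [u8], usize> {
        if self.ptr.is_null() {
            Err(self.len_or_err)
        } else {
            // SAFETY: a non-null `ptr` and its length always come from a
            // valid `&'a [u8]` passed to `from_result`.
            Ok(unsafe { slice::from_raw_parts(self.ptr, self.len_or_err) })
        }
    }
}

pub fn packed_size() -> usize {
    size_of::<PackedSliceResult<'static>>()
}

pub fn naive_size() -> usize {
    size_of::<NaiveSliceResult>()
}
```

On a 64-bit target the packed model occupies 16 bytes against 24 for the tagged model, which is the saving the paragraph above describes.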

Labels: C-feature-request (Category: A feature request, i.e: not implemented / a PR.)
