Description
The documentation is clear that `std::slice::from_raw_parts(std::ptr::null(), 0)` is UB.
This creates a significant hazard for FFI code interfacing with C++ (and very likely other languages). Although C++ had no built-in or standard-library slice type before C++20's `std::span`, the most common way of representing slices is as a start pointer and a length, or as start and end pointers. As explained in this post, `start` is conventionally allowed to be null (i.e. the representation of an empty slice can be `(nullptr, 0)` or `(nullptr, nullptr)`), and this is consistent with the behaviour of C++ standard library APIs such as `std::span`.
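To make the hazard concrete, here is a minimal sketch (the function name and signature are illustrative, not taken from any real codebase) of a Rust function exported to C++ that naively forwards a pointer/length pair:

```rust
use core::slice;

// Illustrative only: a Rust function exposed to C++ over the C ABI.
// If the C++ caller passes its conventional empty slice (nullptr, 0),
// this is UB under the current documentation, even though (nullptr, 0)
// can only mean an empty slice.
#[no_mangle]
pub unsafe extern "C" fn sum_bytes(data: *const u8, len: usize) -> u64 {
    let bytes = slice::from_raw_parts(data, len);
    bytes.iter().map(|&b| u64::from(b)).sum()
}
```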
The stated rationale for this call being UB is:

> `data` must be non-null and aligned even for zero-length slices. One reason for this is that enum layout optimizations may rely on references (including slices of any length) being aligned and non-null to distinguish them from other data. You can obtain a pointer that is usable as `data` for zero-length slices using `ptr::NonNull::dangling()`.
This rationale is unconvincing to me. There is no need for this detail of the Rust ABI's internal representation of slices to be exposed as a hazard here. The full signature is:

```rust
pub const unsafe fn from_raw_parts<'a, T>(data: *const T, len: usize) -> &'a [T]
```
Null slices are a niche-value optimization; they are not part of the domain of `&'a [T]`. When Rust programmers write a call to this function, they are most likely writing FFI code. We don't know where `data` and `len` come from; they could be using either the Rust or the C++ convention for an empty slice. But there is no ambiguity: neither of these representations can mean anything other than an empty slice. So I argue that the implementation of `slice::from_raw_parts` should accept either, converting a null pointer into an aligned "dangling" pointer as necessary.
The fact that this involves a representation change because of differences between the C++ and Rust slice conventions shouldn't matter: returning a valid Rust representation of an empty slice is the only thing that could be correct. Note that it is fine for `std::slice::from_raw_parts(ptr::null(), len)` to remain UB when `len > 0`.
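As a minimal sketch of the proposed semantics (the name `from_raw_parts_proposed` is mine, for illustration; this is not the standard-library implementation):

```rust
use core::{ptr::NonNull, slice};

// Sketch: normalize a null `data` to an aligned dangling pointer, so that
// both the Rust and the C++ empty-slice conventions are accepted.
// A null pointer with len > 0 remains UB, as in the current definition.
pub unsafe fn from_raw_parts_proposed<'a, T>(data: *const T, len: usize) -> &'a [T] {
    let data = if data.is_null() {
        debug_assert_eq!(len, 0);
        NonNull::<T>::dangling().as_ptr() as *const T
    } else {
        data
    };
    slice::from_raw_parts(data, len)
}
```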
To correctly convert a `data` and `len` that could be using the C++ convention, with the current definition of `slice::from_raw_parts` you would need to write something like:

```rust
if data.is_null() {
    // NonNull::dangling() returns a NonNull<T>; as_ptr() yields the raw pointer.
    slice::from_raw_parts(ptr::NonNull::<T>::dangling().as_ptr(), 0)
} else {
    slice::from_raw_parts(data, len)
}
```

which is verbose, and the need for it is easy to miss.
What about the cost of the null pointer check? Well, in many cases the compiler will be able to statically determine that `data` is non-null. In those cases, when it inlines `slice::from_raw_parts` it will optimize out the null check, and the cost will be zero. This includes both invocations in the code above. It even includes complicated cases such as this one in `bitvec`:
```rust
/// Views the bit-vector as a slice of its underlying memory elements.
#[inline]
pub fn as_raw_slice(&self) -> &[T] {
    let (data, len) = (self.bitspan.address().to_const(), self.bitspan.elements());
    unsafe { slice::from_raw_parts(data, len) }
}
```
Here `self.bitspan.address().to_const()` returns the result of `as_ptr()` on a `NonNull` value (via this code in `wyz`, which is `#[inline(always)]`), which allows the null check to be optimized out. Many of the other examples I looked at are like this.
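As a sketch of why the check would vanish, consider a caller that starts from a `NonNull<T>`, as `bitvec` effectively does here. If `slice::from_raw_parts` contained the null check proposed above, inlining it into this (illustrative) wrapper would make the `is_null` branch statically false, and the optimizer would delete it:

```rust
use core::{ptr::NonNull, slice};

// Illustrative wrapper: `data` is statically known to be non-null, so an
// inlined null check in `from_raw_parts` would compile away here.
unsafe fn view<'a, T>(data: NonNull<T>, len: usize) -> &'a [T] {
    slice::from_raw_parts(data.as_ptr(), len)
}
```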
The only case where correct code can incur an additional cost for the null check is when the programmer knows (and is correct) that `data` cannot be null, but the compiler does not. And we only care about that when it causes a performance regression. I would suggest that this is probably very rare, and that it is much more common for programmers to take a `(start, len)` or `(start, end)` slice according to the C++ convention and just not think of the corner case where `start` is null.
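For the `(start, end)` convention the situation is the same: under the current rules, correct conversion code has to look something like the following sketch (the helper name is hypothetical), and the `is_null` branch is exactly the part that is easy to forget:

```rust
use core::{ptr::NonNull, slice};

// Hypothetical helper: build a slice from a C++-style (start, end) pair,
// where (nullptr, nullptr) is a legal empty slice on the C++ side.
unsafe fn from_start_end<'a, T>(start: *const T, end: *const T) -> &'a [T] {
    if start.is_null() {
        slice::from_raw_parts(NonNull::<T>::dangling().as_ptr(), 0)
    } else {
        // Caller guarantees start..end is a valid range of initialized T.
        slice::from_raw_parts(start, end.offset_from(start) as usize)
    }
}
```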
What about the possibility of code written for the new specification being run on an older Rust version that requires `data` to be non-null? That's okay as long as we document when the requirement was changed. Then:
- If a programmer checked the previous documentation and wrote previously correct code, their code will still be correct.
- If they check the new documentation, they will see that a null pointer wasn't previously allowed. Either their MSRV is already after the version where it changed; or they bump their MSRV; or they check for a null pointer.
- If they wrote previously incorrect code, the code may work correctly under a later version of Rust. This is fine and what we intend. Eventually they will update the MSRV and the code will be correct.
Aside: `String::from_raw_parts` and `Vec::from_raw_parts_in` are similar, but the case for changing those is much weaker, since they impose strong constraints on the allocated region that would be unlikely to be met by a random slice coming from C++ or another language.