Description
The documentation is clear that `std::slice::from_raw_parts(std::ptr::null(), 0)` is UB.
This creates a significant hazard for FFI code interfacing with C++ (and very likely other languages). Although C++ had no built-in or standard-library slice type before C++20's `std::span`, the most common way of representing slices is as a start pointer and a length, or as start and end pointers. As explained in this post, `start` is conventionally allowed to be null (i.e. the representation of an empty slice can be `(nullptr, 0)` or `(nullptr, nullptr)`), and this is consistent with the behaviour of C++ standard library APIs such as `std::span`.
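To make the hazard concrete, here is a minimal sketch (the function name and signature are illustrative, not taken from any real codebase) of a Rust function exported to C++ that naively forwards a pointer/length pair:

```rust
use core::slice;

// Illustrative only: a Rust function exposed to C++ over the C ABI.
// If the C++ caller passes its conventional empty slice (nullptr, 0),
// this is UB under the current documentation, even though (nullptr, 0)
// can only mean an empty slice.
#[no_mangle]
pub unsafe extern "C" fn sum_bytes(data: *const u8, len: usize) -> u64 {
    let bytes = slice::from_raw_parts(data, len);
    bytes.iter().map(|&b| u64::from(b)).sum()
}
```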
The stated rationale for this call being UB is:

> `data` must be non-null and aligned even for zero-length slices. One reason for this is that enum layout optimizations may rely on references (including slices of any length) being aligned and non-null to distinguish them from other data. You can obtain a pointer that is usable as `data` for zero-length slices using `ptr::NonNull::dangling()`.
This rationale is unconvincing to me. There is no need for this detail of the Rust ABI's internal representation of slices to be exposed as a hazard here. The full signature is:

```rust
pub const unsafe fn from_raw_parts<'a, T>(data: *const T, len: usize) -> &'a [T]
```
Null slices are a niche-value optimization; they are not part of the domain of `&'a [T]`. When Rust programmers write a call to this function, they are most likely writing FFI code. We don't know where `data` and `len` come from; they could be using either the Rust or the C++ convention for an empty slice. But there is no ambiguity: neither of these representations can mean anything other than an empty slice. So I argue that the implementation of `slice::from_raw_parts` should accept either, converting a null pointer into an aligned "dangling" pointer as necessary.
The fact that this involves a representation change because of differences between the C++ and Rust slice conventions shouldn't matter: returning a valid Rust representation of an empty slice is the only thing that could be correct. Note that it is fine for `std::slice::from_raw_parts(ptr::null(), len)` to remain UB when `len > 0`.
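As a minimal sketch of the proposed semantics (the name `from_raw_parts_proposed` is mine, for illustration; this is not the standard-library implementation):

```rust
use core::{ptr::NonNull, slice};

// Sketch: normalize a null `data` to an aligned dangling pointer, so that
// both the Rust and the C++ empty-slice conventions are accepted.
// A null pointer with len > 0 remains UB, as in the current definition.
pub unsafe fn from_raw_parts_proposed<'a, T>(data: *const T, len: usize) -> &'a [T] {
    let data = if data.is_null() {
        debug_assert_eq!(len, 0);
        NonNull::<T>::dangling().as_ptr() as *const T
    } else {
        data
    };
    slice::from_raw_parts(data, len)
}
```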
To correctly convert a `data` and `len` that could be using the C++ convention, with the current definition of `slice::from_raw_parts` you would need to write something like:

```rust
if data.is_null() {
    // NonNull::dangling() returns a NonNull<T>; as_ptr() yields the raw pointer.
    slice::from_raw_parts(ptr::NonNull::<T>::dangling().as_ptr(), 0)
} else {
    slice::from_raw_parts(data, len)
}
```

which is verbose, and the need for it is easy to miss.
What about the cost of the null pointer check? Well, in many cases the compiler will be able to statically determine that `data` is non-null. In those cases, when it inlines `slice::from_raw_parts` it will optimize out the null check, and the cost will be zero. This includes both invocations in the code above. It even includes complicated cases such as this one in `bitvec`:
```rust
/// Views the bit-vector as a slice of its underlying memory elements.
#[inline]
pub fn as_raw_slice(&self) -> &[T] {
    let (data, len) = (self.bitspan.address().to_const(), self.bitspan.elements());
    unsafe { slice::from_raw_parts(data, len) }
}
```
Here `self.bitspan.address().to_const()` returns the result of `as_ptr()` on a `NonNull` value (via this code in `wyz`, which is `#[inline(always)]`), which allows the null check to be optimized out. Many of the other examples I looked at are like this.
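As a sketch of why the check would vanish, consider a caller that starts from a `NonNull<T>`, as `bitvec` effectively does here. If `slice::from_raw_parts` contained the null check proposed above, inlining it into this (illustrative) wrapper would make the `is_null` branch statically false, and the optimizer would delete it:

```rust
use core::{ptr::NonNull, slice};

// Illustrative wrapper: `data` is statically known to be non-null, so an
// inlined null check in `from_raw_parts` would compile away here.
unsafe fn view<'a, T>(data: NonNull<T>, len: usize) -> &'a [T] {
    slice::from_raw_parts(data.as_ptr(), len)
}
```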
The only case where correct code can incur an additional cost for the null check is when the programmer knows (and is correct) that `data` cannot be null, but the compiler does not. And we only care about that when it causes a performance regression. I would suggest that this is probably very rare, and that it is much more common for programmers to take a `(start, len)` or `(start, end)` slice according to the C++ convention and just not think of the corner case where `start` is null.
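For the `(start, end)` convention the situation is the same: under the current rules, correct conversion code has to look something like the following sketch (the helper name is hypothetical), and the `is_null` branch is exactly the part that is easy to forget:

```rust
use core::{ptr::NonNull, slice};

// Hypothetical helper: build a slice from a C++-style (start, end) pair,
// where (nullptr, nullptr) is a legal empty slice on the C++ side.
unsafe fn from_start_end<'a, T>(start: *const T, end: *const T) -> &'a [T] {
    if start.is_null() {
        slice::from_raw_parts(NonNull::<T>::dangling().as_ptr(), 0)
    } else {
        // Caller guarantees start..end is a valid range of initialized T.
        slice::from_raw_parts(start, end.offset_from(start) as usize)
    }
}
```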
What about the possibility of code written for the new specification being run on an older Rust version that requires `data` to be non-null? That's okay as long as we document when the requirement was changed. Then:
- If a programmer checked the previous documentation and wrote previously correct code, their code will still be correct.
- If they check the new documentation, they will see that a null pointer wasn't previously allowed. Either their MSRV is already after the version where it changed; or they bump their MSRV; or they check for a null pointer.
- If they wrote previously incorrect code, the code may work correctly under a later version of Rust. This is fine and what we intend. Eventually they will update the MSRV and the code will be correct.
Aside: `String::from_raw_parts` and `Vec::from_raw_parts_in` are similar, but the case for changing those is much weaker, since they impose strong constraints on the allocated region that would be unlikely to be met by a random slice coming from C++ or another language.