
&str and &[u8] have the same layout #1848


Open, wants to merge 2 commits into master

sanbox-irl

@sanbox-irl sanbox-irl commented Jun 7, 2025

Currently, `str` and `[u8]` are promised to have the same layout, but `&str` and `&[u8]` are not. The standard library already assumes that they are (https://doc.rust-lang.org/src/core/str/converts.rs.html#172), so this change would have no impact beyond codifying what is already done in practice. This PR defines `&str` and `&[u8]` to have the same layout, though what that layout is remains unspecified.
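To make the claim concrete, here is a minimal sketch (not part of the PR) of user code that would rely on the guarantee: a hypothetical `bytes_as_str` helper that validates UTF-8 and then transmutes `&[u8]` to `&str`, presumably the same kind of reinterpretation the linked standard library code performs. The size and alignment assertions only reflect current rustc on the build target; they cannot show that the pointer and length fields are in the same order, which is exactly what the new wording would guarantee.

```rust
use std::mem;

/// Hypothetical helper (not part of std): reinterpret a byte slice as a
/// string slice. The transmute relies on `&[u8]` and `&str` sharing a
/// layout, which is the guarantee this PR proposes to write down.
fn bytes_as_str(bytes: &[u8]) -> Option<&str> {
    // Reject invalid UTF-8 first: `str` APIs assume valid UTF-8, so
    // handing out a `&str` over invalid bytes would be unsound.
    std::str::from_utf8(bytes).ok()?;
    // SAFETY: `bytes` was just checked to be valid UTF-8, and the layout
    // of `&[u8]` is assumed to match `&str`.
    Some(unsafe { mem::transmute::<&[u8], &str>(bytes) })
}

fn main() {
    // Observed on current rustc: both are two-word fat pointers.
    // This does not prove the (pointer, length) field order matches.
    assert_eq!(mem::size_of::<&str>(), mem::size_of::<&[u8]>());
    assert_eq!(mem::align_of::<&str>(), mem::align_of::<&[u8]>());

    assert_eq!(bytes_as_str(b"hello"), Some("hello"));
    assert_eq!(bytes_as_str(&[0xff, 0xfe]), None);
}
```

In ordinary code `std::str::from_utf8` already returns the `&str` safely; the transmute is shown only to illustrate what such a conversion depends on.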

There are some further steps here that I didn't take:

  1. Every rule about slices should probably also apply to `str`. I have added `str` in several places in the Reference where it otherwise referred only to slices, but the definition of a slice should likely just include `str`. This is a bigger conversation, and frankly unimportant if...
  2. Some version of Make str into a libcore struct (redux) rust#107939 ever gets stabilized. In that case none of this matters, and `str` would be removed from the Reference. This seems to me to be obviously the better choice.

In any case, this PR represents a fairly incrementalist approach.

Thanks for the insight of those on the Zulip thread here

@rustbot rustbot added the S-waiting-on-review label (Status: The marked PR is awaiting review from a maintainer) Jun 7, 2025
@workingjubilee
Member

@rustbot label: +I-lang-nominated +T-lang

@workingjubilee
Member

@rustbot label: +I-lang-easy-decision

@rustbot
Collaborator

rustbot commented Jun 7, 2025

Unknown labels: I-lang-easy-decision

@chorman0773
Contributor

chorman0773 commented Jun 7, 2025

FTR, the standard library has every right to make assumptions about the implementation of the language beyond what the language guarantees, because it is intrinsically tied to rustc. Not necessarily a point against making a decision here, but I don't think it's a strong point in favour of stabilizing the equivalence either.

@sanbox-irl
Author

> FTR, the standard library has every right to make assumptions about the implementation of the language beyond what the language guarantees, because it is intrinsically tied to rustc. Not necessarily a point against making a decision here, but I don't think it's a strong point in favour of stabilizing the equivalence either.

Agreed. I'm going to reword this to make clear that it isn't a change for rustc, only a codification of existing decisions.

@@ -110,6 +110,7 @@ r[layout.str]
## `str` Layout

String slices are a UTF-8 representation of characters that have the same layout as slices of type `[u8]`.
A reference `&str` has the same layout as a reference `&[u8]`.
Contributor

This statement should probably be generalized to cover all primitive pointer types (`&`, `&mut`, `*const`, `*mut`), not only `&`.
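As a rough illustration of the suggested generalization, the sketch below (a hypothetical `same_size_and_align` helper, not from the PR or the Reference) checks all four primitive pointer kinds on the current target, then uses an `as` cast from `*mut [u8]` to `*mut str`, which the language already accepts for raw pointers because both carry a length as metadata. Like any such check, it only observes size and alignment on one compiler and target; it is not a substitute for a written guarantee.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: true if `A` and `B` have the same size and
/// alignment on this target. It says nothing about field order.
fn same_size_and_align<A, B>() -> bool {
    size_of::<A>() == size_of::<B>() && align_of::<A>() == align_of::<B>()
}

fn main() {
    // All four primitive pointer kinds, `str` vs `[u8]`.
    assert!(same_size_and_align::<&str, &[u8]>());
    assert!(same_size_and_align::<&mut str, &mut [u8]>());
    assert!(same_size_and_align::<*const str, *const [u8]>());
    assert!(same_size_and_align::<*mut str, *mut [u8]>());

    // Raw-pointer casts between the two unsized types are accepted by `as`.
    let mut buf = *b"abc";
    let bytes: *mut [u8] = &mut buf[..];
    let s: *mut str = bytes as *mut str;
    // SAFETY: `buf` holds valid UTF-8 and outlives this reference.
    let s: &mut str = unsafe { &mut *s };
    s.make_ascii_uppercase();
    assert_eq!(&buf, b"ABC");
}
```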

Author

Contributor

Technically unnecessary, yes, but I bet it’ll be confusing.

Author

What do you think about adding a link afterwards, such as:

A reference `&str` has the same layout as a reference `&[u8]`.

> [!NOTE]
> See [pointer layouts](https://doc.rust-lang.org/reference/type-layout.html#r-layout.pointer.intro) for more information on the layout rules of references in general.

That would point people in the right direction without increasing the maintenance burden of the spec or confusing readers through overspecification. For example, if the `str` section repeated something from the pointer section, I would wonder why, and it would make me think there is something special about `&str`'s relationship to `&mut str` and `&mut [u8]`, when that isn't the case at all.

Contributor

I think that the right thing in general is to have a single formal term which the Reference uses everywhere for “all primitive pointer types” and can link to an appropriate definition, but there isn’t such a term now, so that doesn’t help this PR. I take your point about avoiding repetition, though, so maybe this is a problem to solve entirely separately. Feel free to mark this conversation resolved.

(I say “primitive pointer types” to distinguish from “smart pointer” types which are not guaranteed to have the same layout.)
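For smart pointers, a sketch of the conservative alternative to transmuting: convert between `Box<str>` and `Box<[u8]>` through the standard library's own APIs, `str::into_boxed_bytes` and `std::str::from_boxed_utf8_unchecked`. The helper names `to_bytes` and `to_str` are hypothetical, and the sketch assumes only that those two std functions behave as documented, not anything about `Box`'s layout.

```rust
/// Hypothetical helper: convert an owned string into owned bytes
/// (std documents this as not copying or allocating).
fn to_bytes(s: Box<str>) -> Box<[u8]> {
    s.into_boxed_bytes()
}

/// Hypothetical helper: turn owned bytes back into an owned string,
/// returning `None` if they are not valid UTF-8.
fn to_str(bytes: Box<[u8]>) -> Option<Box<str>> {
    // Validate before the unchecked conversion.
    std::str::from_utf8(&bytes).ok()?;
    // SAFETY: `bytes` was just checked to be valid UTF-8.
    Some(unsafe { std::str::from_boxed_utf8_unchecked(bytes) })
}

fn main() {
    let bytes = to_bytes(Box::from("hello"));
    assert_eq!(&bytes[..], b"hello");

    let s = to_str(bytes).unwrap();
    assert_eq!(&*s, "hello");

    // A lone 0xff byte is never valid UTF-8.
    assert!(to_str(Box::from(&[0xffu8][..])).is_none());
}
```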

Labels: I-lang-nominated, S-waiting-on-review (Status: The marked PR is awaiting review from a maintainer), T-lang (Team: Lang)
5 participants