From 95c23d0c3bbe6c5c9167e942e580f776d186f07e Mon Sep 17 00:00:00 2001 From: Ashley Mannix Date: Sat, 1 Aug 2020 14:06:05 +1000 Subject: [PATCH 1/9] sketch out an initial leak API --- text/0000-container-leak.md | 250 ++++++++++++++++++++++++++++++++++++ 1 file changed, 250 insertions(+) create mode 100644 text/0000-container-leak.md diff --git a/text/0000-container-leak.md b/text/0000-container-leak.md new file mode 100644 index 00000000000..0611f767f05 --- /dev/null +++ b/text/0000-container-leak.md @@ -0,0 +1,250 @@ +- Feature Name: `container-leak` +- Start Date: 2020-08-01 +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +Describe a standard set of methods for converting container types like `Box`, `Arc`, `Vec`, `String` between their raw representations. + +For containers with a single value like `Box`, `Arc`, and `Rc`, the following methods should be added to work with their raw representations: + +- `leak`: leak the container and return an arbitrarily long-lived shared or mutable reference to its allocated content. +- `leak_raw`: leak the container and return a `NonNull` pointer to its content. +- `unleak_raw`: take a previously leaked `NonNull` pointer and restore the container from it. +- `into_raw`: leak the container and return a raw pointer to its content. +- `from_raw`: take a previously leaked raw pointer and restore the container from it. + +For growable containers like `Vec` and `String`, the following methods should be added to work with their raw representations: + +- `leak`: shrink the container to its allocated length, leak it and return an arbitrarily long-lived shared or mutable reference to its allocated content. 
+- `leak_raw_parts`: leak the container and return a `NonNull` pointer to its content along with any other state, like the allocated capacity, that would be needed to restore the container. +- `unleak_raw_parts`: take a previously leaked `NonNull` pointer and additional state and restore the container from it. +- `into_raw_parts`: leak the container and return a raw pointer to its content along with any other state that would be needed to restore the container. +- `from_raw_parts`: take a previously leaked raw pointer and additional state and restore the container from it. + +The `leak_raw` and `unleak_raw` methods are "modern" semantic alternatives to the existing `from_raw`/`into_raw` pair of methods on containers that use `NonNull` instead of `*const` or `*mut`. +Users are encouraged to use the `leak_raw`/`unleak_raw` pair over the `from_raw`/`into_raw` pair except for FFI. +With these new methods, the following code: + +```rust +let b: Box = Box::new(t); + +let ptr: *mut T = Box::into_raw(b); + +.. + +let b: Box = unsafe { Box::from_raw(ptr) }; +``` + +can be replaced with: + +```rust +let b: Box = Box::new(t); + +let ptr: NonNull = Box::leak_raw(b); + +.. + +let b: Box = unsafe { Box::unleak_raw(ptr) }; +``` + +# Motivation +[motivation]: #motivation + +Why are we doing this? What use cases does it support? What is the expected outcome? + +The `NonNull` type is a non-nullable pointer type that's variant over `T`. `NonNull` has stronger invariants than `*mut T`, but weaker than the internal `Unique`. Since `Unique` isn't planned to be stabilized, `NonNull` is the most appropriate pointer type for containers like `Box` and `Vec` to use as pointers to their inner value. + +Unfortunately, `NonNull` was stabilized after methods like `Box::into_raw` and `Vec::from_raw_parts`, which are left working with `*mut T`. Now with proposed API addition of `Vec::into_raw_parts` we're left with the conundrum. 
The options appear to be: + +- break symmetry with `Vec::from_raw_parts` and diverge from `Box::into_raw` by producing a more semantically correct `NonNull`. +- not use a newer and more appropriate type for the purpose it exists for. + +This RFC aims to solve this by specifying any `from_raw`/`into_raw`-like APIs to stay consistent with the precedent set by `Box` and `Vec` of working with raw pointers, and introduce a similar new API for `NonNull` that is also more semantically typed with respect to `T`. Instead of `Vec::leak_raw` returning a `(*mut T, usize)` pair for its allocated storage, it returns a `NonNull<[T]>` instead. + +Keeping the new `leak_raw`/`unleak_raw` API similar to the existing `into_raw`/`from_raw` API is to make them discoverable and avoid new cognitive load for those that are already familiar with `into_raw`/`from_raw`. The semantic names make it clear to a reader what happens to the contents of the container through the conversion into a pointer. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +The `leak_raw` and `unleak_raw` methods can be used to take manual control of the contents of a container like `Box` and restore it later. +It's a fundamental pattern used by specialty datastructures like linked lists to manage non-trivial access and ownership models. +Take the example of `LinkedList`. Internally, it stores `NonNull` pointers to its nodes: + +```rust +pub struct LinkedList { + head: Option>>, + tail: Option>>, + len: usize, + marker: PhantomData>>, +} +``` + +The nodes are allocated using `Box`, where they're then leaked into the linked list, then later unleaked back out: + +```rust +impl LinkedList { + fn push_front_node(&mut self, mut node: Box>) { + unsafe { + node.next = self.head; + node.prev = None; + + // Leak the contents of `node` and return a `NonNull>`. + // It's now the responsibility of `LinkedList` to manage. 
+ let node = Some(Box::leak_raw(node)); + + match self.head { + None => self.tail = node, + Some(head) => (*head.as_ptr()).prev = node, + } + + self.head = node; + self.len += 1; + } + } + + fn pop_front_node(&mut self) -> Option>> { + self.head.map(|node| unsafe { + + // Unleak the contents of `node` and return a `Box>`. + // It's now the responsibility of `Box` to manage. + // `unleak_raw` takes the `NonNull` pointer directly. + let node = Box::unleak_raw(node); + self.head = node.next; + + match self.head { + None => self.tail = None, + Some(head) => (*head.as_ptr()).prev = None, + } + + self.len -= 1; + node + }) + } +} +``` + +The `leak_raw` and `unleak_raw` methods are recommended over `into_raw` and `from_raw` except in special cases like FFI where raw pointers might be explicitly wanted. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +This RFC proposes the following API for single-value containers: + +```rust +impl Box { + pub fn leak<'a>(this: Box) -> &'a mut T where T: 'a; + + pub fn leak_raw(this: Box) -> NonNull; + pub unsafe fn unleak_raw(ptr: NonNull) -> Box; + + pub fn into_raw(this: Box) -> *mut T; + pub unsafe fn from_raw(ptr: *mut T) -> Box; +} + +impl Rc { + pub fn leak<'a>(this: Rc) -> &'a T where T: 'a; + + pub fn leak_raw(this: Rc) -> NonNull; + pub unsafe fn unleak_raw(ptr: NonNull) -> Rc; + + pub fn into_raw(this: Rc) -> *const T; + pub unsafe fn from_raw(ptr: *const T) -> Rc; +} + +impl Arc { + pub fn leak<'a>(this: Arc) -> &'a T where T: 'a; + + pub fn leak_raw(this: Arc) -> NonNull; + pub unsafe fn unleak_raw(ptr: NonNull) -> Arc; + + pub fn into_raw(this: Arc) -> *const T; + pub unsafe fn from_raw(ptr: *const T) -> Arc; +} +``` + +and the following API for growable containers: + +```rust +impl Vec { + pub fn leak<'a>(self) -> &'a mut [T] where T: 'a; + + pub fn leak_raw_parts(self) -> (NonNull<[T]>, usize); + pub fn unleak_raw_parts(ptr: NonNull<[T]>, capacity: usize) -> Vec; + + pub fn into_raw_parts(self) -> (*mut T, usize, usize); + pub
fn from_raw_parts(ptr: *mut T, length: usize, capacity: usize) -> Vec; +} + +impl String { + pub fn leak<'a>(self) -> &'a mut str; + + pub fn leak_raw_parts(self) -> (NonNull, usize); + pub fn unleak_raw_parts(ptr: NonNull, capacity: usize) -> String; + + pub fn into_raw_parts(self) -> (*mut u8, usize, usize); + pub fn from_raw_parts(ptr: *mut u8, length: usize, capacity: usize) -> String; +} +``` + +These conversion methods follow the existing semantics of static functions for containers that auto-deref to their inner value like `Box`, and inherent methods for other containers like `Vec`. + +The `NonNull<[T]>` and `NonNull` methods are expected to eventually offer a way to get their length without needing to go through a reference first, but the exact mechanism is left as out-of-scope for this RFC. + +# Drawbacks +[drawbacks]: #drawbacks + +Why should we *not* do this? + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +- Why is this design the best in the space of possible designs? +- What other designs have been considered and what is the rationale for not choosing them? +- What is the impact of not doing this? + +# Prior art +[prior-art]: #prior-art + +Discuss prior art, both the good and the bad, in relation to this proposal. +A few examples of what this can include are: + +- For language, library, cargo, tools, and compiler proposals: Does this feature exist in other programming languages and what experience have their community had? +- For community proposals: Is this done by some other community and what were their experiences with it? +- For other teams: What lessons can we learn from what other communities have done here? +- Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background. 
+ +This section is intended to encourage you as an author to think about the lessons from other languages, provide readers of your RFC with a fuller picture. +If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other languages. + +Note that while precedent set by other languages is some motivation, it does not on its own motivate an RFC. +Please also take into consideration that rust sometimes intentionally diverges from common language features. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +- What parts of the design do you expect to resolve through the RFC process before this gets merged? +- What parts of the design do you expect to resolve through the implementation of this feature before stabilization? +- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC? + +# Future possibilities +[future-possibilities]: #future-possibilities + +Think about what the natural extension and evolution of your proposal would +be and how it would affect the language and project as a whole in a holistic +way. Try to use this section as a tool to more fully consider all possible +interactions with the project and language in your proposal. +Also consider how the this all fits into the roadmap for the project +and of the relevant sub-team. + +This is also a good place to "dump ideas", if they are out of scope for the +RFC you are writing but otherwise related. + +If you have tried and cannot think of any future possibilities, +you may simply state that you cannot think of anything. + +Note that having something written down in the future-possibilities section +is not a reason to accept the current or a future RFC; such notes should be +in the section on motivation or rationale in this or subsequent RFCs. +The section merely provides additional information. 
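As a rough illustration of the growable-container API sketched in the patch above, the following emulates the proposed `Vec::leak_raw_parts`/`Vec::unleak_raw_parts` pair using only functions that exist in current stable Rust. The free functions `leak_raw_parts` and `unleak_raw_parts` are hypothetical stand-ins invented for this note, not the RFC's implementation. The sketch assumes a toolchain where `NonNull::slice_from_raw_parts` and `NonNull::<[T]>::len` are available.

```rust
use std::mem::ManuallyDrop;
use std::ptr::NonNull;

/// Hypothetical stand-in for the proposed `Vec::leak_raw_parts`.
fn leak_raw_parts<T>(v: Vec<T>) -> (NonNull<[T]>, usize) {
    // Suppress the destructor; the caller now owns the allocation.
    let mut v = ManuallyDrop::new(v);
    let (len, capacity) = (v.len(), v.capacity());
    // `Vec` never hands out a null buffer pointer, even when empty.
    let data = NonNull::new(v.as_mut_ptr()).expect("Vec pointers are never null");
    // The fat pointer carries the initialized length next to the address.
    (NonNull::slice_from_raw_parts(data, len), capacity)
}

/// Hypothetical stand-in for the proposed `Vec::unleak_raw_parts`.
///
/// # Safety
/// `ptr` and `capacity` must come from one matching `leak_raw_parts` call,
/// and the pair must not be restored more than once.
unsafe fn unleak_raw_parts<T>(ptr: NonNull<[T]>, capacity: usize) -> Vec<T> {
    // Read the length from the fat pointer without creating a reference.
    let len = ptr.len();
    // Casting `*mut [T]` to `*mut T` drops the length metadata.
    unsafe { Vec::from_raw_parts(ptr.as_ptr() as *mut T, len, capacity) }
}
```

Only the capacity travels outside the pointer here, which mirrors the `(NonNull<[T]>, usize)` shape in the proposal; the length rides along in the slice pointer's metadata.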
From 7e1bb16eab6da231e5c5e8bbb8540f0d317ac737 Mon Sep 17 00:00:00 2001 From: Ashley Mannix Date: Thu, 6 Aug 2020 10:05:10 +1000 Subject: [PATCH 2/9] fill in more RFC content --- text/0000-container-leak.md | 162 +++++++++++++++++------------------- 1 file changed, 75 insertions(+), 87 deletions(-) diff --git a/text/0000-container-leak.md b/text/0000-container-leak.md index 0611f767f05..0e8633dfdab 100644 --- a/text/0000-container-leak.md +++ b/text/0000-container-leak.md @@ -6,72 +6,48 @@ # Summary [summary]: #summary -Describe a standard set of methods for converting container types like `Box`, `Arc`, `Vec`, `String` between their raw representations. +Describe a standard set of methods for converting container types like `Box`, `Arc`, `Vec`, `String` to and from raw pointers. -For containers with a single value like `Box`, `Arc`, and `Rc`, the following methods should be added to work with their raw representations: +For containers with a single value like `Box`, `Arc`, and `Rc`, any subset of the following method pairs should be added to work with their raw representations: - `leak`: leak the container and return an arbitrarily long-lived shared or mutable reference to its allocated content. -- `leak_raw`: leak the container and return a `NonNull` pointer to its content. -- `unleak_raw`: take a previously leaked `NonNull` pointer and restore the container from it. +- `leak_raw`: leak the container and return a `NonNull` pointer to its content. +- `unleak_raw`: take a previously leaked `NonNull` pointer and restore the container from it. - `into_raw`: leak the container and return a raw pointer to its content. - `from_raw`: take a previously leaked raw pointer and restore the container from it. 
-For growable containers like `Vec` and `String`, the following methods should be added to work with their raw representations: +For growable containers like `Vec` and `String`, any subset of the following method pairs should be added to work with their raw representations: - `leak`: shrink the container to its allocated length, leak it and return an arbitrarily long-lived shared or mutable reference to its allocated content. -- `leak_raw_parts`: leak the container and return a `NonNull` pointer to its content along with any other state, like the allocated capacity, that would be needed to restore the container. -- `unleak_raw_parts`: take a previously leaked `NonNull` pointer and additional state and restore the container from it. +- `leak_raw_parts`: leak the container and return a `NonNull` pointer to its content along with any other state, like the allocated capacity, that would be needed to restore the container. +- `unleak_raw_parts`: take a previously leaked `NonNull` pointer and additional state and restore the container from it. - `into_raw_parts`: leak the container and return a raw pointer to its content along with any other state that would be needed to restore the container. - `from_raw_parts`: take a previously leaked raw pointer and additional state and restore the container from it. -The `leak_raw` and `unleak_raw` methods are "modern" semantic alternatives to the existing `from_raw`/`into_raw` pair of methods on containers that use `NonNull` instead of `*const` or `*mut`. -Users are encouraged to use the `leak_raw`/`unleak_raw` pair over the `from_raw`/`into_raw` pair except for FFI. -With these new methods, the following code: - -```rust -let b: Box = Box::new(t); - -let ptr: *mut T = Box::into_raw(b); - -.. - -let b: Box = unsafe { Box::from_raw(ptr) }; -``` - -can be replaced with: - -```rust -let b: Box = Box::new(t); - -let ptr: NonNull = Box::leak_raw(b); - -.. 
- -let b: Box = unsafe { Box::unleak_raw(ptr) }; -``` +The `leak_raw`/`unleak_raw` methods are "modern" semantic alternatives to the existing `into_raw`/`from_raw` pair of methods on containers that use `NonNull` as the pointer type instead of `*const T` or `*mut T`. +Users are encouraged to prefer the `leak_raw`/`unleak_raw` methods over `into_raw`/`from_raw` except for FFI or other niche cases. # Motivation [motivation]: #motivation -Why are we doing this? What use cases does it support? What is the expected outcome? - The `NonNull` type is a non-nullable pointer type that's variant over `T`. `NonNull` has stronger invariants than `*mut T`, but weaker than the internal `Unique`. Since `Unique` isn't planned to be stabilized, `NonNull` is the most appropriate pointer type for containers like `Box` and `Vec` to use as pointers to their inner value. -Unfortunately, `NonNull` was stabilized after methods like `Box::into_raw` and `Vec::from_raw_parts`, which are left working with `*mut T`. Now with proposed API addition of `Vec::into_raw_parts` we're left with the conundrum. The options appear to be: +Unfortunately, `NonNull` was stabilized after methods like `Box::into_raw` and `Vec::from_raw_parts`, which are left working with `*mut T`. Now with the proposed API addition of `Vec::into_raw_parts` we're left with a conundrum. The options appear to be: -- break symmetry with `Vec::from_raw_parts` and diverge from `Box::into_raw` by producing a more semantically correct `NonNull`. -- not use a newer and more appropriate type for the purpose it exists for. +- break symmetry with `Vec::from_raw_parts` and diverge from `Box::into_raw` by producing a more semantically accurate `NonNull`. +- not use a newer and more appropriate type for the purpose it exists for and leave it up to users to convert. 
-This RFC aims to solve this by specifying any `from_raw`/`into_raw`-like APIs to stay consistent with the precedent set by `Box` and `Vec` of working with raw pointers, and introduce a similar new API for `NonNull` that is also more semantically typed with respect to `T`. Instead of `Vec::leak_raw` returning a `(*mut T, usize)` pair for its allocated storage, it returns a `NonNull<[T]>` instead. +This RFC aims to answer this question by specifying any `into_raw`/`from_raw`-like APIs to stay consistent with the precedent set by `Box` and `Vec` of working with `*const T` and `*mut T`, and introduce a similar new API for `NonNull` that is also more semantically typed with respect to `T`. Instead of `Vec::leak_raw` returning a `(*mut T, usize)` pair for its allocated storage, it returns a `NonNull<[T]>` instead. Keeping the new `leak_raw`/`unleak_raw` API similar to the existing `into_raw`/`from_raw` API is to make them discoverable and avoid new cognitive load for those that are already familiar with `into_raw`/`from_raw`. The semantic names make it clear to a reader what happens to the contents of the container through the conversion into a pointer. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -The `leak_raw` and `unleak_raw` methods can be used to take manual control of the contents of a container like `Box` and restore it later. -It's a fundamental pattern used by specialty datastructures like linked lists to manage non-trivial access and ownership models. -Take the example of `LinkedList`. Internally, it stores `NonNull` pointers to its nodes: +The `leak_raw` method can be used to take manual control of the lifetime and access to the contents of a container like `Box`. +The `unleak_raw` method can then be used to later restore the container from its leaked pointer. +It's a fundamental pattern used by specialty data-structures like linked lists to manage non-trivial access and ownership models. +Take the example of `LinkedList`. 
Internally, it stores `NonNull` pointers to its nodes: ```rust pub struct LinkedList { @@ -82,7 +58,7 @@ pub struct LinkedList { } ``` -The nodes are allocated using `Box`, where they're then leaked into the linked list, then later unleaked back out: +The nodes are allocated using `Box`, where they're then leaked into the linked list, then later unleaked back out: ```rust impl LinkedList { @@ -125,46 +101,75 @@ impl LinkedList { } ``` -The `leak_raw` and `unleak_raw` methods are recommended over `into_raw` and `from_raw` except in special cases like FFI where raw pointers might be explicitly wanted. +The `String::leak_raw` method is a nice representative of the new API for multi-value containers because it produces a more semantic fat-pointer to the string's contents. +Instead of a `(*mut u8, usize)` pair, it returns a `NonNull<str>`, which carries its length and the UTF-8 invariant together. +Working with the underlying string is just a matter of dereferencing it, instead of having to reconstruct it through `slice::from_raw_parts` and then `str::from_utf8_unchecked`. + +The `leak_raw` and `unleak_raw` methods are recommended over `into_raw` and `from_raw` except in special cases like FFI where `*const T` or `*mut T` might be explicitly wanted. With these new methods, the following existing code: + +```rust +let b: Box = Box::new(t); + +let ptr: *mut T = Box::into_raw(b); + +.. + +let b: Box = unsafe { Box::from_raw(ptr) }; +``` + +can be replaced with: + +```rust +let b: Box = Box::new(t); + +let ptr: NonNull = Box::leak_raw(b); + +..
+ +let b: Box = unsafe { Box::unleak_raw(ptr) }; +``` # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -This RFC proposes the following API for single-value containers: +This RFC proposes the following API for single-value containers (some of these methods are already stable or implemented but unstable): ```rust impl Box { + // Already stable pub fn leak<'a>(this: Box) -> &'a mut T where T: 'a; pub fn leak_raw(this: Box) -> NonNull; pub unsafe fn unleak_raw(ptr: NonNull) -> Box; + // Already stable pub fn into_raw(this: Box) -> *mut T; + // Already stable pub unsafe fn from_raw(ptr: *mut T) -> Box; } impl Rc { - pub fn leak<'a>(this: Rc) -> &'a T where T: 'a; - pub fn leak_raw(this: Rc) -> NonNull; pub unsafe fn unleak_raw(ptr: NonNull) -> Rc; + // Already stable pub fn into_raw(this: Rc) -> *const T; + // Already stable pub unsafe fn from_raw(ptr: *const T) -> Rc; } impl Arc { - pub fn leak<'a>(this: Arc) -> &'a T where T: 'a; - pub fn leak_raw(this: Arc) -> NonNull; pub unsafe fn unleak_raw(ptr: NonNull) -> Arc; + // Already stable pub fn into_raw(this: Arc) -> *const T; + // Already stable pub unsafe fn from_raw(ptr: *const T) -> Arc; } ``` -and the following API for growable containers: +and the following API for growable containers (some of these methods are already stable or implemented but unstable): ```rust impl Vec { @@ -173,7 +178,9 @@ impl Vec { pub fn leak_raw_parts(self) -> (NonNull<[T]>, usize); pub fn unleak_raw_parts(ptr: NonNull<[T]>, capacity: usize) -> Vec; + // Unstable, tracked by: https://github.com/rust-lang/rust/issues/65816 pub fn into_raw_parts(self) -> (*mut T, usize, usize); + // Already stable pub fn from_raw_parts(ptr: *mut T, length: usize, capacity: usize) -> Vec; } @@ -183,68 +190,49 @@ impl String { pub fn leak_raw_parts(self) -> (NonNull, usize); pub fn unleak_raw_parts(ptr: NonNull, capacity: usize) -> String; + // Unstable, tracked by: https://github.com/rust-lang/rust/issues/65816 pub fn
into_raw_parts(self) -> (*mut u8, usize, usize); + // Already stable pub fn from_raw_parts(ptr: *mut u8, length: usize, capacity: usize) -> String; } ``` -These conversion methods follow the existing semantics of static functions for containers that auto-deref to their inner value like `Box`, and inherent methods for other containers like `Vec`. +These conversion methods follow the existing semantics of static functions for containers that dereference to their inner value like `Box`, and inherent methods for other containers like `Vec`. The `NonNull<[T]>` and `NonNull` methods are expected to eventually offer a way to get their length without needing to go through a reference first, but the exact mechanism is left as out-of-scope for this RFC. # Drawbacks [drawbacks]: #drawbacks -Why should we *not* do this? +A drawback of this approach is that it creates a standard that any future containers are expected to adhere to. +It creates more API surface area that needs to be rationalized with future idioms, just like this RFC is attempting to do for `into_raw`/`from_raw` with `NonNull`. +As an example, if a future Rust stabilizes another even more appropriate pointer type then it would need to be fit into this scheme. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -- Why is this design the best in the space of possible designs? -- What other designs have been considered and what is the rationale for not choosing them? -- What is the impact of not doing this? +An alternative is to just start using `NonNull` going forward and accept the inconsistency with existing methods. +This isn't preferable to keeping new `into_raw`/`from_raw` pairs consistent with the ones that already exist because it forces users to learn the return values for all of these methods by rote instead of being able to rely on simple conventions. + +Another is to just use `leak` methods and the conversion from `&T` and `&mut T` into `NonNull` to work with. 
+This isn't preferable to method pairs that return a `NonNull` and look similar to `into_raw`/`from_raw` because they're less discoverable while still being preferable, and require more steps to leak and unleak than would otherwise be needed. # Prior art [prior-art]: #prior-art -Discuss prior art, both the good and the bad, in relation to this proposal. -A few examples of what this can include are: - -- For language, library, cargo, tools, and compiler proposals: Does this feature exist in other programming languages and what experience have their community had? -- For community proposals: Is this done by some other community and what were their experiences with it? -- For other teams: What lessons can we learn from what other communities have done here? -- Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background. - -This section is intended to encourage you as an author to think about the lessons from other languages, provide readers of your RFC with a fuller picture. -If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other languages. - -Note that while precedent set by other languages is some motivation, it does not on its own motivate an RFC. -Please also take into consideration that rust sometimes intentionally diverges from common language features. +The prior art is `Box`, which already has the `leak`, `into_raw` and `from_raw` methods. +It also has the unstable `into_raw_non_null` method, which is deprecated in favor of `NonNull::from(Box::leak(b))`. +This current workaround is the second alternative listed above, which isn't considered preferable to `Box::leak_raw(b)`. # Unresolved questions [unresolved-questions]: #unresolved-questions -- What parts of the design do you expect to resolve through the RFC process before this gets merged?
-- What parts of the design do you expect to resolve through the implementation of this feature before stabilization? -- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC? +This RFC doesn't propose a `leak` method for `Rc` or `Arc` but they could be added after working through the motivations. + +Do we expect `Box::unleak_raw(NonNull::from(Box::leak(b)))` to work? # Future possibilities [future-possibilities]: #future-possibilities -Think about what the natural extension and evolution of your proposal would -be and how it would affect the language and project as a whole in a holistic -way. Try to use this section as a tool to more fully consider all possible -interactions with the project and language in your proposal. -Also consider how the this all fits into the roadmap for the project -and of the relevant sub-team. - -This is also a good place to "dump ideas", if they are out of scope for the -RFC you are writing but otherwise related. - -If you have tried and cannot think of any future possibilities, -you may simply state that you cannot think of anything. - -Note that having something written down in the future-possibilities section -is not a reason to accept the current or a future RFC; such notes should be -in the section on motivation or rationale in this or subsequent RFCs. -The section merely provides additional information. +There are other types that should probably be included, like `OsString` and `PathBuf`. +Using `NonNull<[T]>` and `NonNull<str>` sets an expectation that `NonNull` will have some APIs for working with these fat-pointer types.
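The unresolved question in the patch above concerns round-tripping a `Box` through `NonNull::from(Box::leak(b))`. The sketch below does not settle what the proposed `Box::unleak_raw` should accept. It only shows how the same round trip can be written against stable APIs today, using the workaround named in the prior-art section together with `Box::from_raw`. The free functions `leak_raw` and `unleak_raw` are hypothetical stand-ins for the proposed methods.

```rust
use std::ptr::NonNull;

/// Hypothetical stand-in for the proposed `Box::leak_raw`.
fn leak_raw<T: ?Sized>(b: Box<T>) -> NonNull<T> {
    // The stable workaround: leak to a reference, then convert it.
    NonNull::from(Box::leak(b))
}

/// Hypothetical stand-in for the proposed `Box::unleak_raw`.
///
/// # Safety
/// `ptr` must come from `leak_raw` (or an equivalent `Box` leak) and must
/// not be restored more than once.
unsafe fn unleak_raw<T: ?Sized>(ptr: NonNull<T>) -> Box<T> {
    // `Box::from_raw` takes back ownership of the allocation.
    unsafe { Box::from_raw(ptr.as_ptr()) }
}
```

Between the two calls the caller is responsible for the allocation, which is the ownership hand-off that the linked-list example in the guide-level section relies on.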
From 7dfc2c819a9ea2615a21629f8b6ad2ea560400c9 Mon Sep 17 00:00:00 2001 From: Ashley Mannix Date: Thu, 6 Aug 2020 11:36:46 +1000 Subject: [PATCH 3/9] add deprecation as an alternative --- text/0000-container-leak.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/text/0000-container-leak.md b/text/0000-container-leak.md index 0e8633dfdab..c3c0d89b255 100644 --- a/text/0000-container-leak.md +++ b/text/0000-container-leak.md @@ -11,7 +11,7 @@ Describe a standard set of methods for converting container types like `Box`, For containers with a single value like `Box`, `Arc`, and `Rc`, any subset of the following method pairs should be added to work with their raw representations: - `leak`: leak the container and return an arbitrarily long-lived shared or mutable reference to its allocated content. -- `leak_raw`: leak the container and return a `NonNull` pointer to its content. +- `leak_raw`: leak the container and return a `NonNull` pointer to its content. The type `T` is the same as `Deref::Target`, so `NonNull::from(&*self)` is the same as `Self::leak_raw(value)`. - `unleak_raw`: take a previously leaked `NonNull` pointer and restore the container from it. - `into_raw`: leak the container and return a raw pointer to its content. - `from_raw`: take a previously leaked raw pointer and restore the container from it. @@ -19,7 +19,7 @@ For containers with a single value like `Box`, `Arc`, and `Rc`, any sub For growable containers like `Vec` and `String`, any subset of the following method pairs should be added to work with their raw representations: - `leak`: shrink the container to its allocated length, leak it and return an arbitrarily long-lived shared or mutable reference to its allocated content. -- `leak_raw_parts`: leak the container and return a `NonNull` pointer to its content along with any other state, like the allocated capacity, that would be needed to restore the container. 
+- `leak_raw_parts`: leak the container and return a `NonNull` pointer to its content along with any other state, like the allocated capacity, that would be needed to restore the container. The type `T` is the same as `Deref::Target`, so `NonNull::from(&*self)` is the same as `self.leak_raw()`. - `unleak_raw_parts`: take a previously leaked `NonNull` pointer and additional state and restore the container from it. - `into_raw_parts`: leak the container and return a raw pointer to its content along with any other state that would be needed to restore the container. - `from_raw_parts`: take a previously leaked raw pointer and additional state and restore the container from it. @@ -217,6 +217,8 @@ This isn't preferable to keeping new `into_raw`/`from_raw` pairs consistent with Another is to just use `leak` methods and the conversion from `&T` and `&mut T` into `NonNull` to work with. This isn't preferable to method pairs that return a `NonNull` and look similar to `into_raw`/`from_raw` because they're less discoverable while still being preferable, and require more steps to leak and unleak than would otherwise be needed. +Another is to deprecate `into_raw`/`from_raw` in favor of `leak_raw().as_ptr()` and `NonNull::new_unchecked(ptr)`. This makes it easier to discover the preferred API for working with raw container contents at the expense of more machinery in FFI use-cases.
+ # Prior art [prior-art]: #prior-art From 46bb6ef49894e7b7667260e86a8eddff17785de1 Mon Sep 17 00:00:00 2001 From: Ashley Mannix Date: Sun, 16 Aug 2020 15:07:45 +1000 Subject: [PATCH 4/9] Provide examples of when to use leak_raw vs into_raw --- text/0000-container-leak.md | 178 +++++++++++++++++++++++++++++++----- 1 file changed, 153 insertions(+), 25 deletions(-) diff --git a/text/0000-container-leak.md b/text/0000-container-leak.md index c3c0d89b255..ecec95ea794 100644 --- a/text/0000-container-leak.md +++ b/text/0000-container-leak.md @@ -11,7 +11,7 @@ Describe a standard set of methods for converting container types like `Box`, For containers with a single value like `Box`, `Arc`, and `Rc`, any subset of the following method pairs should be added to work with their raw representations: - `leak`: leak the container and return an arbitrarily long-lived shared or mutable reference to its allocated content. -- `leak_raw`: leak the container and return a `NonNull` pointer to its content. The type `T` is the same as `Deref::Target`, so `NonNull::from(&*self)` is the same as `Self::leak_raw(value)`. +- `leak_raw`: leak the container and return a `NonNull` pointer to its content. The type `T` is the same as `Deref::Target`, so `NonNull::from(&*self)` is the same as `NonNull::from(Self::leak())` and `Self::leak_raw(value)`. - `unleak_raw`: take a previously leaked `NonNull` pointer and restore the container from it. - `into_raw`: leak the container and return a raw pointer to its content. - `from_raw`: take a previously leaked raw pointer and restore the container from it. @@ -19,35 +19,43 @@ For containers with a single value like `Box`, `Arc`, and `Rc`, any sub For growable containers like `Vec` and `String`, any subset of the following method pairs should be added to work with their raw representations: - `leak`: shrink the container to its allocated length, leak it and return an arbitrarily long-lived shared or mutable reference to its allocated content. 
-- `leak_raw_parts`: leak the container and return a `NonNull` pointer to its content along with any other state, like the allocated capacity, that would be needed to restore the container. The type `T` is the same as `Deref::Target`, so `NonNull::from(&*self)` is the same as `self.leak_raw()`. +- `leak_raw_parts`: leak the container and return a `NonNull` pointer to its content along with any other state, like the allocated capacity, that would be needed to restore the container. The type `T` is the same as `Deref::Target`, so `NonNull::from(&*self)` is the same as `NonNull::from(self.leak_raw_parts().0)` and `NonNull::from(self.leak())`. - `unleak_raw_parts`: take a previously leaked `NonNull` pointer and additional state and restore the container from it. - `into_raw_parts`: leak the container and return a raw pointer to its content along with any other state that would be needed to restore the container. - `from_raw_parts`: take a previously leaked raw pointer and additional state and restore the container from it. The `leak_raw`/`unleak_raw` methods are "modern" semantic alternatives to the existing `into_raw`/`from_raw` pair of methods on containers that use `NonNull` as the pointer type instead of `*const T` or `*mut T`. -Users are encouraged to prefer the `leak_raw`/`unleak_raw` methods over `into_raw`/`from_raw` except for FFI or other niche cases. +Users are encouraged to prefer the `leak_raw`/`unleak_raw` methods over `into_raw`/`from_raw` except for the important case where they need FFI-safety. # Motivation [motivation]: #motivation -The `NonNull` type is a non-nullable pointer type that's variant over `T`. `NonNull` has stronger invariants than `*mut T`, but weaker than the internal `Unique`. Since `Unique` isn't planned to be stabilized, `NonNull` is the most appropriate pointer type for containers like `Box` and `Vec` to use as pointers to their inner value. +The `NonNull` type is a non-nullable pointer type that's variant over `T`. 
`NonNull` has stronger invariants than `*mut T`, but weaker than the internal `Unique`. +Since `Unique` isn't planned to be stabilized, `NonNull` is the most appropriate pointer type for containers like `Box` and `Vec` to use as pointers to their inner value. -Unfortunately, `NonNull` was stabilized after methods like `Box::into_raw` and `Vec::from_raw_parts`, which are left working with `*mut T`. Now with the proposed API addition of `Vec::into_raw_parts` we're left with a conundrum. The options appear to be: +Unfortunately, `NonNull` was stabilized after methods like `Box::into_raw` and `Vec::from_raw_parts`, which are left working with `*mut T`. +Now with the proposed API addition of `Vec::into_raw_parts` we're left with a conundrum. The options appear to be: - break symmetry with `Vec::from_raw_parts` and diverge from `Box::into_raw` by producing a more semantically accurate `NonNull`. - not use a newer and more appropriate type for the purpose it exists for and leave it up to users to convert. -This RFC aims to answer this question by specifying any `into_raw`/`from_raw`-like APIs to stay consistent with the precedent set by `Box` and `Vec` of working with `*const T` and `*mut T`, and introduce a similar new API for `NonNull` that is also more semantically typed with respect to `T`. Instead of `Vec::leak_raw` returning a `(*mut T, usize)` pair for its allocated storage, it returns a `NonNull<[T]>` instead. +This RFC aims to answer this question by specifying any `into_raw`/`from_raw`-like APIs to stay consistent with the precedent set by `Box` and `Vec` of working with `*const T` and `*mut T`, and introduce a similar new API for `NonNull` that is also more semantically typed with respect to `T`. +Instead of `Vec::leak_raw` returning a `(*mut T, usize)` pair for its allocated storage, it returns a `NonNull<[T]>` instead. 
-Keeping the new `leak_raw`/`unleak_raw` API similar to the existing `into_raw`/`from_raw` API is to make them discoverable and avoid new cognitive load for those that are already familiar with `into_raw`/`from_raw`. The semantic names make it clear to a reader what happens to the contents of the container through the conversion into a pointer. +Keeping the new `leak_raw`/`unleak_raw` API similar to the existing `into_raw`/`from_raw` API is to make them discoverable and avoid new cognitive load for those that are already familiar with `into_raw`/`from_raw`. +The semantic names make it clear to a reader what happens to the contents of the container through the conversion into a pointer. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation +## When do I use `leak_raw`/`unleak_raw`? + +The `leak_raw`/`unleak_raw` and `leak_raw_parts`/`unleak_raw_parts` methods are good for pure Rust datastructures that would probably use references if it was possible to describe their non-trivial access and ownership requirements through them. + The `leak_raw` method can be used to take manual control of the lifetime and access to the contents of a container like `Box`. The `unleak_raw` method can then be used to later restore the container from its leaked pointer. -It's a fundamental pattern used by specialty data-structures like linked lists to manage non-trivial access and ownership models. -Take the example of `LinkedList`. Internally, it stores `NonNull` pointers to its nodes: + +Take the example of `LinkedList` from the standard library. Internally, it stores `NonNull` pointers to its nodes: ```rust pub struct LinkedList { @@ -58,7 +66,8 @@ pub struct LinkedList { } ``` -The nodes are allocated using `Box`, where they're then leaked into the linked list, then later unleaked back out: +The nodes are allocated using `Box`, where they're then leaked into the linked list, then later unleaked back out. 
+This can be done using `leak_raw`/`unleak_raw`: ```rust impl LinkedList { @@ -101,34 +110,150 @@ impl LinkedList { } ``` -The `String::leak_raw` method is a nice representative of the new API for multi-value containers because it produces a more semantic fat-pointer to the string's contents. -Instead of a `(*mut u8, usize)` pair, it returns a `NonNull`, which encodes its length and retains the UTF8 invariant together. -Working with the underlying string is just a matter of dereferencing it, instead of having to reconstruct it through `slice::from_raw_parts` and then `str::from_utf8_unchecked`. +The `leak_raw_parts` method is the equivalent of `leak_raw` for multi-value containers like `String` that return extra data beyond the pointer needed to reconstruct the container later. +The `unleak_raw_parts` method is the equivalent of `unleak_raw`. -The `leak_raw` and `unleak_raw` methods are recommended over `into_raw` and `from_raw` except in special cases like FFI where `*const T` or `*mut T` might be explicitly wanted. With these new methods, the following existing code: +The `String::leak_raw_parts` method is a nice representative of the new API for multi-value containers because it produces a more semantic fat-pointer to the string's contents. +Instead of a `(*mut u8, usize)` pair for the pointer and length, it returns a `NonNull`, which encodes its length and retains the UTF8 invariant together. ```rust -let b: Box = Box::new(t); +let string = String::from("🗻∈🌏"); -let ptr: *mut T = Box::into_raw(b); +let (ptr, cap): (NonNull, usize) = string.leak_raw_parts(); -.. 
+// The `ptr` is just a single `as_ref` call away from a `&str` +assert_eq!(Some("🗻"), unsafe { ptr.as_ref().get(0..4) }); -let b: Box = unsafe { Box::from_raw(ptr) }; +let string = String::unleak_raw_parts(ptr, cap); ``` -can be replaced with: +Using the `into_raw`/`from_raw` API, the above example would require more machinery to re-assert the contents are valid UTF8 before returning a `&str`: ```rust -let b: Box = Box::new(t); +let string = String::from("🗻∈🌏"); -let ptr: NonNull = Box::leak_raw(b); +let (ptr, len, cap): (*mut u8, usize, usize) = string.into_raw_parts(); -.. +// The `ptr` needs to be converted back into a `str` through a slice first +assert_eq!(Some("🗻"), unsafe { str::from_utf8_unchecked(slice::from_raw_parts(ptr, len)).get(0..4) }); -let b: Box = unsafe { Box::unleak_raw(ptr) }; +let string = String::from_raw_parts(ptr, len, cap); ``` +## When do I use `into_raw`/`from_raw`? + +The `into_raw`/`from_raw` and `into_raw_parts`/`from_raw_parts` methods are good for FFI where a Rust type needs to be used by non-Rust code. + +The `*mut T`, `*const T`, and `usize` types returned by these methods typically have a direct counterpart in the target language, so they don't require learning new concepts for users that are familiar with raw pointers. + +As an example, it's common to share complex Rust values opaquely by boxing them and passing raw pointers to-and-fro. 
+Take this example [from The Rust FFI Guide][ffi-guide] that wraps a web request: + +```rust +#[no_mangle] +pub unsafe extern "C" fn request_create(url: *const c_char) -> *mut Request { + if url.is_null() { + return ptr::null_mut(); + } + + let raw = CStr::from_ptr(url); + + let url_as_str = match raw.to_str() { + Ok(s) => s, + Err(_) => return ptr::null_mut(), + }; + + let parsed_url = match Url::parse(url_as_str) { + Ok(u) => u, + Err(_) => return ptr::null_mut(), + }; + + let req = Request::new(parsed_url, Method::Get); + + // Get a stable address for the request + Box::into_raw(Box::new(req)) +} + +#[no_mangle] +pub unsafe extern "C" fn request_destroy(req: *mut Request) { + if !req.is_null() { + // Reinterpret the stable address as a previously allocated box + drop(Box::from_raw(req)); + } +} +``` + +In this example, a reader only needs to consider one kind of pointer type (technically `*const T` and `*mut T` are different types, but one could read them like `T*` from other languages with a sharing annotation). +This API could use `Option>` instead of `*mut Request` to force null checking in `request_destroy`, but that requires the author to juggle more concepts to write. +They'd need to understand that while `NonNull` has the same representation as `*const T`, it has the same semantics as `Option>`. + +The `into_raw_parts` method is the equivalent of `into_raw` for multi-value containers like `Vec` that split the fat pointer into its FFI-safe parts. +The `from_raw_parts` method is the equivalent of `from_raw`. + +An FFI over `Vec` is a nice example of when `into_raw_parts` can be helpful over `leak_raw_parts`. +`Vec::leak_raw_parts` returns a fat `NonNull<[u8]>` pointer, but `NonNull<[u8]>` (and consequently `*const [u8]`) is not considered FFI-safe. 
+Instead, we can use `Vec::into_raw_parts`, which only uses FFI-safe `*mut u8` and `usize` types: + +```rust +#[repr(C)] +pub struct RawVec { + ptr: *mut u8, + len: usize, + cap: usize +} + +#[no_mangle] +pub unsafe extern "C" fn vec_create() -> RawVec { + let v = vec![0u8; 512]; + + // Get the pointer to the first element, length and capacity for the buffer + let (ptr, len, cap) = v.into_raw_parts(); + + RawVec { ptr, len, cap } +} + +#[no_mangle] +pub unsafe extern "C" fn vec_destroy(vec: RawVec) { + if !vec.ptr.is_null() { + // Rebuild and drop the previously allocated buffer + drop(Vec::from_raw_parts(vec.ptr, vec.len, vec.cap)); + } +} +``` + +Using the `leak_raw_parts`/`unleak_raw_parts` API, the above example would require more machinery to convert the `NonNull<[u8]>` into FFI-safe types: + +```rust +#[repr(C)] +pub struct RawVec { + ptr: *mut u8, + len: usize, + cap: usize +} + +#[no_mangle] +pub unsafe extern "C" fn vec_create() -> RawVec { + let v = vec![0u8; 512]; + + let (ptr, cap) = v.leak_raw_parts(); + + // Cast the pointer to the slice to its first element and get the length + let (ptr, len) = (ptr.cast::().as_ptr(), ptr.len()); + + RawVec { ptr, len, cap } +} + +#[no_mangle] +pub unsafe extern "C" fn vec_destroy(vec: RawVec) { + if !vec.ptr.is_null() { + // Rebuild the `NonNull<[u8]>` through a `NonNull` from the `*mut u8` and length + drop(Vec::unleak_raw_parts(NonNull::slice_from_raw_parts(NonNull::new_unchecked(vec.ptr), vec.len), vec.cap)); + } +} +``` + +[ffi-guide]: https://michael-f-bryan.github.io/rust-ffi-guide/basic_request.html#creating-the-c-interface + # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -169,7 +294,7 @@ impl Arc { } ``` -and the following API for growable containers (some of these methods are already stable or implemented but unstable): +and the following API for multi-value containers (some of these methods are already stable or implemented but unstable): ```rust impl Vec { @@ 
-199,6 +324,8 @@ impl String { These conversion methods follow the existing semantics of static functions for containers that dereference to their inner value like `Box`, and inherent methods for other containers like `Vec`. +The docs for `into_raw`/`from_raw` methods will point users to `leak_raw`/`unleak_raw` unless they need FFI-safety. + The `NonNull<[T]>` and `NonNull` methods are expected to eventually offer a way to get their length without needing to go through a reference first, but the exact mechanism is left as out-of-scope for this RFC. # Drawbacks @@ -217,7 +344,8 @@ This isn't preferable to keeping new `into_raw`/`from_raw` pairs consistent with Another is to just use `leak` methods and the conversion from `&T` and `&mut T` into `NonNull` to work with. This isn't preferable to method pairs that return a `NonNull` and look similar to `into_raw`/`from_raw` because they're less discoverable while still being preferable, and require more steps to leak and unleak than would otherwise be needed. -Another is to deprecate `into_raw`/`from_raw` in favor of `leak_raw().as_ptr()` and `NonNull::new_unchecked(ptr)`. This makes it easier to discover the preferred API for working with raw container contents and the expense of more machinery in FFI use-cases. +Another is to deprecate `into_raw`/`from_raw` in favor of `leak_raw().as_ptr()` and `NonNull::new_unchecked(ptr)`. +This makes it easier to discover the preferred API for working with raw container contents at the expense of more machinery in FFI use-cases. 
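To make the cost of the deprecation alternative concrete, here is a rough sketch of the extra conversions FFI code would carry if only the `NonNull`-based methods were encouraged. `leak_raw`/`unleak_raw` are only proposed by this RFC, so the free functions below are hypothetical stand-ins built from the stable `Box::leak` and `Box::from_raw`; they are not real `Box` methods.

```rust
use std::ptr::NonNull;

// Hypothetical stand-in for the proposed `Box::leak_raw`, built from stable APIs.
fn leak_raw<T>(b: Box<T>) -> NonNull<T> {
    NonNull::from(Box::leak(b))
}

// Hypothetical stand-in for the proposed `Box::unleak_raw`.
//
// Safety: `ptr` must come from `leak_raw` and must not be used again afterwards.
unsafe fn unleak_raw<T>(ptr: NonNull<T>) -> Box<T> {
    unsafe { Box::from_raw(ptr.as_ptr()) }
}

fn main() {
    let ptr: NonNull<u32> = leak_raw(Box::new(42));

    // Extra step 1: FFI code needs a plain `*mut T` to hand to non-Rust callers.
    let raw: *mut u32 = ptr.as_ptr();

    // Extra step 2: on the way back, the raw pointer has to be re-wrapped.
    // `new_unchecked` requires a non-null pointer, so check first.
    assert!(!raw.is_null());
    let ptr = unsafe { NonNull::new_unchecked(raw) };

    let b = unsafe { unleak_raw(ptr) };
    assert_eq!(*b, 42);
}
```

With `Box::into_raw`/`Box::from_raw` both extra steps disappear, which is the burden on FFI code this alternative would introduce.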
# Prior art [prior-art]: #prior-art From f38b01b6dd1e079bead894853620452f7e79c341 Mon Sep 17 00:00:00 2001 From: Ashley Mannix Date: Sun, 16 Aug 2020 15:11:35 +1000 Subject: [PATCH 5/9] Update 0000-container-leak.md --- text/0000-container-leak.md | 66 ++++++++----------------------------- 1 file changed, 14 insertions(+), 52 deletions(-) diff --git a/text/0000-container-leak.md b/text/0000-container-leak.md index ecec95ea794..0e2af47b936 100644 --- a/text/0000-container-leak.md +++ b/text/0000-container-leak.md @@ -116,29 +116,19 @@ The `unleak_raw_parts` method is the equivalent of `unleak_raw`. The `String::leak_raw_parts` method is a nice representative of the new API for multi-value containers because it produces a more semantic fat-pointer to the string's contents. Instead of a `(*mut u8, usize)` pair for the pointer and length, it returns a `NonNull`, which encodes its length and retains the UTF8 invariant together. -```rust +```diff let string = String::from("🗻∈🌏"); -let (ptr, cap): (NonNull, usize) = string.leak_raw_parts(); ++ let (ptr, cap): (NonNull, usize) = string.leak_raw_parts(); +- let (ptr, len, cap): (*mut u8, usize, usize) = string.into_raw_parts(); -// The `ptr` is just a single `as_ref` call away from a `&str` -assert_eq!(Some("🗻"), unsafe { ptr.as_ref().get(0..4) }); ++ assert_eq!(Some("🗻"), unsafe { ptr.as_ref().get(0..4) }); +- assert_eq!(Some("🗻"), unsafe { str::from_utf8_unchecked(slice::from_raw_parts(ptr, len)).get(0..4) }); let string = String::unleak_raw_parts(ptr, cap); ``` -Using the `into_raw`/`from_raw` API, the above example would require more machinery to re-assert the contents are valid UTF8 before returning a `&str`: - -```rust -let string = String::from("🗻∈🌏"); - -let (ptr, len, cap): (*mut u8, usize, usize) = string.into_raw_parts(); - -// The `ptr` needs to be converted back into a `str` through a slice first -assert_eq!(Some("🗻"), unsafe { str::from_utf8_unchecked(slice::from_raw_parts(ptr, len)).get(0..4) }); - -let 
string = String::from_raw_parts(ptr, len, cap); -``` +Using the `into_raw`/`from_raw` API, the above example would require more machinery to re-assert the contents are valid UTF8 before returning a `&str`. ## When do I use `into_raw`/`from_raw`? @@ -194,7 +184,7 @@ An FFI over `Vec` is a nice example of when `into_raw_parts` can be helpful `Vec::leak_raw_parts` returns a fat `NonNull<[u8]>` pointer, but `NonNull<[u8]>` (and consequently `*const [u8]`) is not considered FFI-safe. Instead, we can use `Vec::into_raw_parts`, which only uses FFI-safe `*mut u8` and `usize` types: -```rust +```diff #[repr(C)] pub struct RawVec { ptr: *mut u8, @@ -206,52 +196,24 @@ pub struct RawVec { pub unsafe extern "C" fn vec_create() -> RawVec { let v = vec![0u8; 512]; - // Get the pointer to the first element, length and capacity for the buffer - let (ptr, len, cap) = v.into_raw_parts(); - - RawVec { ptr, len, cap } -} ++ let (ptr, len, cap) = v.into_raw_parts(); +- let (ptr, cap) = v.leak_raw_parts(); +- let (ptr, len) = (ptr.cast::().as_ptr(), ptr.len()); -#[no_mangle] -pub unsafe extern "C" fn vec_destroy(vec: RawVec) { - if !vec.ptr.is_null() { - // Rebuild and drop the previously allocated buffer - drop(Vec::from_raw_parts(vec.ptr, vec.len, vec.cap)); - } -} -``` - -Using the `leak_raw_parts`/`unleak_raw_parts` API, the above example would require more machinery to convert the `NonNull<[u8]>` into FFI-safe types: - -```rust -#[repr(C)] -pub struct RawVec { - ptr: *mut u8, - len: usize, - cap: usize -} - -#[no_mangle] -pub unsafe extern "C" fn vec_create() -> RawVec { - let v = vec![0u8; 512]; - - let (ptr, cap) = v.leak_raw_parts(); - - // Cast the pointer to the slice to its first element and get the length - let (ptr, len) = (ptr.cast::().as_ptr(), ptr.len()); - RawVec { ptr, len, cap } } #[no_mangle] pub unsafe extern "C" fn vec_destroy(vec: RawVec) { if !vec.ptr.is_null() { - // Rebuild the `NonNull<[u8]>` through a `NonNull` from the `*mut u8` and length - 
drop(Vec::unleak_raw_parts(NonNull::slice_from_raw_parts(NonNull::new_unchecked(vec.ptr), vec.len), vec.cap)); ++ drop(Vec::from_raw_parts(vec.ptr, vec.len, vec.cap)); +- drop(Vec::unleak_raw_parts(NonNull::slice_from_raw_parts(NonNull::new_unchecked(vec.ptr), vec.len), vec.cap)); } } ``` +Using the `leak_raw_parts`/`unleak_raw_parts` API, the above example would require more machinery to convert the `NonNull<[u8]>` into FFI-safe types. + [ffi-guide]: https://michael-f-bryan.github.io/rust-ffi-guide/basic_request.html#creating-the-c-interface # Reference-level explanation From 562a39262df5ec35bc71fd9b83d0e1010833da48 Mon Sep 17 00:00:00 2001 From: Ashley Mannix Date: Sun, 16 Aug 2020 22:07:58 +1000 Subject: [PATCH 6/9] Update 0000-container-leak.md --- text/0000-container-leak.md | 37 +++++++++++++++++++++---------------- 1 file changed, 21 insertions(+), 16 deletions(-) diff --git a/text/0000-container-leak.md b/text/0000-container-leak.md index 0e2af47b936..abf5b59b3c4 100644 --- a/text/0000-container-leak.md +++ b/text/0000-container-leak.md @@ -11,15 +11,15 @@ Describe a standard set of methods for converting container types like `Box`, For containers with a single value like `Box`, `Arc`, and `Rc`, any subset of the following method pairs should be added to work with their raw representations: - `leak`: leak the container and return an arbitrarily long-lived shared or mutable reference to its allocated content. -- `leak_raw`: leak the container and return a `NonNull` pointer to its content. The type `T` is the same as `Deref::Target`, so `NonNull::from(&*self)` is the same as `NonNull::from(Self::leak())` and `Self::leak_raw(value)`. +- `leak_raw`: leak the container and return a `NonNull` pointer to its content. The type `T` is the same as `Deref::Target`, so `Self::leak_raw(value)` is equivalent to `NonNull::from(&*self)` and `NonNull::from(Self::leak(value))`. - `unleak_raw`: take a previously leaked `NonNull` pointer and restore the container from it. 
- `into_raw`: leak the container and return a raw pointer to its content. - `from_raw`: take a previously leaked raw pointer and restore the container from it. -For growable containers like `Vec` and `String`, any subset of the following method pairs should be added to work with their raw representations: +For multi-value containers like `Vec` and `String`, any subset of the following method pairs should be added to work with their raw representations: - `leak`: shrink the container to its allocated length, leak it and return an arbitrarily long-lived shared or mutable reference to its allocated content. -- `leak_raw_parts`: leak the container and return a `NonNull` pointer to its content along with any other state, like the allocated capacity, that would be needed to restore the container. The type `T` is the same as `Deref::Target`, so `NonNull::from(&*self)` is the same as `NonNull::from(self.leak_raw_parts().0)` and `NonNull::from(self.leak())`. +- `leak_raw_parts`: leak the container and return a `NonNull` pointer to its content along with any other state, like the allocated capacity, that would be needed to restore the container. The type `T` is the same as `Deref::Target`, so `NonNull::from(&*self)` is equivalent to `NonNull::from(self.leak())` and `NonNull::from(self.leak_raw_parts().0)`. - `unleak_raw_parts`: take a previously leaked `NonNull` pointer and additional state and restore the container from it. - `into_raw_parts`: leak the container and return a raw pointer to its content along with any other state that would be needed to restore the container. - `from_raw_parts`: take a previously leaked raw pointer and additional state and restore the container from it. @@ -113,8 +113,9 @@ impl LinkedList { The `leak_raw_parts` method is the equivalent of `leak_raw` for multi-value containers like `String` that return extra data beyond the pointer needed to reconstruct the container later. The `unleak_raw_parts` method is the equivalent of `unleak_raw`. 
-The `String::leak_raw_parts` method is a nice representative of the new API for multi-value containers because it produces a more semantic fat-pointer to the string's contents. +The `String::leak_raw_parts` method is a nice example of the new `leak_raw` API because it returns the most accurate pointer type possible to represent the raw string data. Instead of a `(*mut u8, usize)` pair for the pointer and length, it returns a `NonNull`, which encodes its length and retains the UTF8 invariant together. +The following example shows how `leak_raw_parts` makes it easier to work with the leaked string than `into_raw_parts`: ```diff let string = String::from("🗻∈🌏"); @@ -125,11 +126,10 @@ let string = String::from("🗻∈🌏"); + assert_eq!(Some("🗻"), unsafe { ptr.as_ref().get(0..4) }); - assert_eq!(Some("🗻"), unsafe { str::from_utf8_unchecked(slice::from_raw_parts(ptr, len)).get(0..4) }); -let string = String::unleak_raw_parts(ptr, cap); ++ let string = String::unleak_raw_parts(ptr, cap); +- let string = String::from_raw_parts(ptr, len, cap); ``` -Using the `into_raw`/`from_raw` API, the above example would require more machinery to re-assert the contents are valid UTF8 before returning a `&str`. - ## When do I use `into_raw`/`from_raw`? The `into_raw`/`from_raw` and `into_raw_parts`/`from_raw_parts` methods are good for FFI where a Rust type needs to be used by non-Rust code. @@ -181,8 +181,10 @@ The `into_raw_parts` method is the equivalent of `into_raw` for multi-value cont The `from_raw_parts` method is the equivalent of `from_raw`. An FFI over `Vec` is a nice example of when `into_raw_parts` can be helpful over `leak_raw_parts`. -`Vec::leak_raw_parts` returns a fat `NonNull<[u8]>` pointer, but `NonNull<[u8]>` (and consequently `*const [u8]`) is not considered FFI-safe. 
-Instead, we can use `Vec::into_raw_parts`, which only uses FFI-safe `*mut u8` and `usize` types: +An FFI should only be built from FFI-safe types that have a well-known representation, but the fat `NonNull<[u8]>` pointer returned by `leak_raw_parts` (and consequently `*const [u8]`) is not considered FFI-safe. +That's not a problem for `into_raw_parts` though because it only returns FFI-safe `*mut u8` and `usize` types. + +The following example shows how `into_raw_parts` makes it easier to work with FFI-safe values than `leak_raw_parts`: ```diff #[repr(C)] @@ -212,8 +214,6 @@ pub unsafe extern "C" fn vec_destroy(vec: RawVec) { } ``` -Using the `leak_raw_parts`/`unleak_raw_parts` API, the above example would require more machinery to convert the `NonNull<[u8]>` into FFI-safe types. - [ffi-guide]: https://michael-f-bryan.github.io/rust-ffi-guide/basic_request.html#creating-the-c-interface # Reference-level explanation @@ -284,9 +284,9 @@ impl String { } ``` -These conversion methods follow the existing semantics of static functions for containers that dereference to their inner value like `Box`, and inherent methods for other containers like `Vec`. +These conversion methods follow the existing semantics of static functions for containers that dereference to their inner value like `Box`, and inherent methods for others. -The docs for `into_raw`/`from_raw` methods will point users to `leak_raw`/`unleak_raw` unless they need FFI-safety. +The docs for the `into_raw`/`from_raw` methods will point users to `leak_raw`/`unleak_raw` unless they need FFI-safety. The `NonNull<[T]>` and `NonNull` methods are expected to eventually offer a way to get their length without needing to go through a reference first, but the exact mechanism is left as out-of-scope for this RFC. 
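As an illustration of reading the length without going through a reference, the sketch below assumes a toolchain where `NonNull<[T]>::len` and `NonNull::slice_from_raw_parts` are available. The `leak_raw_parts` and `unleak_raw_parts` free functions are hypothetical stand-ins for the methods proposed here, written against `ManuallyDrop` and `Vec::from_raw_parts`; they are not part of the standard library.

```rust
use std::mem::ManuallyDrop;
use std::ptr::NonNull;

// Hypothetical stand-in for the proposed `Vec::leak_raw_parts`.
fn leak_raw_parts(v: Vec<u8>) -> (NonNull<[u8]>, usize) {
    // Prevent the destructor from running so the allocation stays alive.
    let mut v = ManuallyDrop::new(v);
    let (len, cap) = (v.len(), v.capacity());
    // `Vec::as_mut_ptr` is never null, even for an empty vector.
    let data = NonNull::new(v.as_mut_ptr()).expect("Vec pointers are non-null");
    (NonNull::slice_from_raw_parts(data, len), cap)
}

// Hypothetical stand-in for the proposed `Vec::unleak_raw_parts`.
//
// Safety: `ptr` and `cap` must come from one `leak_raw_parts` call and not be reused.
unsafe fn unleak_raw_parts(ptr: NonNull<[u8]>, cap: usize) -> Vec<u8> {
    // The length comes from the fat pointer's metadata; no `&[u8]` is created.
    unsafe { Vec::from_raw_parts(ptr.cast::<u8>().as_ptr(), ptr.len(), cap) }
}

fn main() {
    let mut v = Vec::with_capacity(16);
    v.extend_from_slice(&[1u8, 2, 3]);

    let (ptr, cap) = leak_raw_parts(v);
    assert_eq!(ptr.len(), 3);

    let v = unsafe { unleak_raw_parts(ptr, cap) };
    assert_eq!(v, [1, 2, 3]);
    assert_eq!(v.capacity(), cap);
}
```

Keeping the capacity as a separate value mirrors the `(NonNull<[T]>, usize)` shape described for `leak_raw_parts`, since the fat pointer only carries the initialized length.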
@@ -297,6 +297,9 @@ A drawback of this approach is that it creates a standard that any future contai It creates more API surface area that needs to be rationalized with future idioms, just like this RFC is attempting to do for `into_raw`/`from_raw` with `NonNull`. As an example, if a future Rust stabilizes another even more appropriate pointer type then it would need to be fit into this scheme. +It introduces more APIs, so users have to choose the right one for their use case instead of just trying to make the only option available work for them. +With clear guidance in the documentation for these methods and similarities in their design, this shouldn't be an issue in practice. + # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives @@ -304,16 +307,18 @@ An alternative is to just start using `NonNull` going forward and accept the This isn't preferable to keeping new `into_raw`/`from_raw` pairs consistent with the ones that already exist because it forces users to learn the return values for all of these methods by rote instead of being able to rely on simple conventions. Another is to just use `leak` methods and the conversion from `&T` and `&mut T` into `NonNull` to work with. -This isn't preferable to method pairs that return a `NonNull` and look similar to `into_raw`/`from_raw` because they're less discoverable while still being preferable, and require more steps to leak and unleak than would otherwise be needed. +This isn't preferable to method pairs that return a `NonNull` and look similar to `into_raw`/`from_raw` because they're less discoverable while still being preferable for common use cases, and require more steps to leak and unleak than would otherwise be needed. Another is to deprecate `into_raw`/`from_raw` in favor of `leak_raw().as_ptr()` and `NonNull::new_unchecked(ptr)`. This makes it easier to discover the preferred API for working with raw container contents at the expense of more machinery in FFI use-cases. 
+This isn't preferable to guidance in docs on both sets of methods because it puts more burden on FFI code and deprecates APIs that are already perfectly suited to their needs. +This could possibly be worked around by making it easier to convert types like `NonNull<[T]>` into a `(*mut T, usize)` pair. # Prior art [prior-art]: #prior-art -The prior art is `Box`, which already has the `leak`, `into_raw` and `from_raw` methods. -It also has unstable `into_raw_non_null`, but is deprecated in favor of `NonNull::from(Box::leak(b))`. +The prior art is `Box`, which already has the `leak`, `into_raw` and `from_raw` methods. +It also has the unstable `into_raw_non_null`, which is deprecated in favor of `NonNull::from(Box::leak(b))`. This current workaround is the second alternative listed above, that isn't considered preferable to `Box::leak_raw(b)`. # Unresolved questions From 1e74b1062fd8793a1d26ddd0739c2735ab3b2448 Mon Sep 17 00:00:00 2001 From: Ashley Mannix Date: Sun, 23 Aug 2020 21:30:47 +1000 Subject: [PATCH 7/9] correct allocated length to initialized length --- text/0000-container-leak.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-container-leak.md b/text/0000-container-leak.md index abf5b59b3c4..79ffc4dd5b9 100644 --- a/text/0000-container-leak.md +++ b/text/0000-container-leak.md @@ -18,7 +18,7 @@ For containers with a single value like `Box`, `Arc`, and `Rc`, any sub For multi-value containers like `Vec` and `String`, any subset of the following method pairs should be added to work with their raw representations: -- `leak`: shrink the container to its allocated length, leak it and return an arbitrarily long-lived shared or mutable reference to its allocated content. +- `leak`: shrink the container to its initialized length, leak it and return an arbitrarily long-lived shared or mutable reference to its allocated content. 
- `leak_raw_parts`: leak the container and return a `NonNull` pointer to its content along with any other state, like the allocated capacity, that would be needed to restore the container. The type `T` is the same as `Deref::Target`, so `NonNull::from(&*self)` is equivalent to `NonNull::from(self.leak())` and `NonNull::from(self.leak_raw_parts().0)`. - `unleak_raw_parts`: take a previously leaked `NonNull` pointer and additional state and restore the container from it. - `into_raw_parts`: leak the container and return a raw pointer to its content along with any other state that would be needed to restore the container. From 0b0a23c3a63cb809e884c0b16cf9c660cb727b4e Mon Sep 17 00:00:00 2001 From: Ashley Mannix Date: Thu, 14 Jan 2021 13:36:17 +1000 Subject: [PATCH 8/9] weasel out of specifying shrinking on leak --- text/0000-container-leak.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-container-leak.md b/text/0000-container-leak.md index 79ffc4dd5b9..bfd8ae62eaf 100644 --- a/text/0000-container-leak.md +++ b/text/0000-container-leak.md @@ -18,7 +18,7 @@ For containers with a single value like `Box`, `Arc`, and `Rc`, any sub For multi-value containers like `Vec` and `String`, any subset of the following method pairs should be added to work with their raw representations: -- `leak`: shrink the container to its initialized length, leak it and return an arbitrarily long-lived shared or mutable reference to its allocated content. +- `leak`: leak the container and return an arbitrarily long-lived shared or mutable reference to its allocated content. The contents may or may not be shrunk, as an implementation detail of the container. - `leak_raw_parts`: leak the container and return a `NonNull` pointer to its content along with any other state, like the allocated capacity, that would be needed to restore the container. 
The type `T` is the same as `Deref::Target`, so `NonNull::from(&*self)` is equivalent to `NonNull::from(self.leak())` and `NonNull::from(self.leak_raw_parts().0)`. - `unleak_raw_parts`: take a previously leaked `NonNull` pointer and additional state and restore the container from it. - `into_raw_parts`: leak the container and return a raw pointer to its content along with any other state that would be needed to restore the container. From b15b8768bc51c3fe8e1e1e0affc10fbe391bc77e Mon Sep 17 00:00:00 2001 From: Ashley Mannix Date: Thu, 14 Jan 2021 13:47:35 +1000 Subject: [PATCH 9/9] note that we don't require these methods exist --- text/0000-container-leak.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-container-leak.md b/text/0000-container-leak.md index bfd8ae62eaf..36ca347aa8a 100644 --- a/text/0000-container-leak.md +++ b/text/0000-container-leak.md @@ -6,7 +6,7 @@ # Summary [summary]: #summary -Describe a standard set of methods for converting container types like `Box`, `Arc`, `Vec`, `String` to and from raw pointers. +Describe a standard set of methods for converting container types like `Box`, `Arc`, `Vec`, `String` to and from raw pointers. This RFC doesn't require that a container provide all of these methods, only that any it does provide follow the standard laid out here. For containers with a single value like `Box`, `Arc`, and `Rc`, any subset of the following method pairs should be added to work with their raw representations: