
Conversation


@sffc sffc marked this pull request as ready for review May 11, 2025 08:11
@sffc
Member Author

sffc commented May 11, 2025

Things I'd like feedback on:

  1. Names of things
  2. The two traits (safe and unsafe) and how they do/don't interact with each other
  3. The unusual safety requirement on the unsafe trait
  4. The new overload on the macro and whether it has any risk of being a breaking change

Comment on lines +38 to +39
/// To bake to a different type than this, use `custom_bake`
/// and implement `CustomBake`.
Member

I think having a trait for this is overkill if you can provide a method to the macro

Member Author

Reasons I made it a trait:

  1. Gives a place to enforce the strange safety requirement in the unsafe version
  2. And since we have it for unsafe, we can also use it for safe

How do you suggest handling the safety requirement without a trait?

///
/// #[derive(Bake)]
/// #[databake(path = bar::module)]
/// #[databake(path = custom_bake)]
Member

Suggested change:
- /// #[databake(path = custom_bake)]
+ /// #[databake(custom_bake = Self::bake_to_bytes)]

@Manishearth
Member

So, the promised "complicated thoughts":

By and large I disprefer adding new types and traits. On the other hand, adding attributes/toggles/etc to a derive is something I think is a good way to achieve goals. As such, the original proposed design seemed pretty good to me.

So my first reaction to this PR was "we should go back to the original proposal that we agreed on". Or use From.

So basically something where we can specify to/from functions or use a preexisting trait: #[databake(path = ..., custom_bake = Foo)] using From/Into, or custom_bake = (type = &[u8], to = Foo::to_bytes, from = Foo::from_bytes).

But that works for safe conversions. I agree that this is not as good for unsafe conversions. For an unsafe conversion the bare minimum is that the macro should mention unsafe somewhere, but ideally you have unsafe {} or unsafe impl somewhere.

But if you have to implement a custom trait, I once again go back to comparing it with the motivation of reducing boilerplate: isn't the whole idea to remove custom impls? I thought about it more, and concluded that replacing a TokenStream-universe custom impl with a value-universe custom impl is still valuable. (This type of question is why I am so insistent on fully understanding the motivation before talking too deeply about solutions.)

Putting all of this together, I end up with:

  • I think we probably should use a trait for the unsafe conversions
  • I'm not convinced we should use a trait for the safe conversions, macro magic seems better, even if it ends up with two somewhat different ways of doing things. I'm overall fine with unsafe stuff being different.

Looking at the existing trait I'm not really a fan of the nonlocality of the guarantees, referencing the existence of an inherent method. How about a single trait:

/// Safety: implementation is valid if from_baked is always sound when fed values produced by to_baked
unsafe trait CustomBakeConversions: Sized {
    type Baked<'a>: Bake
    where
        Self: 'a;
    /// Allowed to panic
    fn to_baked<'a>(&'a self) -> Self::Baked<'a>;
    /// Safety: must only be called on values produced by to_baked
    unsafe fn from_baked(baked: Self::Baked<'_>) -> Self;
}

invoked with databake(..., custom_bake(type = &[u8], unsafe))

And then for the "safe" bake we have an MVP, #[databake(..., custom_bake = &[u8])], where we assume the existence of safe to_baked/from_baked functions OR From/Into functions (dealer's choice). We can add customizability here when desired.
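Rough sketch of what that could look like at a use site (the attribute grammar and the to_baked/from_baked names are illustrative, nothing here exists in databake yet):

#[derive(Bake)]
#[databake(path = my_crate, custom_bake = u32)]
pub struct Widths {
    narrow: u16,
    wide: u16,
}

impl Widths {
    /// Safe conversion into the representation that actually gets baked.
    fn to_baked(&self) -> u32 {
        ((self.narrow as u32) << 16) | self.wide as u32
    }
    /// Safe constructor called from the generated code; const so it can
    /// appear in baked statics.
    const fn from_baked(baked: u32) -> Self {
        Self {
            narrow: (baked >> 16) as u16,
            wide: baked as u16,
        }
    }
}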

Thoughts?

@robertbastian
Member

can't use a trait for const construction, which is the actual unsafe part (to_baked is safe)

@sffc
Member Author

sffc commented May 21, 2025

But if you have to implement a custom trait, I once again go back to comparing it with the motivation of reducing boilerplate: isn't the whole idea to remove custom impls? I thought about it more, and concluded that replacing a TokenStream-universe custom impl with a value-universe custom impl is still valuable.

This has been my position and I appreciate your eloquence. ❤️

  • I think we probably should use a trait for the unsafe conversions
  • I'm not convinced we should use a trait for the safe conversions, macro magic seems better, even if it ends up with two somewhat different ways of doing things. I'm overall fine with unsafe stuff being different.

My position is that since we need a trait for the unsafe, then it's harmless to support it in safe mode. The unsafe trait can just be an extension on the safe trait, as proposed in this PR. At least, it can be the default behavior when custom_bake is used without any arguments in the derive.
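As an illustrative sketch (names and signatures here are just for illustration, not the exact trait in the diff), the layering would be roughly:

/// Safe custom bake: bake Self via some other type that implements Bake.
trait CustomBake {
    type Baked: Bake;
    fn to_custom_bake(&self) -> Self::Baked;
}

/// Safety: implementers assert that a (const) constructor exists that is
/// sound when fed values produced by to_custom_bake.
unsafe trait CustomBakeUnsafe: CustomBake {}

The safe trait is usable on its own; the unsafe one only adds the guarantee on top.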

Looking at the existing trait I'm not really a fan of the nonlocality of the guarantees, referencing the existence of an inherent method. How about a single trait:

I would prefer a single trait, but the constructor can't be on a trait, at least not until we have const traits.

@Manishearth
Member

can't use a trait for const construction, which is the actual unsafe part (to_baked is safe)

argh. okay, fine, the indirect const function is acceptable, but I still don't like it.

My position is that since we need a trait for the unsafe, then it's harmless to support it in safe mode. The unsafe trait can just be an extension on the safe trait, as proposed in this PR. At least, it can be the default behavior when custom_bake is used without any arguments in the derive.

I think this is a bad trait. It has a strange nonlocal guarantee¹ and it's mimicking existing Rust conversion traits. We need it for proper unsafe hygiene, which makes me marginally okay with having it: unfortunately the bad trait is the best we can do. We do not need it for the other things. I do not want to introduce a second bad trait that is only there for consistency.

In the long run, we can probably have the first trait be const From/const Into.

If the options on the table are two traits or not doing this at all, I prefer not doing this at all. I accept the motivation of this change, I do not accept that it overrides all other concerns, and I think having a largely extraneous trait is where I draw the line. I would prefer to solve this without new traits at all, but safety hygiene forces us to have at least one, which I begrudgingly accept. I don't want to stretch that to two traits.

Footnotes

  1. yes, in safe mode it's not a safety guarantee, but it's still a guarantee.

@sffc
Member Author

sffc commented May 21, 2025

I don't think I agree with From/Into being a long-term goal we want to work toward.

<reasoning>
I've tried, multiple times, to get my Rust trait frameworks to sit on top of From/Into, and I run into various types of issues:

  1. No great way to implement From<&T> for &U, and generally the traits get messy with borrowed things. They work much better with owned-to-owned conversions.
  2. Sometimes we don't want the serialized/baked repr to be the canonical representation in the target type, like &str or &[u8]. For example, if we want to bake Pattern, we might bake it as bytes, but the canonical bytes should perhaps be the UTF-8 unparsed pattern.

Traits are cheap, clean, and easy to understand. I've moved much more toward "favor use-case-specific traits" than "try and shoehorn some existing trait into a use case that doesn't exactly match".
</reasoning>

However, I still think that this is cleaner using some trait, rather than making the proc macro more complicated. As you know, I would rather us work toward getting rid of proc macros. There seemed to be consensus at RustWeek that proc macros are bad, because they pull in Syn, require running code at build time, are hard for tooling like rust-analyzer, etc. This is a theme that came up again and again. There was desire to eventually move toward macro_rules for derives, but the team acknowledged that there's still a long way before we get there. However, what we can do for now is avoid over-complicating our proc macro.

@sffc
Member Author

sffc commented May 21, 2025

If we wait for const traits to land, which I'm told by the lang team should be some time in the not-too-distant future, would you approve this with a trait-based solution?

You listed two reasons you don't prefer traits:

  1. Because they have an indirect reference to the constructor
  2. Because they look similar to From/Into

Const traits will fix (1). See my previous post for why I think we should not aim for (2).

@Manishearth
Member

As you know, I would rather us work toward getting rid of proc macros. There seemed to be consensus at RustWeek that proc macros are bad, because they pull in Syn, require running code at build time, are hard for tooling like rust-analyzer, etc. This is a theme that came up again and again. There was desire to eventually move toward macro_rules for derives, but the team acknowledged that there's still a long way before we get there. However, what we can do for now is avoid over-complicating our proc macro.

This is an extremely long term goal and I do not think we will get close to this any time soon (in the next five years or so). I've seen a lot of this desire, but the actual design work for this is extremely nascent¹, and macro work has historically taken ages to occur. We still do not have "Macros 2.0", a nine-year-old feature proposal that is still actively desired and occasionally worked on.

The flexibility of the macro system overall makes it very tricky to evolve: I do not begrudge the Rust team their time in working on this, but I also expect very little when it comes to large macro system improvements.

Given that, I disagree with "what we can do for now is avoid overcomplicating our proc macro". Something that is >5 years out in the future, potentially even 10 years out in the future, is not something I find it useful (or even possible?) to design towards. When that time comes near, we can perform a proper holistic redesign of databake. Until then I don't find it useful to prevent ourselves from certain design patterns because they will need to change at that level; we cannot truly predict what will and won't be complicated in that future. Furthermore I think designing proc macros with good UX now would be helpful in informing what use cases the lang team should consider when designing a declarative macro future.

I very explicitly did not try and design yoke for a potential future GAT world. I knew it was coming soon, I could have designed it differently with expectations of it fitting in better with GATs, I decided not to. It's good that I did: the way GATs ended up working was not how I had envisioned them as working wrt yoke, and trying to "prepare" for that might have actually made the crate worse. There's still some stuff that I'd like to experiment with there, but in this case there's no rush.

I feel the same way about our proc macros and a potential future with more powerful decl macros. For zerovec, I am interested in ways to supplement zerovec with currently-possible decl macros to improve the dependency situation. But databake is not a normal runtime dep so I'm not super interested in databake decl macros unless we can replace it completely with decl macros (with decent UX), and I don't think decl macros are currently at that point. Eventually when we have fancy decl macros that can do this type of thing well, I'd love to try and use them, and revisit decisions like these. 5+ years is a wonderful time to perform a new holistic design.


If we wait for const traits to land, which I'm told by the lang team should be some time in the not-too-distant future, would you approve this with a trait-based solution?

I'd still be hesitant. My preference in databake and zerovec is if we are already using a proc macro, then we should use proc macro attribute configs as much as possible before adding new items to the public API. We have more flexibility with those attributes, and can play around with it and arrive on better holistic designs much more easily. This is a reason I have not yet stated in this thread, but I have stated before when it comes to additions to zerovec.

I'll also note: the problem with a full bidirectional trait without the indirectness is that the crate now needs an unconditional runtime dependency on databake. This is compile-time infra; I'd love for it to stay compile-time infra, which to me means solving it in the proc macro world.

I recognize that I've both expressed a dislike for the indirectness and just now expressed why not having the indirectness is also bad; but this is why I disprefer traits here.

Given the runtime dependency problem I think my preferred path forward is to have an "indirect" unsafe trait for the unsafe construction and use proc macro attributes for the safe construction, even in the presence of const traits. I don't have a strong desire to use From/Into here, I just don't want to introduce new traits, but recognize a strong reason to do so for unsafe. In the long run I'd love to redesign this when macros are better.

Footnotes

  1. In terms of progress made. I remember writing down ideas for custom derives that didn't need an AST library before we had the concept of tokenstream-based custom derives. People have been thinking about this problem since before Rust 1.0.

@sffc sffc added the waiting-on-author label (PRs waiting for action from the author for >7 days) Aug 7, 2025
@sffc
Member Author

sffc commented Sep 8, 2025

For documentation, I still think I prefer a trait for the safe case (as well as the unsafe case) because

  1. Structure between the two cases is more similar
  2. Better uniformity across call sites; don't need to debate what to call the constructor
  3. Reduces complexity of the proc macro. I acknowledge your counterarguments but don't think I agree:
    • "We have more flexibility with those attributes" ==> why do we need flexibility? There should be only one way to do things. You can implement the trait however you like to implement it.
    • "and can play around with it and arrive on better holistic designs much more easily" ==> again, I don't see what holistic design you think we should have here. This is a general argument, but I don't think it applies to this specific case.

But, it's not a good use of time to debate that point further. I assume that we will move forward with a macro-selected constructor instead of a trait for the safe case.

"the crate now needs an unconditional runtime dependency on databake" => true, I hadn't considered this. But, you agree that a trait is needed for the unsafe case. Is this a fatal flaw / should I close the PR, or should I move forward despite it? Should we consider an alternative such as having the proc macro paste the unsafe trait definition locally inside the file?

@Manishearth
Member

Manishearth commented Sep 8, 2025

But, you agree that a trait is needed for the unsafe case. Is this a fatal flaw / should I close the PR, or should I move forward despite it?

I think that is a major-but-not-fatal flaw for the unsafe trait, and it makes me prefer just having unsafe users write a custom Bake impl.

As dissected before, there are two motivations here:

  • reducing TokenStream boilerplate
  • reducing the scope of unsafe

I think the boilerplate one is well accepted, and is solved wonderfully with the macro solution. In the unsafe trait case I am more hesitant because the trait guarantee is strangely nonlocal and users need to depend on databake.

Also, fwiw, while proc_macro2 solves most of the problems here, libproc_macro is weird when it comes to Rust build systems (specifically, it cannot be built as a dependency of a non-proc crate). This can cause problems in e.g. custom build systems and is an additional source of friction. Users of icu_provider_source + databake of course can expect that, but the set of ICU4X users is wider and I would not like to impose this upon them.

So I think you could either close this PR, land the attribute just for the safe case, or land a safe attribute and an unsafe trait. I prefer not doing the unsafe trait but I am ultimately okay with it. I definitely want the safe attribute if possible.

Better uniformity across call sites; don't need to debate what to call the constructor

minor note: not to further this discussion, but personally I don't want utils crate design to be driven by things like this; this is ICU4X API policy and we could easily solve this problem by deciding on a policy for ICU4X. Could even be documented in databake as a convention we recommend to all users.

@sffc
Member Author

sffc commented Sep 9, 2025

What did you think of this suggestion:

Should we consider an alternative such as having the proc macro paste the unsafe trait definition locally inside the file?

The trait would be private locally in each crate with bakeable structs and its only purpose would be to require the caller to write an explicit unsafe impl with the safety text.
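Roughly (all names here are hypothetical), the macro would paste something like this into the user's crate, and the author of the type would then have to spell out the guarantee themselves:

struct Thing {
    bits: u32,
    flag: bool,
}

// Emitted locally by the macro, private to the crate:
/// Safety: from_bake_parts must be sound for every value returned by
/// to_bake_parts.
unsafe trait ThingBakeFromParts: Sized {
    fn to_bake_parts(&self) -> (u32, bool);
    /// Safety: parts MUST have been returned by to_bake_parts.
    unsafe fn from_bake_parts(parts: (u32, bool)) -> Self;
}

// Written by the author of Thing:
// Safety: from_bake_parts reassembles exactly the fields that to_bake_parts
// produced, so the invariant holds by construction.
unsafe impl ThingBakeFromParts for Thing {
    fn to_bake_parts(&self) -> (u32, bool) {
        (self.bits, self.flag)
    }
    unsafe fn from_bake_parts(parts: (u32, bool)) -> Self {
        Self { bits: parts.0, flag: parts.1 }
    }
}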

@Manishearth
Member

Oh, sorry, forgot to address that.

That's an interesting proposition. The main problem I see is that there won't be source code for unsafe reviewers to look at for the invariant, whereas if it's just driven by #[unsafe_bake_from_parts = (from_parts, to_parts)] or whatever it's obvious that the attribute is unsafe and the proc macro docs will explain the invariants. I don't think this is a huge deal.

It's a bit unconventional, but it's sufficiently decoupled that it's probably fine? I still have a preference for implementing the safe version of this and seeing how much we actually need the unsafe version.

@sffc
Member Author

sffc commented Sep 9, 2025

I still have a preference for implementing the safe version of this and seeing how much we actually need the unsafe version.

(I'm not aware of any call sites in ICU4X that would be served by the safe version)

@Manishearth
Member

I'm looking around and while I see some safe custom Bake impls I presume you're saying they mostly won't be served by a parts-based construction. Unfortunate.

The unsafe ones are by and large concentrated in zerovec/zerotrie/etc. In general I am okay with the utils crates needing to do extra heavy lifting (in fact, I have a slight preference for zerovec to keep its Bake impls because they're clearer from an unsafe perspective, and zerovec does a ton of unsafe already); it's when it starts affecting components that it becomes an issue, because we want components to be easy to write. And within components I mostly see some units/currency and the three collections.

I don't see that as much, ultimately.

I think an additional thing that colors my opinion here is that while we want components to be easy to write, unsafe code is already hard to get right and I'm not overly concerned with friction there. We do not have the best reputation for avoiding silly unsafe code mistakes with code review (e.g. #6805); we sometimes catch things but not always. I'm generally pleased with the quality of our existing unsafe code; especially the utils code, we've put a lot of work into it, gotten it reviewed, and soundness/security issues tend to have a half-life for going undetected. An ICU4X contributor who needs unsafe bake needing to ask someone with unsafe/proc macro experience to write it for them is not necessarily a bad thing.

@sffc
Member Author

sffc commented Sep 9, 2025

The impls I've mostly been looking at are things like PackedPattern, PatternItem, etc. Things with safety invariants and that might have a VarULE inside.

@sffc
Member Author

sffc commented Sep 9, 2025

I think it is valuable and easy to review for the safety invariant to be "the input to this function MUST be the output of that function". It is easy to verify correctness.

Having to reason about this in the middle of a TokenStream is worse for everyone.

@Manishearth
Member

I think it is valuable and easy to review for the safety invariant to be "the input to this function MUST be the output of that function". It is easy to verify correctness.

Yes, but this invariant won't be listed anywhere in the code, if we generate a trait. Tracking it down is tricky.

If we don't generate a trait, then we have the runtime dep problem, and the invariant is still nonlocal in a strange way.

The impls I've mostly been looking at are things like PackedPattern, PatternItem, etc. Things with safety invariants and that might have a VarULE inside.

None of these types have unsafe bake impls right now, as far as I can tell.

@Manishearth
Member

Having to reason about this in the middle of a TokenStream is worse for everyone.

I generally agree, but with the tokenstream at least the invariants are all in the same file and relate to each other in straightforward ways. The unsafe Bake impls we have so far are quite straightforward.

@sffc
Member Author

sffc commented Sep 9, 2025

What if databake produces an unsafe impl such that it must be inside of an unsafe block? Does something like this work?

type BakeParts = ...;

impl Thing {
    #[doc(hidden)]
    pub fn to_bake_parts_v1(&self) -> BakeParts { ... }

    /// # Safety
    /// The parameter MUST have been returned by to_bake_parts_v1
    #[doc(hidden)]
    pub const unsafe fn from_bake_parts_v1(parts: BakeParts) -> Self { ... }
}

// Safety: from_bake_parts_v1 accepts the return value of to_bake_parts_v1
unsafe {
    databake::impl_from_parts_unsafe!(Thing, to_bake_parts_v1 => from_bake_parts_v1);
}

@Manishearth
Member

I don't understand what is and isn't generated there.

I'm in favor of a databake::unsafe_impl_from_parts!(Type, from => to) macro: the invariants are easy to specify for that.
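As a rough sketch, assuming databake re-exports Bake, CrateEnv, TokenStream, and quote at its root (as its documented hand-written impls use), the macro could be a plain macro_rules! along these lines (name, argument order, and signatures are illustrative, not a committed API):

#[macro_export]
macro_rules! unsafe_impl_from_parts {
    // Safety (asserted by the caller): $from_parts must be sound on any
    // value returned by $to_parts.
    ($ty:ty, $from_parts:ident => $to_parts:ident) => {
        impl $crate::Bake for $ty {
            fn bake(&self, env: &$crate::CrateEnv) -> $crate::TokenStream {
                // Bake the safe "parts" representation of the value...
                let parts = self.$to_parts();
                let baked = $crate::Bake::bake(&parts, env);
                // ...and emit a call to the unsafe const constructor in the
                // generated code. (A real version would also need to register
                // the type's crate in env.)
                $crate::quote! {
                    unsafe { <$ty>::$from_parts(#baked) }
                }
            }
        }
    };
}

// Usage, with the inherent methods sketched earlier:
// Safety: from_bake_parts_v1 accepts the return value of to_bake_parts_v1.
databake::unsafe_impl_from_parts!(Thing, from_bake_parts_v1 => to_bake_parts_v1);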

@sffc
Member Author

sffc commented Sep 10, 2025

My thought was that databake::unsafe_impl_from_parts! would generate an unsafe impl but with no safety commentary, such that hopefully clippy would complain and force a manual safety comment. My only worry about the unsafe hygiene is that we wouldn't otherwise be required to spell out the safety justification. Unless you think the word unsafe in databake::unsafe_impl_from_parts! is sufficient?

@Manishearth
Member

Unless you think the word unsafe in databake::unsafe_impl_from_parts! is sufficient?

Yes, that's standard macro naming practice. I wish there were a better way to do unsafe macros, but it's ... fine.

@Manishearth
Member

Do not rely on clippy lints triggering within macros; they are often explicitly disabled within macros depending on the lint.
