Skip to content

Conversation

@kkent030315
Copy link
Contributor

@kkent030315 kkent030315 commented Nov 3, 2024

Added CLR parser implementations as proposed in the issue.

  • Added .NET signature and COR20 header, storage header and metadata header parser.
    • Added MVID (GUID) and PK token (Public Key Token) parser.
  • Added CLR sections parser.
  • Added CLR tables parser.
    • Requires full re-implementation of IMetaDataTables where originally written in hard C++
  • Added CLR imports parser.
    • Requires full re-implementation of IMetaDataImport where originally written in hard C++

@kkent030315 kkent030315 marked this pull request as draft November 3, 2024 11:41
@kkent030315 kkent030315 mentioned this pull request Nov 3, 2024
17 tasks
@kkent030315 kkent030315 marked this pull request as ready for review November 5, 2024 09:14
@kkent030315
Copy link
Contributor Author

Turns out that CLR imports and tables parser requires very complex implemenetations and I think it should not be implemented in goblin, it's up to consuemrs handle it in upper implementations, it at least should be enough to expose fundamental datas for that. So I am marking this as ready-for-review.

kkent030315 added a commit to kkent030315/goblin-experimental that referenced this pull request Mar 26, 2025
@m4b
Copy link
Owner

m4b commented Jun 16, 2025

I don't have the time right now to review this, but I assume this is something we want in? if yes, needs rebasing, and i'll try to get to it next week

@kkent030315
Copy link
Contributor Author

@m4b Yes, please. Thank you so much for your time :)

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really great, and looks like a lot of work, thank you (as usual!); we should remove the repr(c) and then this is ready to go. in general, unless that struct is literally being passed to or from C functions, or something else that is speaking C-ish (e.g., python), you should avoid repr(c) since it forces a particular layout on the struct that rust does/does not have to have, which can sometimes even have performance implications (though is is most likely not relevant at all here)

@kkent030315
Copy link
Contributor Author

@m4b Done :)

@m4b
Copy link
Owner

m4b commented Jul 27, 2025

ok this has some minor conflicts, once rebased we can merge!

@kkent030315 you've done so much incredible work on the PE end I'd like to add you as a crate author, and also ask if you'd like to be the default PE backend maintainer? We/I don't really have a formal structure in this repo/crate, so what that generally means I guess is that I'd default to your expertise on the PE format in structure and how it's used "in the wild", and otherwise I'd just do basic code review (as I do today), primarily style, sometimes perf, and these days mostly whether something is breaking/some new public interface we may want/not want to support. After a while I might not even do that, and we could talk about merge rights, etc.

If you have any other questions, feel free to email me directly as well, and we can correspond in that manner if you like.

Anyway, hope you're having a good weekend, and again, thanks for all your hard work!

@kkent030315
Copy link
Contributor Author

@m4b Hey, thanks for the proposal about being a maintainer of this great crate. It's better to discuss that matter further on an email and I am definitely interested in. Thank you for spending your time on reviewing my code (uh sometimes I do some idiot things). Much appreciated!

(Also done a rebase)

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the two byte preads, and unless good reason, I suggest to parse out the string. After that we can merge, thanks for your patience!

src/pe/clr.rs Outdated
pub size: u32,
/// Indicates the name of the stream as a null-terminated ANSI string.
/// This field can hold up to 32-bytes long, including the null-terminator.
pub name: &'a [u8],
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in other places in goblin we use a &'a str here; scroll can pass a terminator byte to extract out a string, or in your case you can use DelimiterUntil(0, 32). There might be good reasons you don't want to parse out the string (but since it's null-terminated i don't think there isn't a reason to, e.g., if it was 32-bytes long and that sequence was used as an identifier, which might not be valid utf-8 for example), but if it really is ANSI null terminated, then since every ansi string is a utf-8 string (i'm pretty sure :P), I think it's better to make this a proper &str.

Copy link
Contributor Author

@kkent030315 kkent030315 Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think DelimiterUntil(0, 32) is suitable for this case due to this check1. It will fail if the slice length is less than 32 bytes long even if it is valid. I don't know if this is intentional in scroll, I'd rater expect DelimiterUntil(0, 32) to bahave like min(first_nul_pos, min(slice_len, 32)) instead of being the slice length (32) required.

---- pe::clr::tests::parse_clr_sections stdout ----
[6c, 00, 00, 00, f0, 15, 0d, 00, 23, 7e, 00, 00, 5c, 16, 0d, 00, 34, 23, 04, 00, 23, 53, 74, 72, 69, 6e, 67, 73, 00, 00, 00, 00, 90, 39, 11, 00, 38, b9, 04, 00, 23, 55, 53, 00, c8, f2, 15, 00, 10, 00, 00, 00, 23, 47, 55, 49, 44, 00, 00, 00, d8, f2, 15, 00, fc, 12, 02, 00, 23, 42, 6c, 6f, 62, 00, 00, 00]
#~: 11 (12)
[5c, 16, 0d, 00, 34, 23, 04, 00, 23, 53, 74, 72, 69, 6e, 67, 73, 00, 00, 00, 00, 90, 39, 11, 00, 38, b9, 04, 00, 23, 55, 53, 00, c8, f2, 15, 00, 10, 00, 00, 00, 23, 47, 55, 49, 44, 00, 00, 00, d8, f2, 15, 00, fc, 12, 02, 00, 23, 42, 6c, 6f, 62, 00, 00, 00]
#Strings: 17 (20)
[90, 39, 11, 00, 38, b9, 04, 00, 23, 55, 53, 00, c8, f2, 15, 00, 10, 00, 00, 00, 23, 47, 55, 49, 44, 00, 00, 00, d8, f2, 15, 00, fc, 12, 02, 00, 23, 42, 6c, 6f, 62, 00, 00, 00]
#US: 12 (12)
[c8, f2, 15, 00, 10, 00, 00, 00, 23, 47, 55, 49, 44, 00, 00, 00, d8, f2, 15, 00, fc, 12, 02, 00, 23, 42, 6c, 6f, 62, 00, 00, 00]

It is guaranteed that the string is terminated by null, so it should suffice just do bytes.gread::<&str>(offset).

Footnotes

  1. https://github.com/luser/scroll/blob/cec367e1d0300e3e4ca20cf57485a0e430d32ac2/src/ctx.rs#L356-L359

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for patience, lgtm; i'll try to fix the unnecessary collect in a hotpatch

pub fn mvid(&self) -> error::Result<Option<&'a [u8]>> {
Ok(self
.sections()
.collect::<Result<Vec<_>, _>>()?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collect shouldn't be necessary here

@m4b m4b merged commit 240c714 into m4b:master Aug 15, 2025
5 checks passed
@m4b
Copy link
Owner

m4b commented Aug 15, 2025

NB: backwards compatible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants