Bootstrap TCC using pnut #524

Draft: wants to merge 6 commits into master

@laurenthuberdeau laurenthuberdeau commented Mar 31, 2025

Context

This draft PR shows that pnut can provide an alternative path to TCC.

This is still a work in progress: it currently needs a prebuilt pnut-exe binary, since the live-bootstrap environment doesn't provide a POSIX-compliant shell at the point where TCC is bootstrapped. However, the binary can be built reproducibly from a POSIX shell with a script in the pnut repository, per the instructions below.

Alternatively, the C subset used by pnut is relatively simple (only 1 struct that could be easily removed, very few dynamic memory allocations, no sizeof, everything is a signed integer or pointer) and could probably be ported to M2-Planet.

Prebuilt binary

To make the prebuilt pnut-exe binary:

> git clone git@github.com:udem-dlteam/pnut.git
> cd pnut
> git checkout laurent/live-bootstrap-snapshot
> ./utils/make-pnut-exe-for-tcc.sh --shell <shell> # To make pnut-exe with pnut-sh.sh
> ./utils/make-pnut-exe-for-tcc.sh # To make pnut-exe with pnut-exe-from-gcc

Getting the sources

Running ./download-distfiles.sh will download the .tar.gz file, but GitHub will return a 404 for the .tar; that's expected. To get the .tar, run gunzip distfiles/79832069f0d44c20a620a923a15e38a545c5e911.tar.gz.

checksum-transcriber sources
sha256sum -c sources.SHA256SUM
# checksum-transcriber sources
# sha256sum -c sources.SHA256SUM

Author

It didn't like the format of the updated sources.SHA256SUM.

simple-patch ${TCC_PKG}/libtcc.c \
../simple-patches/sscanf_TCC_VERSION.before ../simple-patches/sscanf_TCC_VERSION.after
simple-patch ${TCC_PKG}/tcc.h \
../simple-patches/undefine_TCC_IS_NATIVE.before ../simple-patches/undefine_TCC_IS_NATIVE.after

Author

Pnut needed a few extra patches to get Janneke's tcc-0.9.26 fork to compile. These patches are reverted later to make sure the source is the exact same when compiling the last versions of TCC.

chmod 755 ${BINDIR}/pnut-exe

pnut-exe ${TCC_PKG}/tcc.c \
-I ${PNUT_PKG}/portable_libc/include/ \

Author

Pnut comes with its own small libc that's just enough to get the parts of TCC we care about working. However, for tcc-boot0, we switch to Mes' libc as it's much more complete.

#ifdef PNUT_CC
char buf1[1024];
#else
char buf1[sizeof file->filename];

Author

Pnut's parser doesn't treat sizeof expressions as constants, so the array size is hard-coded when compiling with pnut.
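
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of this workaround as an ordinary C compiler would see it. The names `file_sketch`, `FILENAME_SIZE_SKETCH`, and `copy_filename_length` are hypothetical and do not come from TCC; only the two `buf1` declarations mirror the patch.

```c
#include <assert.h>
#include <string.h>

#define FILENAME_SIZE_SKETCH 1024   /* hypothetical; same value as the literal in the patch */

/* Hypothetical stand-in for the structure that owns `filename`. */
struct file_sketch {
    char filename[FILENAME_SIZE_SKETCH];
};

/* Copies the file name into a local buffer and returns its length. */
size_t copy_filename_length(const struct file_sketch *file)
{
#ifdef PNUT_CC
    char buf1[1024];                   /* literal size: nothing to evaluate */
#else
    char buf1[sizeof file->filename];  /* size computed by the compiler */
#endif
    assert(file != NULL);
    strncpy(buf1, file->filename, FILENAME_SIZE_SKETCH - 1);
    buf1[FILENAME_SIZE_SKETCH - 1] = '\0';
    return strlen(buf1);
}
```

Both branches declare a buffer of the same size as long as the literal matches the size of the field, which is the assumption the workaround relies on.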

#ifdef PNUT_CC
a = 0; b = 9; c = 26;
#else
sscanf(TCC_VERSION, "%d.%d.%d", &a, &b, &c);

Author

Pnut's libc doesn't implement sscanf, so we manually initialize the a, b, and c variables with the version.
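
The patch hard-codes the three values. As an illustration only, and not what this PR does, a libc without sscanf could also get by with a small hand-written parser along these lines. The helper names `read_number` and `parse_version_sketch` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Reads one non-negative decimal number at *s and advances *s past it. */
static int read_number(const char **s)
{
    int n = 0;
    while (**s >= '0' && **s <= '9') {
        n = n * 10 + (**s - '0');
        (*s)++;
    }
    return n;
}

/* Returns 1 if the character at p is a decimal digit, 0 otherwise. */
static int at_digit(const char *p)
{
    return *p >= '0' && *p <= '9';
}

/*
 * Hypothetical helper: splits "A.B.C" into three ints.
 * Returns 1 on success and 0 if the string does not have that shape.
 */
int parse_version_sketch(const char *version, int *a, int *b, int *c)
{
    const char *p = version;
    assert(version != NULL);

    if (!at_digit(p)) return 0;
    *a = read_number(&p);
    if (*p != '.') return 0;
    p++;

    if (!at_digit(p)) return 0;
    *b = read_number(&p);
    if (*p != '.') return 0;
    p++;

    if (!at_digit(p)) return 0;
    *c = read_number(&p);
    return 1;
}
```

Called on the string "0.9.26", this yields the same a = 0, b = 9, c = 26 that the patch assigns directly.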

@@ -0,0 +1 @@
#if defined _WIN32 == defined TCC_TARGET_PE && !defined PNUT_CC

Author

This masks off the #define TCC_IS_NATIVE which in turn means tccrun.c is not included.
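
To make the grouping of that condition explicit, the sketch below models it as a plain function, with 0/1 flags standing in for each `defined X` test. The function name `native_block_enabled` is hypothetical; it only mirrors the operator precedence of the `#if` line.

```c
#include <assert.h>

/*
 * Models the patched #if condition with 0/1 flags in place of `defined X`.
 * `==` binds tighter than `&&`, so the grouping is
 * (win32 == target_pe) && !pnut_cc.
 */
int native_block_enabled(int win32, int target_pe, int pnut_cc)
{
    assert(win32 == 0 || win32 == 1);
    assert(target_pe == 0 || target_pe == 1);
    assert(pnut_cc == 0 || pnut_cc == 1);
    return win32 == target_pe && !pnut_cc;
}
```

Whenever the third flag is 1, the result is 0 regardless of the other two, which is why defining PNUT_CC is enough to skip the block guarded by this condition.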

@@ -1 +1,3 @@
https://lilypond.org/janneke/tcc/tcc-0.9.26-1147-gee75a10c.tar.gz 6b8cbd0a5fed0636d4f0f763a603247bc1935e206e1cc5bda6a2818bab6e819f tcc-0.9.26.tar.gz
git://github.com/udem-dlteam/pnut.git~79832069f0d44c20a620a923a15e38a545c5e911 https://github.com/udem-dlteam/pnut/archive/79832069f0d44c20a620a923a15e38a545c5e911.tar.gz da1203478efcd9ef3fbedf3d4b2a891c25e3d9928bb0ce0d0a2d60109b209ff2
git://github.com/udem-dlteam/pnut.git~79832069f0d44c20a620a923a15e38a545c5e911 _ 4556a24084f77fc76a5eb4e7e60367bfed1907bb2dc8f4208287eb6c50681435 79832069f0d44c20a620a923a15e38a545c5e911.tar

Author

I had to add both the .tar.gz, so that download-distfiles.sh pulls the file, and the .tar, so that rootfs.py copies the file into target/external/distfiles/. Only the .tar is used in the script.

@laurenthuberdeau laurenthuberdeau force-pushed the laurent/bootstrap-tcc-with-pnut branch from 8cccdb4 to ea6d1c4 Compare March 31, 2025 03:54
simple-patch ${TCC_PKG}/libtcc.c \
../simple-patches/error_set_jmp_enabled.after ../simple-patches/error_set_jmp_enabled.before
simple-patch ${TCC_PKG}/tccpp.c \
../simple-patches/array_sizeof.after ../simple-patches/array_sizeof.before

Author

Perhaps naively, I thought having #ifdef in the patches to disable the changes when compiling with tcc-pnut would be enough to get the same hash. Turns out that the debug information contains line information, so anything that adds the equivalent of new lines can result in a different hash. This means we need to undo the patches.
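
The effect can be reproduced in isolation. In this sketch, `#line` resets the line counter so that two functions start from the same position, and `__LINE__` stands in for the line numbers recorded in debug information. The function names are hypothetical.

```c
#include <assert.h>

/* Restart line numbering so both functions start from the same position. */
#line 100
int unpatched_line(void) { return __LINE__; }

#line 100
#ifdef PNUT_CC
/* a pnut-only workaround would go here */
#else
#endif
int patched_line(void) { return __LINE__; }

/* Number of lines by which the guarded patch shifts the code after it. */
int line_shift(void)
{
    int shift = patched_line() - unpatched_line();
    assert(shift >= 0);
    return shift;
}
```

Here `line_shift()` returns 4 whether or not PNUT_CC is defined, because the directive lines and the skipped body still occupy lines in the file. Any line-based debug information for the code that follows therefore differs from the unpatched source.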

@fosslinux
Owner

I'm happy to see that you were able to get pnut working in this way!

From my perspective, while pnut-exe is a binary, this cannot be merged. Unfortunately, adding in pnut-exe as a seed is a big negative.

Out of the two options you have given for replacing pnut-exe, having it buildable by M2-Planet is much preferable.

If my understanding is correct, pnut can be used without generating a POSIX shell script at all? That seems to be what is happening here.

It might be interesting to have M2-Planet -> pnut-exe -> tcc....

Pnut only needs the Mes C library so we can remove
the other steps in mes-0.27/pass1.kaem to speed up
development. Instead of taking a few minutes,
getting to tcc-0.9.26 now takes ~30 seconds.
@laurenthuberdeau
Author

laurenthuberdeau commented Mar 31, 2025

> From my perspective, while pnut-exe is a binary, this cannot be merged. Unfortunately, adding in pnut-exe as a seed is a big negative.

I agree! This draft PR is meant to demonstrate that pnut can reproduce the tcc-0.9.26 binary and to serve as a starting point for a potential M2-Planet -> pnut-exe -> tcc path.

> If my understanding is correct, pnut can be used without generating a POSIX shell script at all? That seems to be what is happening here.

Exactly. The prebuilt pnut-exe comes from pnut-exe's code generator, and pnut-exe can be compiled with pnut-sh.sh or with an existing C compiler.

I'll look into porting pnut's source code to M2-Planet over the next few weeks.

@stikonas
Collaborator

stikonas commented Mar 31, 2025

I wonder if M2-Planet has any chance of building pnut. If so, that might remove the need for a prebuilt pnut-exe. Oh yes, I can see you had this question too...

@cosinusoidally

cosinusoidally commented Apr 15, 2025

> I wonder if M2-Planet has any chance of building pnut. If so, that might remove the need for a prebuilt pnut-exe. Oh yes, I can see you had this question too...

I've managed to create a version of pnut-exe that can be built by cc_x86 (or M2-Planet). I did this by porting the x86 version of pnut to a subset of C that is also valid JavaScript.

This script will build pnut-exe and then build pnut-exe using cc_x86 and then use pnut-exe to build the live-bootstrap bootstrappable version of tcc (with @laurenthuberdeau's patches applied):

https://github.com/cosinusoidally/tcc_simple/blob/master/experiments/mk_pnut_cc_x86

This also needs a checkout of https://github.com/cosinusoidally/tcc_bootstrap_alt/ (as it uses the copy of cc_x86 from that repo).

I also have a couple of other alternative build modes:

https://github.com/cosinusoidally/tcc_simple/blob/master/experiments/mk_pnut_mujs builds pnut-exe using the mujs JavaScript VM. Since I ported pnut to JS (https://github.com/cosinusoidally/tcc_simple/blob/master/experiments/pnut_refactor/pnut.js), it can also be run inside a JS VM. For now it only works in this custom mujs build, but I am planning to get it building with Node.js and SpiderMonkey.

https://github.com/cosinusoidally/tcc_simple/blob/master/experiments/mk_test_m2_pnut builds with M2-Planet. This requires M2-Planet to be in your PATH. The version of stage0-posix I used is from https://github.com/cosinusoidally/mishmashvm/tree/master/tcc_js_bootstrap/alt_bootstrap/stage0-posix (this is an older fork I have been using for a while).

https://github.com/cosinusoidally/tcc_simple/blob/master/experiments/mk_test_cc_x86_pnut checks that cc_x86 and M2-Planet produce identical M1 files when compiling pnut-exe. This works because I'm using an older version of M2-Planet that produces output identical to cc_x86.

https://github.com/cosinusoidally/tcc_simple/blob/master/experiments/mk_test_pnut makes sure pnut can be built by both gcc and a stock version of tcc-0.9.27.

https://github.com/cosinusoidally/tcc_simple/blob/master/experiments/mk_tcc-boot-mes takes an already built copy of pnut-exe from artfacts/ and uses it to build tcc-boot-mes. This is generally called from other scripts.

Note the changes I made to pnut are probably more extensive than necessary. I suspect M2-Planet/Mesoplanet could build a nearly stock upstream pnut-exe as pnut is already written in a fairly conservative dialect of C. I also didn't port the sh backend, but I might try and re-add it at some point.

https://github.com/cosinusoidally/tcc_simple/blob/master/experiments/mk_coverage_pnut is to test code coverage. This was useful during porting as it allowed me to cut out code that I didn't need, plus it allowed me to check that I wasn't accidentally breaking code that wasn't touched. I have around 90% code coverage (which I will improve eventually).

@cosinusoidally

I thought I'd mention this here. I've wired up my pnut_js fork to live-bootstrap. My changes to live-bootstrap are definitely not in a mergeable state, but I thought they may be of interest:

cosinusoidally#6 is the internal PR in my fork.

See also:
https://github.com/cosinusoidally/live-bootstrap/blob/pnut_js/steps/pnut_js-1.0/pass1.kaem
and
https://github.com/cosinusoidally/live-bootstrap/blob/pnut_js/steps/pnut_js-1.0/pnut_refactor/build_alt.kaem

This avoids the need for a prebuilt pnut-exe binary.

CI also passes (I think it shaves off around 18 mins):

https://github.com/cosinusoidally/live-bootstrap/actions/runs/14814217910

If I were to tidy this up the steps would probably be something like:

  • cut a real release of pnut_js
  • add a live-bootstrap option "USE_PNUT_JS" to conditionally skip the build of mescc and instead use pnut_js
  • add this path to CI in addition to the standard mescc path
  • update pnut_js to avoid the use of mmap as builder-hex0 does not support mmap

@laurenthuberdeau
Author

> I thought I'd mention this here. I've wired up my pnut_js fork to live-bootstrap. My changes to live-bootstrap are definitely not in a mergeable state, but I thought they may be of interest:

Nice work! Is the plan to use the many JavaScript runtimes for DDC, like pnut does with shells? Or is muJS compatible with M2-Planet?

> Note the changes I made to pnut are probably more extensive than necessary. I suspect M2-Planet/Mesoplanet could build a nearly stock upstream pnut-exe as pnut is already written in a fairly conservative dialect of C. I also didn't port the sh backend, but I might try and re-add it at some point.

I recently removed the use of structs from pnut's code, not sure how much this helps with M2-Planet's support. I'm also considering removing all malloc from the code to further reduce the level of C language support required.

> update pnut_js to avoid the use of mmap as builder-hex0 does not support mmap

I'm not sure I understand this part. pnut-exe doesn't use mmap but implements it with a syscall, which should be independent from builder-hex0? Or is builder-hex0 a minimal operating system without the mmap syscall?

Before pnut-exe started depending on mmap, all globals were allocated on the stack. That worked until we needed larger statically allocated arrays, in particular the code buffer for TCC. Fortunately, I just made the code generator one-pass, so the code buffer only needs to be a few kilobytes instead of megabytes, and the globals should fit on an 8MB stack. The same could be done for the malloc buffer, since only path strings are allocated dynamically.
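
As a rough illustration of why a one-pass generator can get by with a small buffer, here is a minimal sketch of a fixed-size code buffer that is flushed to the output whenever it fills up. This is not pnut's actual code: the names, the 4096-byte size, and the use of stdio are all assumptions, and the sketch ignores details such as patching forward jumps.

```c
#include <assert.h>
#include <stdio.h>

#define CODE_BUF_SIZE 4096   /* hypothetical size, "a few kilobytes" */

static unsigned char code_buf[CODE_BUF_SIZE];  /* fixed size: no malloc, no mmap */
static size_t code_used = 0;                   /* bytes currently buffered */
static size_t code_total = 0;                  /* bytes emitted so far */

/* Writes the buffered bytes to `out` and empties the buffer. Returns 0 on success. */
int flush_code(FILE *out)
{
    assert(out != NULL);
    if (code_used > 0 && fwrite(code_buf, 1, code_used, out) != code_used)
        return -1;
    code_used = 0;
    return 0;
}

/* Appends one byte of generated code, flushing first if the buffer is full. */
int emit_byte(FILE *out, unsigned char byte)
{
    if (code_used == CODE_BUF_SIZE && flush_code(out) != 0)
        return -1;
    code_buf[code_used++] = byte;
    code_total++;
    return 0;
}

/* Total number of bytes passed to emit_byte so far. */
size_t emitted_total(void)
{
    return code_total;
}
```

With this shape, the size of the output no longer bounds the size of the buffer: emitting megabytes of code only ever needs CODE_BUF_SIZE bytes of storage at a time.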

@stikonas
Collaborator

M2-Planet supports structs. And it actually got lots of new features: see https://github.com/oriansj/stage0-posix/blob/master/CHANGELOG.org for recent changes.
