This project is a reproduction to help diagnose a failed Docusaurus build of a large site (20k+ docs).
The issue was reported as:
The builds were attempted on a macOS machine with 32 GB of RAM.
The problem is very high memory consumption: the machine starts to swap, and the build eventually hangs or crashes.
The test is a bit extreme, trying to build a large site (the LLVM reference pages).
To run the build, the only required steps are the usual
npm install
npm run build
in the website folder.
The steps below are for completeness, and document how to generate the Doxygen documentation and convert it to Docusaurus MD.
The issue seems related to the very large number of files, some of them very large.
After changing the converter to generate MD files instead of MDX, and adding a configuration option to disable the program listing, it was possible to build the site locally. However, the memory usage was not reasonable: it peaked at more than 74 GB, going deep into swap, which slowed the build considerably.
The original LLVM reference website is:
The LLVM documentation is in the main LLVM repo. It can be downloaded either by cloning the Git repository, or by running the provided script:
mkdir web-llvm.git
cd web-llvm.git
curl -L https://raw.githubusercontent.com/llvm/llvm-project/refs/heads/main/llvm/utils/release/build-docs.sh -o build-docs.sh
bash build-docs.sh -release 20.1.6 -no-sphinx -no-doxygen
This script downloads the archive with the requested LLVM source code (2GB+, 160K+ files).
For this test, the Doxygen configuration needs some small adjustments, especially CASE_SENSE_NAMES=SYSTEM if the build runs on macOS, and EXTRACT_ANON_NSPACES=YES, to avoid some issues with anonymous namespaces.
cd web-llvm.git
find llvm-project -name doxygen.cfg.in \
  -print \
  -exec sed -i.bak \
    -e 's|GENERATE_XML = NO|GENERATE_XML = YES|' \
    -e 's|CLASS_DIAGRAMS = YES|CLASS_DIAGRAMS = NO|' \
    -e 's|CASE_SENSE_NAMES = YES|CASE_SENSE_NAMES = SYSTEM|' \
    -e 's|HAVE_DOT = YES|HAVE_DOT = NO|' \
    -e 's|EXTRACT_ANON_NSPACES = NO|EXTRACT_ANON_NSPACES = YES|' \
    -e 's|LOOKUP_CACHE_SIZE = 4|LOOKUP_CACHE_SIZE = 5|' \
    '{}' ';'
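The sed expressions above match one exact spelling of each option, including the spacing around the equals sign and the previous value. As a rough alternative, the following sketch applies the same overrides to the text of a configuration file while tolerating any whitespace around the equals sign. Note that it is not equivalent to the sed command: it sets each listed option unconditionally, whatever its previous value. The names `overrides` and `patchDoxygenConfig` are hypothetical and not part of any tool used here.

```typescript
// Option overrides, mirroring the sed expressions (values are set unconditionally).
const overrides: Record<string, string> = {
  GENERATE_XML: "YES",
  CLASS_DIAGRAMS: "NO",
  CASE_SENSE_NAMES: "SYSTEM",
  HAVE_DOT: "NO",
  EXTRACT_ANON_NSPACES: "YES",
  LOOKUP_CACHE_SIZE: "5",
};

// Rewrite `NAME = value` lines whose NAME is listed in `overrides`.
// Comment lines (starting with '#') and unrelated options are left untouched.
function patchDoxygenConfig(text: string): string {
  return text
    .split("\n")
    .map((line) => {
      const match = /^([A-Z_]+)(\s*=\s*)(.*)$/.exec(line);
      if (match && match[1] in overrides) {
        // Keep the original spacing around '=', replace only the value.
        return `${match[1]}${match[2]}${overrides[match[1]]}`;
      }
      return line;
    })
    .join("\n");
}
```

To use it, read each doxygen.cfg.in file, pass its content through `patchDoxygenConfig()`, and write the result back, keeping a backup copy as the sed command does.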
The actual Doxygen build is performed by the same script. The prerequisites are: cmake, ninja, doxygen.
cd web-llvm.git
bash -x build-docs.sh -srcdir llvm-project/llvm -no-sphinx
The build runs multiple steps in several folders and generates a large docs-build folder (7G+, 135K+ files).
Note: the script has a small bug: it tries to run the Sphinx step even when instructed not to do so.
The html and xml output folders are in docs-build/docs/doxygen.
The original documentation can be viewed directly with a browser, by opening the docs-build/docs/doxygen/html/index.html file.
The xml files are used to generate the Docusaurus MD files.
The Docusaurus configuration was created with create-docusaurus 3.8.1:
npx create-docusaurus@3.8.1 website classic --typescript
The MD files were created with doxygen2docusaurus, a CLI tool to generate MD docs from Doxygen XML files.
To install it, run:
(cd website; npm install @xpack/doxygen2docusaurus --save-dev)
Add the new command to the website/package.json npm scripts:
"scripts": {
  "convert-doxygen": "node --max-old-space-size=8192 --stack-size=2048 ./node_modules/.bin/doxygen2docusaurus"
}
Please note that the conversion keeps most of the objects in memory, and for this large site the heap and stack sizes must be increased, otherwise node will run out of memory.
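To confirm that the increased heap limit is actually in effect, the limit can be queried from inside a Node process. The following is a minimal sketch using the standard `node:v8` module; the helper name `heapLimitMiB` is hypothetical. When started with `--max-old-space-size=8192`, the reported value should be in the vicinity of 8192 MiB, although the exact figure depends on the Node version.

```typescript
import * as v8 from "node:v8";

// Return the V8 heap size limit of the current process, in MiB.
function heapLimitMiB(): number {
  const limitBytes = v8.getHeapStatistics().heap_size_limit;
  return Math.round(limitBytes / (1024 * 1024));
}

console.log(`V8 heap limit: ${heapLimitMiB()} MiB`);
```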
To run the conversion:
(cd website; npm run convert-doxygen)
On my Mac this step takes about 7 minutes; it reports some warnings and errors, but they are not relevant for this test.
The generated MD files are in website/docs/api and the JSON files with the custom sidebar and menu are in website.
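Since the build problems seem related to the number and size of the input files, a quick inventory of the generated files can be useful when comparing runs. The following is a minimal sketch, not part of doxygen2docusaurus; the names `collectFiles` and `summarizeDocs` are hypothetical, and the `website/docs` path assumes the script runs from the project root.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

interface DocFile {
  file: string;
  bytes: number;
}

// Recursively collect all regular files below `root`, with their sizes.
function collectFiles(root: string, result: DocFile[] = []): DocFile[] {
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      collectFiles(full, result);
    } else if (entry.isFile()) {
      result.push({ file: full, bytes: fs.statSync(full).size });
    }
  }
  return result;
}

// Summarize the file count, the total size, and the `top` largest files.
function summarizeDocs(root: string, top = 5) {
  const files = collectFiles(root);
  const totalBytes = files.reduce((sum, f) => sum + f.bytes, 0);
  const largest = [...files].sort((a, b) => b.bytes - a.bytes).slice(0, top);
  return { count: files.length, totalBytes, largest };
}

// Assumed location of the generated docs; nothing is printed if it is missing.
const docsRoot = "website/docs";
if (fs.existsSync(docsRoot)) {
  const { count, totalBytes, largest } = summarizeDocs(docsRoot);
  console.log(`${count} files, ${(totalBytes / 1024 / 1024).toFixed(1)} MiB`);
  for (const f of largest) {
    console.log(`${f.bytes}\t${f.file}`);
  }
}
```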
More details in the project README.
The initial attempts to build the Docusaurus site failed; a new attempt with the new faster plugin was made.
(cd website; npm install @docusaurus/faster)
and an addition to website/docusaurus.config.ts:
future: {
  v4: {
    removeLegacyPostBuildHeadAttribute: true
  },
  experimental_faster: true,
},
Another build attempt was done with the concatenateModules property disabled:
plugins: [
  // ...
  function disableExpensiveBundlerOptimizationPlugin() {
    return {
      name: "disable-expensive-bundler-optimizations",
      configureWebpack(_config, isServer) {
        return {
          optimization: {
            concatenateModules: false,
          },
        };
      },
    };
  },
],
The build apparently went farther, but it also failed.
- To save a considerable amount of space, the original LLVM files and the generated documentation are not included in this project.
- The generated MD files are not final and may require further refinement (suggestions are welcome!).