This repository contains the code to reproduce all figures from the Iron Mind manuscript. The repository provides Python scripts to generate publication-quality figures from benchmark optimization data across six chemical reaction datasets. The preprint can be found on arXiv: https://arxiv.org/abs/2509.00103
This work comes with a website for users to test out both the LLM and BO optimization strategies on the benchmark datasets. Additionally, we are excited to offer humans the opportunity to conduct optimization campaigns on the datasets. You can access the website here.
- computed_descriptors/ - Descriptors used for Bayesian optimization methods
- descriptors/ - Code to reproduce descriptors
- figures/ - Python scripts for generating manuscript figures
- histograms/ - Histogram plots showing objective distributions for each dataset
- schematics/ - Chemical reaction schematics for each dataset
The repository works with six chemical reaction optimization datasets:
- Buchwald-Hartwig - C-N coupling reactions (yield optimization)
- Suzuki-Miyaura A - Cross-coupling reactions (yield optimization)
- Suzuki-Miyaura B - Cross-coupling reactions (conversion optimization)
- Reductive Amination - Amine synthesis (conversion optimization)
- N-Alkylation/Deprotection - Two-step synthesis (yield optimization)
- Chan-Lam Coupling - C-N coupling reactions (multi-objective: desired vs undesired yield)
- Clone this repository:
git clone https://github.com/gomesgroup/iron-mind-public.git
cd iron-mind-public
- Create a conda environment with the required dependencies:
conda create -n iron-mind-figures python=3.10
conda activate iron-mind-figures
- Install the required packages:
pip install git+https://github.com/gomesgroup/olympus.git
pip install pandas numpy matplotlib seaborn scikit-learn plotly scipy
- Set up Plotly to save figures:
pip install kaleido
plotly_get_chrome
The benchmark optimization data used to generate these figures is available on Hugging Face:
pip install huggingface-hub
hf auth login
hf download gomesgroup/iron-mind-data runs.zip --repo-type dataset --local-dir .
unzip runs.zip
This will produce the runs/ directory in your current working directory. Use this path when generating figures.
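The download-and-extract step can also be scripted from Python. The sketch below is a minimal illustration, assuming the `huggingface_hub` package (installed by `pip install huggingface-hub`) and its `hf_hub_download` function, and assuming you have already authenticated. The helper names `download_runs_archive` and `extract_archive` are hypothetical and are not part of this repository.

```python
import zipfile
from pathlib import Path


def download_runs_archive(local_dir: str = ".") -> Path:
    """Download runs.zip from the Hugging Face dataset repo (needs network access)."""
    # Imported lazily so extract_archive works even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    archive = hf_hub_download(
        repo_id="gomesgroup/iron-mind-data",
        filename="runs.zip",
        repo_type="dataset",
        local_dir=local_dir,
    )
    return Path(archive)


def extract_archive(archive: Path, dest: Path) -> Path:
    """Unzip the archive into dest and return the absolute path of dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest.resolve()
```

Calling `extract_archive(download_runs_archive("."), Path("."))` returns the absolute path of the current directory; appending `runs` to it should give the absolute path expected by the figure scripts.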
Each figure script in the figures/ directory can be run independently:
cd figures/
python figure_2.py
python figure_3.py
python figure_5_S12.py
...
Some figure scripts require the path to the runs/ directory. Provide an absolute path rather than a relative one.
To generate all figures:
bash generate_all_figures.sh <path_to_runs>
The path_to_runs must be an absolute path.
Generated figures are saved to the figures/pngs/ directory.
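Because the script expects an absolute path, it can be convenient to resolve the path programmatically before invoking it. The following is a minimal sketch, not part of the repository: `build_command` and `generate_all` are hypothetical helpers, and the sketch assumes `generate_all_figures.sh` lives in the `figures/` directory.

```python
import subprocess
from pathlib import Path


def build_command(runs_dir: str) -> list[str]:
    """Return the generate_all_figures.sh command with runs_dir made absolute."""
    absolute = Path(runs_dir).expanduser().resolve()
    return ["bash", "generate_all_figures.sh", str(absolute)]


def generate_all(runs_dir: str, figures_dir: str = "figures") -> None:
    """Run the script from figures_dir (assumed location); raises on failure."""
    subprocess.run(build_command(runs_dir), cwd=figures_dir, check=True)
```

With this wrapper, `generate_all("runs")` resolves `runs` against the current working directory first, so a relative path typed by the user never reaches the shell script.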
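To confirm which figures were produced, the output directory can be inspected from Python. This is a small illustrative sketch: `list_generated_figures` is a hypothetical helper, and the `.png` extension is an assumption based on the directory name.

```python
from pathlib import Path


def list_generated_figures(output_dir: str = "figures/pngs") -> list[str]:
    """Return the sorted names of PNG files in output_dir (empty list if missing)."""
    directory = Path(output_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.png"))
```

An empty result after a run suggests that the scripts failed or wrote their output elsewhere.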
The descriptors used for Bayesian optimization can be found in computed_descriptors.
If you use this code or data, please cite our manuscript:
@article{macknight2025iron,
title={Pre-trained knowledge elevates large language models beyond traditional chemical reaction optimizers},
author={MacKnight, Robert and Regio, Jose Emilio and Ethier, Jeffrey G. and Baldwin, Luke A. and Gomes, Gabe},
journal={arXiv preprint arXiv:2509.00103},
year={2025}
}