This repository provides hands‑on Jupyter‑notebook examples that show how to turn raw scientific papers into structured datasets using large‑language‑model (LLM) workflows.
- data_extraction_tutorial_1.ipynb – selecting & parsing example papers
- data_extraction_tutorial_2.ipynb – crafting prompts and few‑shot examples for LLM extraction
- data_extraction_tutorial_3.ipynb – post‑processing & exporting JSON/CSV
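
To give a feel for the steps the notebooks cover, here is a minimal sketch (not taken from the notebooks) of assembling a few‑shot prompt, parsing a model reply, and exporting the result as JSON and CSV. The schema (`property`, `value`, `unit`), the helper names, and the example sentences are all hypothetical and made up for illustration. No LLM is called; a hard‑coded string stands in for the model reply.

```python
import csv
import io
import json

# Hypothetical few-shot example with invented values; the notebooks may use
# a different schema and different example papers.
FEW_SHOT_EXAMPLES = [
    {
        "text": "The film showed a band gap of 1.55 eV.",
        "output": {"property": "band gap", "value": 1.55, "unit": "eV"},
    },
]


def build_prompt(passage, examples=FEW_SHOT_EXAMPLES):
    """Assemble a plain-text few-shot extraction prompt."""
    parts = ["Extract the reported property as JSON with keys property, value, unit."]
    for ex in examples:
        parts.append(f"Text: {ex['text']}\nJSON: {json.dumps(ex['output'])}")
    # The final block leaves the JSON slot open for the model to fill in.
    parts.append(f"Text: {passage}\nJSON:")
    return "\n\n".join(parts)


def parse_response(raw):
    """Return the reply as a dict, or None if it is not a valid JSON object."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def records_to_csv(records, fieldnames=("property", "value", "unit")):
    """Serialise a list of record dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    prompt = build_prompt("The sample melts at 432 K.")
    # Stand-in for an LLM reply (assumption: the model answers with bare JSON).
    fake_reply = '{"property": "melting point", "value": 432, "unit": "K"}'
    record = parse_response(fake_reply)
    records = [record] if record else []
    print(json.dumps(records, indent=2))
    print(records_to_csv(records))
```

Returning `None` for malformed replies keeps the export step simple: records that fail to parse are dropped before writing, so the CSV header always matches the expected schema.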
# clone and enter the repo
git clone https://github.com/lamalab-org/data-extraction-tutorial.git
cd data-extraction-tutorial
# set up a fresh environment (Python ≥3.10)
python -m venv .venv
source .venv/bin/activate
# install dependencies
pip install -r requirements.txt
# launch the notebooks
jupyter lab
- 📄 Data extraction review: From Text to Insight: Large‑Language‑Model Workflows for Materials‑Science Data Extraction, Chem. Soc. Rev. 54 (2025), 6910‑6953. [link]
- 📖 MatExtract – hands-on online book: https://matextract.pub