Data Extraction Tutorial

This repository provides hands‑on Jupyter notebooks that show how to turn raw scientific papers into structured datasets using large language model (LLM) workflows.

Contents

  • data_extraction_tutorial_1.ipynb – selecting & parsing example papers
  • data_extraction_tutorial_2.ipynb – crafting prompts and few‑shot examples for LLM extraction
  • data_extraction_tutorial_3.ipynb – post‑processing & exporting JSON/CSV
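
The three stages listed above (parse, prompt with few‑shot examples, post‑process and export) can be sketched with the standard library alone. This is a minimal illustration and not code taken from the notebooks: the field names in `FIELDS`, the helper names `build_prompt`, `parse_records`, and `to_csv`, and the placeholder example data are all assumptions made for this sketch, and the actual LLM call is left out.

```python
import csv
import io
import json

# Hypothetical schema for one extracted record (an assumption, not the notebooks' schema).
FIELDS = ["material", "property", "value", "unit"]

# Hypothetical few-shot example: (text excerpt, expected records). Placeholder data only.
FEW_SHOT_EXAMPLES = [
    (
        "Compound A showed a band gap of 2.0 eV.",
        [{"material": "Compound A", "property": "band gap", "value": 2.0, "unit": "eV"}],
    ),
]


def build_prompt(passage, examples=FEW_SHOT_EXAMPLES):
    """Assemble a few-shot prompt: instruction, worked examples, then the new passage."""
    parts = [
        "Extract every reported property as a JSON list of objects with keys: "
        + ", ".join(FIELDS)
        + "."
    ]
    for text, records in examples:
        parts.append(f"Text: {text}\nJSON: {json.dumps(records)}")
    parts.append(f"Text: {passage}\nJSON:")
    return "\n\n".join(parts)


def parse_records(raw_reply):
    """Parse a model reply into a list of dicts, dropping incomplete records."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return []  # reply was not valid JSON
    if isinstance(data, dict):
        data = [data]  # tolerate a single object instead of a list
    if not isinstance(data, list):
        return []
    return [
        {key: record[key] for key in FIELDS}
        for record in data
        if isinstance(record, dict) and all(key in record for key in FIELDS)
    ]


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    prompt = build_prompt("Compound B has a melting point of 350 K.")
    # A real workflow would send `prompt` to an LLM; here the reply is hard-coded.
    reply = '[{"material": "Compound B", "property": "melting point", "value": 350, "unit": "K"}]'
    records = parse_records(reply)
    print(json.dumps(records, indent=2))
    print(to_csv(records))
```

Dropping incomplete records at parse time keeps the exported table rectangular. A real pipeline might instead log the rejected records so they can be reviewed by hand.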

Quick start

# clone and enter the repo
git clone https://github.com/lamalab-org/data-extraction-tutorial.git
cd data-extraction-tutorial

# set up a fresh environment (Python ≥3.10)
python -m venv .venv
source .venv/bin/activate

# install dependencies
pip install -r requirements.txt

# launch the notebooks
jupyter lab

Learn more

  • 📄 Data extraction review: From Text to Insight: Large‑Language‑Model Workflows for Materials‑Science Data Extraction, Chem. Soc. Rev. 54 (2025), 6910‑6953.
  • 📖 MatExtract – hands-on online book: https://matextract.pub
