Extractor

A CLI utility for extracting structured data from unstructured text. It uses LangExtract, an open-source Python library.

Examples are ready to run out of the box after downloading and adding your API key (see below for details). An example output for the Shakespearean text example:

[Images not shown: Input | Output]

LangExtract has a built-in visualizer. As you scroll through the document, the extracted data is displayed and the associated text is highlighted.

[Video: shakespeare.mp4]

Use

git clone git@github.com:msyvr/extractor.git

This project uses LangExtract together with an LLM, and custom model providers can be added via a lightweight plug-in system. The example uses an economical OpenAI model.

For API keys, add a .env file (the repository includes a .env_example) and make sure .env is listed in .gitignore so that keys are never committed.

Tell uv to load environment variables from .env:

export UV_ENV_FILE=".env"
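Once uv loads .env, the key is available to the script as an ordinary environment variable. A minimal sketch of a fail-fast check follows; the helper name require_env and the variable name OPENAI_API_KEY are assumptions for illustration, not taken from this repository, so use whatever name your .env actually defines.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable, or fail with a clear hint.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f'{name} is not set. Add it to .env and export UV_ENV_FILE=".env" '
            "before calling uv run."
        )
    return value


# Hypothetical usage at the top of a script (variable name is an assumption):
# api_key = require_env("OPENAI_API_KEY")
```

Checking up front gives a readable error instead of an authentication failure partway through a run.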

Run the Shakespeare text example:

uv run main.py

Why build this?

An initial experiment with Google's LangExtract.

Not included here yet: the eventual goal is to build graph visualizations with extracted entities as nodes and extracted relationships as edges.
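The nodes-and-edges idea can be sketched with plain Python data structures. This is only an illustration of the shape such a graph might take: the build_graph helper and the (source, label, target) tuple format are assumptions, not LangExtract output or code from this repository, and the sample data is made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical relationship format: (source entity, relation label, target entity)
Relationship = Tuple[str, str, str]


def build_graph(
    entities: Iterable[str],
    relationships: Iterable[Relationship],
) -> Tuple[Set[str], Dict[str, List[Tuple[str, str]]]]:
    """Build a directed graph: entities become nodes, relationships become edges."""
    nodes: Set[str] = set(entities)
    edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for source, label, target in relationships:
        # Endpoints that were never listed as entities still become nodes.
        nodes.update((source, target))
        edges[source].append((label, target))
    return nodes, dict(edges)


# Made-up sample data, for illustration only.
nodes, edges = build_graph(
    entities=["Romeo", "Juliet"],
    relationships=[("Romeo", "loves", "Juliet"), ("Juliet", "daughter of", "Capulet")],
)
```

An adjacency mapping like this can later be handed to a graph or plotting library of your choice for rendering.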

LLM usage

This example uses gpt-5-nano, which trades significant quality for lower cost. It's worth experimenting to find a model that balances quality and cost for your use case.
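One way to make that experiment concrete is to hand-label a short passage and score each model's extractions against it. The sketch below is a suggestion rather than part of this project: the score helper and the (class, text) pair format are assumptions, and the sample pairs are made up.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical item format: (extraction class, extracted text)
Item = Tuple[str, str]


def score(predicted: Iterable[Item], expected: Iterable[Item]) -> Dict[str, float]:
    """Compare a model's extractions with a hand-labeled reference set."""
    pred, gold = set(predicted), set(expected)
    hits = len(pred & gold)  # extractions that match the reference exactly
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up sample data, for illustration only.
reference = [("character", "Romeo"), ("character", "Juliet")]
model_output = [("character", "Romeo"), ("emotion", "longing")]
result = score(model_output, reference)
```

Running the same reference set against each candidate model, and noting what each run cost, gives a simple basis for the quality/cost comparison. Exact matching is strict; a looser comparison may suit free-text extractions better.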
