A CLI utility for extracting structured data from unstructured text. Uses LangExtract, an open source Python library.
Examples are ready to run out of the box after downloading and adding your API key (see below for details). An example output for the Shakespearean text example:
Input | Output |
---|---|
![]() | ![]() |
LangExtract has a built-in visualizer. As you scroll through the document, the extracted data is displayed and the associated text is highlighted.
git clone [email protected]/msyvr/extractor
This project uses LangExtract together with an LLM, and custom model providers can be added via a lightweight plug-in system. The example uses an economical OpenAI model.
For API keys, add a `.env` file and ensure that it's included in `.gitignore` to avoid exposing keys.
Give `uv` permission to access `.env`:
export UV_ENV_FILE=".env"
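Before running anything, it can help to confirm that the key is present and that `.env` is actually ignored by git. The following is a minimal sketch using only the standard library, not part of this project. The helper names (`parse_env`, `is_ignored`, `check_setup`) are hypothetical, the key name `OPENAI_API_KEY` is an assumption (use whatever name your provider expects), and the parser handles only plain `KEY=VALUE` lines rather than everything `uv` accepts.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def is_ignored(gitignore_text: str, name: str = ".env") -> bool:
    """Return True if `name` appears as its own pattern line in .gitignore."""
    patterns = {line.strip() for line in gitignore_text.splitlines()}
    return name in patterns or f"/{name}" in patterns


def check_setup(project_dir: str = ".", key_name: str = "OPENAI_API_KEY") -> list[str]:
    """Collect readable problems; an empty list means the setup looks fine.

    `key_name` is an assumed variable name, not taken from this project.
    """
    root = Path(project_dir)
    problems = []
    env_path = root / ".env"
    if not env_path.is_file():
        problems.append(".env file not found")
    elif not parse_env(env_path.read_text()).get(key_name):
        problems.append(f"{key_name} is missing or empty in .env")
    gitignore = root / ".gitignore"
    if not gitignore.is_file() or not is_ignored(gitignore.read_text()):
        problems.append(".env is not listed in .gitignore")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("warning:", problem)
```

The check only looks for a literal `.env` line in `.gitignore`, so broader patterns that would also match the file are not detected.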
Run the Shakespeare text example:
uv run main.py
An initial experiment with Google's LangExtract.
Not included here yet: the eventual goal is to build graph visualizations with extracted entities as nodes and extracted relationships as edges.
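As a rough illustration of the entities-as-nodes, relationships-as-edges idea, the sketch below builds an adjacency map and renders it as Graphviz DOT text. It is not part of this project and does not use LangExtract's output format. The `Relationship` record, the function names, and the sample data are all hypothetical, and labels are not escaped, so names containing double quotes would need extra handling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """Hypothetical edge record: source entity, target entity, relation label."""
    source: str
    target: str
    label: str


def build_graph(entities, relationships):
    """Return (sorted node list, adjacency map of node -> [(neighbor, label)])."""
    nodes = set(entities)
    adjacency = defaultdict(list)
    for rel in relationships:
        # An edge may mention an entity that was not listed on its own.
        nodes.update((rel.source, rel.target))
        adjacency[rel.source].append((rel.target, rel.label))
    return sorted(nodes), dict(adjacency)


def to_dot(nodes, adjacency):
    """Render the graph as Graphviz DOT text (directed, labeled edges)."""
    lines = ["digraph extracted {"]
    lines += [f'  "{node}";' for node in nodes]
    for src, edges in adjacency.items():
        for dst, label in edges:
            lines.append(f'  "{src}" -> "{dst}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data for illustration only, not real extraction output.
    sample_nodes, sample_adjacency = build_graph(
        ["Romeo", "Juliet"],
        [Relationship("Romeo", "Juliet", "loves")],
    )
    print(to_dot(sample_nodes, sample_adjacency))
```

The DOT text can be pasted into any Graphviz renderer. Keeping the graph as plain data first leaves the choice of visualization library open.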
This example uses `gpt-5-nano`, which trades significant quality for lower cost. It's worth experimenting to find an LLM that balances quality and cost.