
msyvr/extractor


Extractor

A CLI utility for extracting structured data from unstructured text, built on LangExtract, Google's open-source Python library.

The examples run out of the box once you've cloned the repo and added your API key (details below). Here is the input and output for the Shakespeare text example:

Input: quote (image) | Output: JSON (image)

LangExtract has a built-in visualizer: as you scroll through the document, each extracted item is displayed alongside its highlighted source text.
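The highlighting works because each extraction is tied to a span of the original text. The toy sketch below is not LangExtract's visualizer; it only illustrates the idea by wrapping hypothetical `(start, end, label)` character spans in HTML `<mark>` tags and escaping everything else.

```python
from html import escape


def highlight(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Wrap each (start, end, label) span of `text` in a <mark> tag.

    Spans are hypothetical character offsets. A span that overlaps one
    already emitted is skipped so the output stays well-formed HTML.
    """
    parts = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # overlaps the previous span; skip it
        parts.append(escape(text[cursor:start]))
        parts.append(f'<mark title="{escape(label)}">{escape(text[start:end])}</mark>')
        cursor = end
    parts.append(escape(text[cursor:]))  # trailing unhighlighted text
    return "".join(parts)


if __name__ == "__main__":
    line = "But soft! What light through yonder window breaks?"
    print(highlight(line, [(15, 20, "object"), (36, 42, "object")]))
```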

(Video: shakespeare.mp4)

Use

git clone git@github.com:msyvr/extractor.git

LangExtract relies on an LLM to perform the extraction, and it supports custom model providers through a lightweight plug-in system. This example uses an economical OpenAI model.
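A provider plug-in system generally maps a model ID to the class that should serve it. The sketch below shows only that general registry pattern in plain Python; `register`, `resolve`, `Provider`, `EchoProvider` and the regex-based matching are hypothetical names chosen for illustration, not LangExtract's actual plug-in API (consult its provider documentation for that).

```python
import re
from typing import Callable, Protocol


class Provider(Protocol):
    """Hypothetical provider interface: turn a prompt into model output."""

    def infer(self, prompt: str) -> str: ...


# Hypothetical registry of (model-ID pattern, provider factory) pairs.
_REGISTRY: list[tuple[re.Pattern[str], Callable[[str], Provider]]] = []


def register(pattern: str):
    """Class decorator: route model IDs matching `pattern` to the class."""
    def decorator(cls):
        _REGISTRY.append((re.compile(pattern), cls))
        return cls
    return decorator


def resolve(model_id: str) -> Provider:
    """Instantiate the first registered provider whose pattern matches."""
    for pattern, factory in _REGISTRY:
        if pattern.match(model_id):
            return factory(model_id)
    raise ValueError(f"no provider registered for {model_id!r}")


@register(r"^echo-")
class EchoProvider:
    """Toy provider that returns the prompt unchanged (useful in tests)."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    def infer(self, prompt: str) -> str:
        return prompt


if __name__ == "__main__":
    provider = resolve("echo-small")
    print(type(provider).__name__, provider.infer("hello"))
```

Keeping the lookup keyed on the model ID means calling code only names a model; swapping in a new backend is a matter of registering another class.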

Store your API key in a .env file, and make sure .env is listed in .gitignore so the key is never committed.
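`git check-ignore -v .env` asks git directly whether the file is ignored. As a lighter, approximate check that a script could run before reading keys, here is a sketch with a hypothetical helper `env_is_ignored`; it only recognizes the exact entries `.env` or `/.env` and does not evaluate globs, negations, or nested .gitignore files.

```python
from pathlib import Path


def env_is_ignored(gitignore: Path = Path(".gitignore")) -> bool:
    """Return True if .gitignore contains an exact '.env' or '/.env' entry.

    Simplified check: comments and blank lines are skipped, and glob
    patterns, '!' negations, and nested .gitignore files are not evaluated.
    """
    if not gitignore.exists():
        return False
    entries = {
        line.strip()
        for line in gitignore.read_text().splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return bool(entries & {".env", "/.env"})
```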

Tell uv to load .env when it runs commands:

export UV_ENV_FILE=".env"
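With `UV_ENV_FILE` set, `uv run` loads the variables in that file into the environment of the command it launches, so the script can read them from `os.environ`. The sketch below approximates that loading step in plain Python for illustration only; it is not uv's parser, handles just simple `KEY=VALUE` lines (comments, blanks, optional `export`, matching quotes), and deliberately does not overwrite variables that are already set. `OPENAI_API_KEY` in the test is an assumed variable name.

```python
import os
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks, # comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def load_env_file(path: Path = Path(".env")) -> None:
    """Copy parsed values into os.environ without overwriting existing ones."""
    for key, value in parse_env_file(path).items():
        os.environ.setdefault(key, value)
```

Under `uv run` with `UV_ENV_FILE` exported, this explicit loading step should not be needed; the script can read the key straight from `os.environ`.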

Run the Shakespeare text example:

uv run main.py

Why build this?

An initial experiment with Google's LangExtract.

The eventual goal, not yet implemented here, is to build graph visualizations with extracted entities as nodes and extracted relationships as edges.
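Since the repository doesn't implement this step yet, the following is only a minimal sketch under an explicit assumption: each extracted relationship has already been reduced to a hypothetical `(source, relation, target)` triple of entity names. It builds an adjacency list and renders it as Graphviz DOT text, which any DOT viewer can draw.

```python
from collections import defaultdict


def build_graph(triples):
    """Return {node: [(relation, target), ...]} from (source, relation, target) triples."""
    graph = defaultdict(list)
    for source, relation, target in triples:
        graph[source].append((relation, target))
        graph.setdefault(target, [])  # make sure target-only entities become nodes
    return dict(graph)


def to_dot(graph: dict[str, list[tuple[str, str]]]) -> str:
    """Render the adjacency list as a Graphviz DOT digraph."""
    def q(s: str) -> str:
        # Quote an identifier, escaping backslashes and double quotes.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = ["digraph extractions {"]
    for node in graph:
        lines.append(f"  {q(node)};")
    for source, edges in graph.items():
        for relation, target in edges:
            lines.append(f"  {q(source)} -> {q(target)} [label={q(relation)}];")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    triples = [("Romeo", "loves", "Juliet"), ("Juliet", "member_of", "Capulet")]
    print(to_dot(build_graph(triples)))
```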

LLM usage

This example uses gpt-5-nano, which gives up significant quality in exchange for lower cost. It's worth experimenting to find an LLM with the right quality/cost balance for your use case.
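One way to ground that experiment is to label a small sample by hand and score each model's output against it. A minimal sketch, assuming extractions are compared as exact `(extraction_class, extraction_text)` pairs; the pair shape mirrors fields of LangExtract's `Extraction` objects, but exact-match scoring is a simplification (real comparisons may need fuzzy or span-based matching), and the sample labels are made up.

```python
from collections.abc import Iterable


def score(
    predicted: Iterable[tuple[str, str]], gold: Iterable[tuple[str, str]]
) -> tuple[float, float, float]:
    """Precision, recall, and F1 over exact (class, text) matches."""
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {("character", "ROMEO"), ("emotion", "But soft!"), ("relationship", "Juliet is the sun")}
    cheap_model = {("character", "ROMEO"), ("emotion", "But soft!"), ("character", "Juliet")}
    print(score(cheap_model, gold))
```

Recording each candidate model's F1 next to its cost on the same labeled sample makes the quality/cost trade-off concrete rather than anecdotal.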

Dashboard

About

Framework for LangExtract: extract structured data from unstructured text.
