Added Metadata filtering, scaffolding for all, fully implemented for Postgres (pgvector) and Neo4j #2187

GGrassia · 2025-10-09T10:51:52Z

Description

Added custom metadata on chunks and nodes, with the option of filtering on it at query time to narrow the part of the knowledge base that context is pulled from, improving both precision and speed.
Metadata is stored as a JSON string and indexed.
The metadata_filter class supports AND, OR, and NOT operators, nested metadata_filter instances for chained or hierarchical filters, and [ ... ] arrays for matching any of several possible values of a single metadata key.
I will gladly help with bug fixing and further development.
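
To illustrate the filter semantics described above, here is a minimal sketch, not the PR's actual class. It uses a stdlib dataclass instead of Pydantic so it runs without dependencies, and the names `MetadataFilterSketch`, `operator`, `operands`, `matches`, and `to_dict` are assumptions made for the example.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any, Union

# Hypothetical stand-in for the PR's metadata_filter class; all names are assumptions.
Operand = Union["MetadataFilterSketch", dict[str, Any]]


@dataclass
class MetadataFilterSketch:
    operator: str = "AND"  # one of "AND", "OR", "NOT"
    operands: list[Operand] = field(default_factory=list)

    def matches(self, metadata: dict[str, Any]) -> bool:
        """Evaluate the filter against one chunk's metadata dict."""
        results = [self._match_operand(op, metadata) for op in self.operands]
        if self.operator == "AND":
            return all(results)
        if self.operator == "OR":
            return any(results)
        if self.operator == "NOT":
            # In this sketch NOT rejects the chunk if any operand matches.
            return not any(results)
        raise ValueError(f"unknown operator: {self.operator!r}")

    @staticmethod
    def _match_operand(op: Operand, metadata: dict[str, Any]) -> bool:
        if isinstance(op, MetadataFilterSketch):
            return op.matches(metadata)  # nested filter
        # Plain dict: every key must match; a list value means "any of these".
        for key, expected in op.items():
            actual = metadata.get(key)
            if isinstance(expected, list):
                if actual not in expected:
                    return False
            elif actual != expected:
                return False
        return True

    def to_dict(self) -> dict[str, Any]:
        """Serialize to a JSON-compatible dict, recursing into nested filters."""
        return {
            "operator": self.operator,
            "operands": [
                op.to_dict() if isinstance(op, MetadataFilterSketch) else op
                for op in self.operands
            ],
        }


# department is "legal" or "hr", AND NOT status == "draft"
flt = MetadataFilterSketch(
    operator="AND",
    operands=[
        {"department": ["legal", "hr"]},
        MetadataFilterSketch(operator="NOT", operands=[{"status": "draft"}]),
    ],
)
payload = json.dumps(flt.to_dict())
```

With this filter, a chunk tagged `{"department": "hr", "status": "final"}` matches and one tagged `{"department": "hr", "status": "draft"}` does not.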

Related Issues

As requested and discussed in question/issue #1985.

Changes Made

Added Pydantic metadata_filter class
Added metadata_filter class to all base implementations of query for chunks
Added metadata management in chunk writing for Postgres
Added metadata as properties on nodes for Neo4j
Added metadata filter building for postgres_impl and updated queries for chunks, entities and relations to allow filtering
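
For reviewers who want a feel for the filter-building step, the following is a rough sketch of how a nested filter could be turned into a parameterized WHERE fragment. It is not the code in postgres_impl: the function name `build_where`, the dict layout, the `metadata` column name, the assumption that the column is `jsonb`, and the `$n` placeholder style are all illustrative. It relies on the `@>` containment operator, which Postgres can serve from a GIN index on a `jsonb` column.

```python
import json
from typing import Any


def build_where(flt: dict[str, Any], params: list[Any], column: str = "metadata") -> str:
    """Return a SQL boolean expression for `flt`, appending bind values to `params`.

    `flt` is either {"operator": "AND"|"OR"|"NOT", "operands": [...]} or a plain
    {key: value} dict, where a list value means "any of these values".
    `column` is interpolated into the SQL, so it must be a trusted identifier.
    """
    if "operator" in flt and "operands" in flt:
        parts = [build_where(op, params, column) for op in flt["operands"]]
        if not parts:
            return "TRUE"  # an empty group does not restrict anything
        op = str(flt["operator"]).upper()
        if op == "NOT":
            return "NOT (" + " OR ".join(parts) + ")"
        if op not in ("AND", "OR"):
            raise ValueError(f"unknown operator: {op!r}")
        return "(" + f" {op} ".join(parts) + ")"

    clauses = []
    for key, value in flt.items():
        values = value if isinstance(value, list) else [value]
        if not values:
            clauses.append("FALSE")  # "any of []" can never match
            continue
        alternatives = []
        for v in values:
            # Values travel as bind parameters, never as inline SQL text.
            params.append(json.dumps({key: v}))
            alternatives.append(f"{column} @> ${len(params)}::jsonb")
        clauses.append("(" + " OR ".join(alternatives) + ")")
    return "(" + " AND ".join(clauses) + ")" if clauses else "TRUE"


example_filter = {
    "operator": "AND",
    "operands": [
        {"department": ["legal", "hr"]},
        {"operator": "NOT", "operands": [{"status": "draft"}]},
    ],
}
example_params: list[Any] = []
example_sql = build_where(example_filter, example_params)
```

The resulting fragment can be appended to a chunk query's existing WHERE clause, with `example_params` passed alongside the query's other bind values (the placeholder numbering would need an offset if the query already has parameters).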

Checklist

Changes tested locally - (fully working in prod for our specific solution!)
Code reviewed
Documentation updated (if necessary)
Unit tests added (if applicable)


…querying
- Implement custom metadata insertion as node properties during file upload.
- Add basic metadata filtering functionality to query API.
NOTE: While the base.py file has been modified, the base implementation is incomplete and untested. Only the Neo4j database has been properly implemented and tested.
WIP: Query API is temporarily mocked for debugging. Full implementation with complex AND/OR filtering capabilities is in development.
# Conflicts:
#   lightrag/base.py
#   lightrag/lightrag.py
#   lightrag/operate.py

duynt88 · 2025-11-10T02:26:07Z

Looking forward to this feature. Thank you so much for the effort.

GGrassia · 2025-11-11T08:50:54Z

@duynt88 Thank you! It's still under discussion because of a potential technical issue with data reliability, but if you pull the fork it's already working. With a large document base the issue becomes less and less prominent: we've reached >80% successful unstructured data extraction (e.g. who's the executive manager for the xyz store, is the X9000 certification needed for this procedure, etc.) from a large codebase using the RAG alone, with no guardrails on the specific datum extracted other than the metadata filtering that restricts the chunk pool to the documents we know might contain it. Give it a spin!
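
The idea of restricting the chunk pool before retrieval can be sketched in a few lines. This is only an in-memory illustration under assumed names (`cosine`, `top_k_filtered`, and the chunk dict layout are hypothetical); in the PR the filtering happens inside the storage backend's query, not in Python.

```python
import math
from typing import Any, Callable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_filtered(
    query_vec: list[float],
    chunks: list[dict[str, Any]],
    allowed: Callable[[dict[str, Any]], bool],
    k: int = 3,
) -> list[dict[str, Any]]:
    """Drop chunks whose metadata fails `allowed`, then rank the rest by similarity."""
    pool = [c for c in chunks if allowed(c["metadata"])]
    pool.sort(key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return pool[:k]


chunks = [
    {"id": "a", "embedding": [1.0, 0.0], "metadata": {"store": "xyz"}},
    {"id": "b", "embedding": [0.9, 0.1], "metadata": {"store": "abc"}},
    {"id": "c", "embedding": [0.0, 1.0], "metadata": {"store": "xyz"}},
]
hits = top_k_filtered([1.0, 0.0], chunks, lambda m: m.get("store") == "xyz", k=2)
hit_ids = [c["id"] for c in hits]
```

Chunk "b" is close to the query vector but is never considered, because its metadata puts it outside the allowed pool.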

Giulio Grassia and others added 15 commits September 25, 2025 15:37

feat (metadata): added metadata filter in query

0c721fa

Added metadata filter dataclass for serializing and deserializing complex filter to json dict, added node filtering based on metadata

fix (metadata): Corrected metadata management in enqueued documents

40afb04

feat (metadata filter): added metadata filtering

d0fba28

Added functioning (needs testing) metadata filtering on chunks for query. Fully implemented only on Postgres with pgvector and Neo4j

feat (metadata): Added metadata parameter in query

2728bb4

Added metadata management for chunks in querying for all vdb, ONLY Postgres with pgvector has been fully implemented

fix (metadata): fixed not working postgres queries, performance needs work

e04cd3c

feat (metadata): added IN clause management

e383879

feat (metadata) WIP: added metadata GIN index and modified sql queries for metadata filtering to optimize speed

b5cc842

Merge remote-tracking branch 'upstream/main'

4535af4

fix (operate): commented duplicate function

c35f74b

fix (metadata): added metadata as named parameter

177ec23

fix (postgres query): fixed metadata filtering for postgres pg queries

f4c2823

perf (metadata): optimized metadata query with gin indexes for postgres

a57d4ec

Merge remote-tracking branch 'upstream/main'

bb4d818

feat (metadata postgres): added logic for IN clauses on operands

bdb1ae0

GGrassia mentioned this pull request Oct 10, 2025

[Feature Request]: Add page number metadata to chunks for citation #2142

Open

2 tasks

GGrassia mentioned this pull request Oct 30, 2025

Can we temporarily enable/disable documents in RAG? #2285

Open

2 tasks

GGrassia added 2 commits October 30, 2025 15:15

fix (document_queue): fixed silent fail when requeueing

166bdf7

docs (metadata): Added Metadata_Filtering.md with examples and explanations for the functionality

cd664de

danielaskdd added the discuss label Nov 1, 2025
