Added Metadata filtering, scaffolding for all, fully implemented for Postgres (pgvector) and Neo4j #2187
base: main
…querying
- Implement custom metadata insertion as node properties during file upload.
- Add basic metadata filtering functionality to the query API.
NOTE: While base.py has been modified, the base implementation is incomplete and untested. Only the Neo4j backend has been properly implemented and tested.
WIP: The query API is temporarily mocked for debugging. Full implementation with complex AND/OR filtering capabilities is in development.
# Conflicts:
#   lightrag/base.py
#   lightrag/lightrag.py
#   lightrag/operate.py
Added a metadata filter dataclass for serializing and deserializing complex filters to and from a JSON dict; added node filtering based on metadata
Added functioning (needs testing) metadata filtering on chunks at query time. Fully implemented only on Postgres with pgvector and on Neo4j
Added metadata management for chunks in querying for all vector DBs; only Postgres with pgvector has been fully implemented
…s for metadata filtering to optimize speed
Looking forward to this feature. Thank you so much for the effort.
@duynt88 Thank you! It's still being discussed because of a potential technical issue with data reliability, but if you pull the fork it already works. With a large document base the issue becomes less and less prominent: we've reached >80% successful unstructured data extraction (e.g. who is the executive manager for the xyz store, is the X9000 certification needed for this procedure, etc.) from a large codebase with the RAG alone, without any guardrails for the specific datum being extracted other than the metadata filtering, which restricts the chunk pool to the documents we know might contain the datum. Give it a spin!
Description
Added custom metadata on chunks and nodes, with optional filtering at query time to narrow the part of the knowledge base that context is pulled from, improving both precision and speed.
Metadata are stored as a JSON string and indexed.
The Metadata_filter class supports AND, NOT, and OR operators, nested Metadata_filter instances for chained or hierarchical filters, and [ ... ] arrays listing multiple accepted values for a single metadata key.
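To make the filter semantics concrete for reviewers, here is a minimal, self-contained sketch of how such a filter could behave. It is not the code from this PR: the class name `MetadataFilter`, its fields `operator` and `operands`, the methods `to_dict`, `from_dict`, and `matches`, the choice that NOT means "none of the operands match", and the sample chunk records are all assumptions made for illustration. It also evaluates the filter in plain Python, whereas the PR pushes filtering down to Postgres (pgvector) and Neo4j.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class MetadataFilter:
    """Hypothetical stand-in for the PR's filter class (names are assumptions)."""

    operator: str = "AND"  # one of "AND", "OR", "NOT"
    # Each operand is either a plain dict of key -> value (or list of values)
    # or a nested MetadataFilter for chained / hierarchical filters.
    operands: list = field(default_factory=list)

    def to_dict(self) -> dict:
        """Serialize the filter tree to a JSON-compatible dict."""
        return {
            "operator": self.operator,
            "operands": [
                op.to_dict() if isinstance(op, MetadataFilter) else op
                for op in self.operands
            ],
        }

    @classmethod
    def from_dict(cls, data: dict) -> MetadataFilter:
        """Rebuild a filter tree from the dict produced by to_dict()."""
        operands = []
        for op in data.get("operands", []):
            # A dict carrying both keys is treated as a nested filter, so
            # metadata keys named "operator"/"operands" would be ambiguous.
            if isinstance(op, dict) and "operator" in op and "operands" in op:
                operands.append(cls.from_dict(op))
            else:
                operands.append(op)
        return cls(operator=data.get("operator", "AND"), operands=operands)

    def matches(self, metadata: dict) -> bool:
        """Evaluate the filter against one decoded metadata dict."""
        results = [self._match_operand(op, metadata) for op in self.operands]
        if self.operator == "AND":
            return all(results)
        if self.operator == "OR":
            return any(results)
        if self.operator == "NOT":
            # Assumption: NOT is true when none of its operands match.
            return not any(results)
        raise ValueError(f"Unknown operator: {self.operator!r}")

    @staticmethod
    def _match_operand(op, metadata: dict) -> bool:
        if isinstance(op, MetadataFilter):
            return op.matches(metadata)
        for key, expected in op.items():
            actual = metadata.get(key)
            if isinstance(expected, list):
                # A list means "any of these values is acceptable".
                if actual not in expected:
                    return False
            elif actual != expected:
                return False
        return True


# Sample chunks with metadata stored as a JSON string (illustrative data only).
chunks = [
    {"id": "c1", "metadata": json.dumps({"store": "xyz", "doc_type": "procedure"})},
    {"id": "c2", "metadata": json.dumps({"store": "abc", "doc_type": "procedure"})},
    {"id": "c3", "metadata": json.dumps({"store": "xyz", "doc_type": "draft"})},
]

# store in (xyz, abc) AND NOT doc_type == draft
flt = MetadataFilter("AND", [
    {"store": ["xyz", "abc"]},
    MetadataFilter("NOT", [{"doc_type": "draft"}]),
])

# Round-trip through JSON, as a query API payload would.
restored = MetadataFilter.from_dict(json.loads(json.dumps(flt.to_dict())))

# Restrict the chunk pool before retrieval.
selected = [c["id"] for c in chunks if restored.matches(json.loads(c["metadata"]))]
print(selected)
```

With this sample data the filter keeps `c1` and `c2` and drops the draft chunk `c3`. A database-backed implementation would translate the same tree into a WHERE clause or Cypher predicate instead of looping in Python.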
I will gladly cooperate on bug fixing and further development.
Related Issues
As requested and discussed in question/issue #1985.
Changes Made
Checklist
Additional Notes