Fix: Ensure Storage Consistency When Creating Implicit Nodes from Relationships #2262
Fix: Resolve chunk storage inconsistencies caused by implicit nodes during document indexing, and by incomplete cleanup during document deletion
Problem
This PR addresses two critical data consistency bugs related to storage synchronization:
1. In `_merge_edges_then_upsert` and `_rebuild_single_relationship`, if the source or target node of a relationship did not exist, an "implicit node" was created. These nodes were correctly added to the main `knowledge_graph_inst` but were not always propagated to the `entity_vdb` (vector DB) or the `entity_chunks_storage` (chunk tracking). As a result, `entity_chunks_storage` could contain fewer nodes than `entity_vdb`, and chunk tracking information for these implicit nodes was missing, which could lead to query failures or incomplete results.
2. In `adelete_by_doc_id()`, the system correctly removed entities and relationships from the main graph storage and the vector databases, but it did not clean up the corresponding entries in the chunk tracking storages (`entity_chunks` and `relation_chunks`).
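To make the first bug concrete, here is a minimal sketch of the invariant the fix restores: when a relationship endpoint is missing, the implicit node is written to the graph, the entity vector DB, and the entity chunk tracking storage together. It assumes simple dictionary-backed stand-ins for the three storages; `DictStorage`, `ensure_endpoint_node`, and the record fields are hypothetical illustrations, not LightRAG's actual API, and the sketch is synchronous for brevity.

```python
from typing import Optional


class DictStorage:
    """Hypothetical in-memory stand-in for a graph, vector, or KV storage."""

    def __init__(self) -> None:
        self.data: dict[str, dict] = {}

    def upsert(self, records: dict[str, dict]) -> None:
        self.data.update(records)


def ensure_endpoint_node(
    node_id: str,
    chunk_ids: list[str],
    graph: DictStorage,
    entity_vdb: Optional[DictStorage] = None,
    entity_chunks_storage: Optional[DictStorage] = None,
) -> None:
    """Hypothetical helper: create an implicit node in every storage layer."""
    if node_id in graph.data:
        return  # The node already exists, so no implicit node is needed.
    # Record fields below are illustrative only.
    graph.upsert({node_id: {"entity_name": node_id, "source_ids": list(chunk_ids)}})
    # Optional storages default to None so that older callers keep working.
    if entity_vdb is not None:
        entity_vdb.upsert({node_id: {"content": node_id}})
    if entity_chunks_storage is not None:
        entity_chunks_storage.upsert({node_id: {"chunk_ids": list(chunk_ids)}})


# Merging an edge Alice -> Acme where neither endpoint exists yet.
graph, vdb, chunks = DictStorage(), DictStorage(), DictStorage()
for endpoint in ("Alice", "Acme"):
    ensure_endpoint_node(endpoint, ["chunk-1"], graph, vdb, chunks)

# Old-style call without the optional storages still works.
legacy_graph = DictStorage()
ensure_endpoint_node("Bob", ["chunk-2"], legacy_graph)
```

After the loop, all three storages hold the same node IDs, which is the consistency property the first bug violated.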
Solution
This PR implements the following changes to ensure data consistency across all storage layers:
1. Synchronized Implicit Node Creation:
   - Modified `_merge_edges_then_upsert` to accept the `entity_chunks_storage` parameter.
   - Modified `_rebuild_single_relationship` to accept both the `entities_vdb` and `entity_chunks_storage` parameters.
   - Updated the callers (`rebuild_knowledge_from_chunks` and `merge_nodes_and_edges`) to pass the new storage parameters down.
2. Implemented Proper Deletion Cleanup:
   - Entries in `entity_chunks` for the deleted entities are now also deleted.
   - Entries in `relation_chunks` for the deleted relationships are now also deleted, using properly formatted storage keys.

Backward Compatibility
✅ Fully backward compatible: new parameters are optional with `None` defaults.
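The deletion cleanup from the Solution section, together with the `None`-default parameters that keep it backward compatible, can be sketched as follows. This assumes plain dictionaries as storages and a hypothetical `make_relation_key` that sorts the two endpoints and joins them with a hypothetical `<SEP>` separator; LightRAG's actual relation key format may differ. The point being illustrated is that chunk tracking entries must be deleted with exactly the key format used when they were written.

```python
from typing import Optional

SEP = "<SEP>"  # Hypothetical separator between relationship endpoints.


def make_relation_key(src: str, tgt: str) -> str:
    """Hypothetical key format: (A, B) and (B, A) map to the same key."""
    first, second = sorted((src, tgt))
    return f"{first}{SEP}{second}"


def delete_entities_and_relations(
    entities: list[str],
    relations: list[tuple[str, str]],
    graph_nodes: dict[str, dict],
    graph_edges: dict[str, dict],
    entity_chunks: Optional[dict[str, list[str]]] = None,
    relation_chunks: Optional[dict[str, list[str]]] = None,
) -> None:
    """Remove entities/relations from the graph AND from chunk tracking storages."""
    for src, tgt in relations:
        key = make_relation_key(src, tgt)
        graph_edges.pop(key, None)
        if relation_chunks is not None:
            # Must match the key format used at indexing time, or nothing is removed.
            relation_chunks.pop(key, None)
    for name in entities:
        graph_nodes.pop(name, None)
        if entity_chunks is not None:
            entity_chunks.pop(name, None)


nodes = {"Alice": {}, "Acme": {}, "Bob": {}}
edges = {make_relation_key("Alice", "Acme"): {}}
entity_chunks = {"Alice": ["chunk-1"], "Acme": ["chunk-1"], "Bob": ["chunk-9"]}
relation_chunks = {make_relation_key("Acme", "Alice"): ["chunk-1"]}

delete_entities_and_relations(
    ["Alice", "Acme"], [("Alice", "Acme")],
    nodes, edges, entity_chunks, relation_chunks,
)
```

Only `Bob` survives in `nodes` and `entity_chunks`, and the relationship is gone from both `edges` and `relation_chunks`. Omitting the two optional arguments skips the chunk tracking cleanup, so in this sketch a caller written before the change gets the old graph-only behavior.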