Hotfix: Preserve ordering in get_by_ids methods across all storage implementations #2195
Preserve ordering in get_by_ids methods across all storage implementations
🎯 Problem
The `get_by_ids` function in certain storage implementations returns results in an order that does not match the input IDs list, causing a misalignment between retrieved text blocks and their corresponding IDs. This issue affects the correctness of data returned by the `aquery_data` function and the `/aquery_data` API endpoint.
📝 Changes
Modified `get_by_ids` implementations in 8 storage backends to preserve input order and handle missing IDs consistently:
Modified Files:
- `lightrag/kg/deprecated/chroma_impl.py`
- `lightrag/kg/json_doc_status_impl.py`
- `lightrag/kg/milvus_impl.py`
- `lightrag/kg/mongo_impl.py`
- `lightrag/kg/nano_vector_db_impl.py`
- `lightrag/kg/postgres_impl.py`
- `lightrag/kg/qdrant_impl.py`
- `lightrag/kg/redis_impl.py`
- `lightrag/kg/faiss_impl.py`
Implementation Pattern:
All implementations now follow a consistent 3-step pattern:
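As a minimal sketch of what such an order-preserving lookup could look like (this is not the PR's actual code): fetch whatever records the backend returns, index them by ID, then rebuild the list in input order with `None` for IDs that were not found. The names `get_by_ids_ordered`, `fetch_unordered`, and `fake_backend`, as well as the `"id"` field on each record, are hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Optional

Record = dict[str, Any]


def get_by_ids_ordered(
    ids: list[str],
    fetch_unordered: Callable[[list[str]], list[Record]],
) -> list[Optional[Record]]:
    """Return one entry per input ID, in input order, with None where missing."""
    if not ids:
        return []
    # Step 1: fetch whatever the backend returns, in arbitrary order.
    rows = fetch_unordered(ids)
    # Step 2: index the rows by ID (assumes each row carries an "id" field).
    by_id = {str(row["id"]): row for row in rows if row is not None}
    # Step 3: rebuild the list in input order; missing IDs become None.
    return [by_id.get(str(item_id)) for item_id in ids]


def fake_backend(ids: list[str]) -> list[Record]:
    """Hypothetical backend: reverses the order and drops unknown IDs."""
    store = {
        "a": {"id": "a", "content": "alpha"},
        "c": {"id": "c", "content": "gamma"},
    }
    return [store[i] for i in reversed(ids) if i in store]


result = get_by_ids_ordered(["a", "b", "c"], fake_backend)
```

Here `result` has exactly three entries, one per requested ID, and the entry for `"b"` is `None` rather than being omitted.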
API Contract Change
Before:
After:
Impact on Consumers
✅ Compatible:
- `for i, result in enumerate(results)`
- `if results[i]: process(results[i])`
❌ Requires Updates:
- `for r in results: r['field']`
- `len(results) == len(found_items)`
Existing Code Compatibility
All 4 existing call sites in `lightrag/operate.py` already have proper None checks:
- `_get_cached_extraction_results` (line 1304): `if chunk_data and isinstance(chunk_data, dict)`
- `_get_cached_extraction_results` (line 1317): `if cache_entry is not None`
- `_find_related_text_unit_from_entities` (line 3959): `if chunk_data is not None and "content" in chunk_data`
- `_find_related_text_unit_from_relations` (line 4173): `if chunk_data is not None and "content" in chunk_data`
Conclusion: This change is backward compatible with the existing codebase.
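For callers outside `lightrag/operate.py`, the same kind of None check can be sketched as follows. This is an illustrative example under the contract described above (one result per input ID, `None` for missing IDs); `collect_contents` and the sample data are hypothetical and do not come from the codebase.

```python
from typing import Any, Optional


def collect_contents(
    ids: list[str], results: list[Optional[dict[str, Any]]]
) -> dict[str, str]:
    """Map each found ID to its "content", skipping None placeholders."""
    contents: dict[str, str] = {}
    # zip is safe here because results has one entry per input ID, in order.
    for chunk_id, chunk_data in zip(ids, results):
        if chunk_data is not None and "content" in chunk_data:
            contents[chunk_id] = chunk_data["content"]
    return contents


ids = ["chunk-1", "chunk-2", "chunk-3"]
results = [{"content": "first"}, None, {"content": "third"}]

found = collect_contents(ids, results)
# Positional alignment also makes it easy to report which IDs were missing.
missing = [chunk_id for chunk_id, data in zip(ids, results) if data is None]
```

Because positions line up with the input, a caller can pair IDs with results directly instead of matching them up by a field inside each record.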
🎯 Benefits
- Results match the input `ids` order exactly
- `None` values for missing IDs instead of silent omission
- `| None` union type for better IDE support