implement GetAllDocuments() #110
philippgille
left a comment
Hi Dennis 👋, Thanks for contributing!
I think the method is useful and makes sense 👍.
But there's a DB.ListCollections(), so for consistency I'd prefer to name the new method Collection.ListDocuments().
And can you please move it between the Collection.GetByID() and Collection.Delete()?
Thanks!
Hi @philippgille, the changes you requested make perfect sense and are applied now. ✌️
philippgille
left a comment
Hello, sorry for the long delay!
First of all, thanks for implementing the requested changes! 🙇
I had another more thorough look and found some more things that can be improved.
Due to my delayed review I'd understand if improving the PR doesn't fit your schedule anymore, so just let me know if you prefer me to make those changes on my own.
// The returned documents are a copy of the original documents, so they can be safely
// modified without affecting the collection.
The slice is new and can be modified without affecting the internal c.documents, but the documents themselves are not full copies. When you do x := y, x's simple fields (int, string) are independent of y's, but maps and slices are only shallow copies: x and y still share the same underlying data.
Demonstration: https://go.dev/play/p/0OccI4ibtS2
See the above GetByID where the Metadata and Embedding fields are cloned separately to create an entirely new document.
So here we have two options:
- Change the Godoc to clarify that the documents are shallow copies, and only the slice is new. This still allows the receiver to work with the slice, like iterating over it and reading the documents, without concurrency issues during regular operations. For example chromem-go can still add new documents to its c.Documents map, or delete them, and it doesn't affect the returned slice. Here's an example in chromem-go where something similar is done (lines 517 to 522 in 8311eb0):
    // The returned map is a copy of the internal map, so it's safe to directly modify
    // the map itself. Direct modifications of the map won't reflect on the DB's map.
    // To do that use the DB's methods like [DB.CreateCollection] and [DB.DeleteCollection].
    // The map is not an entirely deep clone, so the collections themselves are still
    // the original ones. Any methods on the collections like Add() for adding documents
    // will be reflected on the DB's collections and are concurrency-safe.
- Or create a deep copy of the documents. This can either be done by calling GetByID for each document, or by copying the code from that method. The former leads to less code, but one extra operation per document (the c.Documents lookup).
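To make the difference between the two options concrete, here is a minimal, self-contained sketch. It does not use chromem-go itself: the Document type is a reduced stand-in, and listShallow, listDeep and cloneDocument are hypothetical names chosen for illustration, not the library's API.

```go
package main

// Document is a simplified stand-in for a stored document.
// Assumption: the field set is reduced for illustration only.
type Document struct {
	ID        string
	Metadata  map[string]string
	Embedding []float32
	Content   string
}

// listShallow returns a new slice of struct copies. Simple fields such as
// Content are independent, but Metadata and Embedding still point at the
// same underlying data as the stored documents.
func listShallow(store map[string]*Document) []Document {
	out := make([]Document, 0, len(store))
	for _, d := range store {
		out = append(out, *d)
	}
	return out
}

// cloneDocument copies the struct and then replaces the map and the slice
// with fresh copies, so the result shares nothing with the original.
func cloneDocument(d *Document) Document {
	out := *d
	if d.Metadata != nil {
		out.Metadata = make(map[string]string, len(d.Metadata))
		for k, v := range d.Metadata {
			out.Metadata[k] = v
		}
	}
	if d.Embedding != nil {
		out.Embedding = make([]float32, len(d.Embedding))
		copy(out.Embedding, d.Embedding)
	}
	return out
}

// listDeep returns a new slice of fully independent documents.
func listDeep(store map[string]*Document) []Document {
	out := make([]Document, 0, len(store))
	for _, d := range store {
		out = append(out, cloneDocument(d))
	}
	return out
}
```

With listShallow, writing to a returned document's Metadata map also changes the stored document; with listDeep it does not. On Go 1.21 and later, maps.Clone and slices.Clone from the standard library can replace the manual copy loops.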
ids := []string{"1", "2", "3", "4"}
metadatas := []map[string]string{{"foo": "bar"}, {"a": "b"}, {"foo": "bar"}, {"e": "f"}}
contents := []string{"hello world", "hallo welt", "bonjour le monde", "hola mundo"}
c.Add(context.Background(), ids, nil, metadatas, contents)
Here the returned error should be checked
for _, doc := range docs {
	if doc.Content == "hello world" {
		break
	}
}
Here the test doesn't assert whether the content was found or not. You can introduce a new variable found := false before the loop, set it found = true just before the break, and after the loop assert that its value is true.
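The two test remarks (check the error returned when adding, and assert that the content was actually found) could look roughly like the sketch below. It is self-contained and does not touch chromem-go: doc, fakeCollection, newFakeCollection and verifyListed are hypothetical stand-ins invented for this example. In the real test the returned errors would be reported through the testing package instead.

```go
package main

import (
	"errors"
	"fmt"
)

// doc is a hypothetical, reduced document type used only in this sketch.
type doc struct {
	ID      string
	Content string
}

// fakeCollection is a hypothetical in-memory stand-in for a collection.
type fakeCollection struct {
	docs map[string]doc
}

func newFakeCollection() *fakeCollection {
	return &fakeCollection{docs: make(map[string]doc)}
}

// add stores the documents and reports mismatched input as an error.
func (c *fakeCollection) add(ids, contents []string) error {
	if len(ids) != len(contents) {
		return errors.New("ids and contents must have the same length")
	}
	for i, id := range ids {
		c.docs[id] = doc{ID: id, Content: contents[i]}
	}
	return nil
}

// list returns all stored documents in a new slice.
func (c *fakeCollection) list() []doc {
	out := make([]doc, 0, len(c.docs))
	for _, d := range c.docs {
		out = append(out, d)
	}
	return out
}

// verifyListed checks the error from add, then uses a found flag so that a
// missing document is reported instead of silently passing.
func verifyListed(c *fakeCollection, ids, contents []string, want string) error {
	if err := c.add(ids, contents); err != nil {
		return fmt.Errorf("adding documents: %w", err)
	}
	found := false
	for _, d := range c.list() {
		if d.Content == want {
			found = true
			break
		}
	}
	if !found {
		return fmt.Errorf("document with content %q not found", want)
	}
	return nil
}
```

Without the found flag, the loop would finish in the same way whether or not a matching document exists, so the test could never fail on a missing document.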
There was a later PR from another contributor, which I think supersedes this one: #118. Can you check whether it enables you to merge the collections?
Hey @philippgille, thanks for this great package!
It took me a while to find out that there is no way to merge two existing collections. I appreciate your focus on keeping the package simple (as I've read in other issues and pull requests), so I avoided extending the Import.. functions with enableMerge attributes and instead implemented the simplest approach I could come up with: getting all existing documents of a collection. This way the end user (or developer?) is at least able to fetch all documents and import them into another collection in their own way.
I'm interested in your feedback and send happy greetings from Hamburg.