This repository was archived by the owner on Jun 5, 2025. It is now read-only.

sqlite-vec vectorization database #438

Merged
merged 18 commits into from
Dec 24, 2024


@lukehinds lukehinds commented Dec 21, 2024

Migrate to sqlite-vec

This PR migrates from weaviate to sqlite-vec.

I tried to keep the logic flow the same as before. Initial tests show the prompt being augmented correctly after a search finds a matching package.


How to test

Load the packages into the database:

poetry run python scripts/import_packages.py --jsonl-dir=data   
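To illustrate what an import step like this involves, here is a minimal sketch. It is not the actual scripts/import_packages.py. It assumes each JSONL line carries name, type, status and description fields (the columns selected by the query under review further down), and it uses a toy, hypothetical embed() in place of a real embedding model. Each vector is stored as a packed little-endian float32 BLOB, which is one of the vector formats sqlite-vec accepts. The table name packages and all helper names are hypothetical.

```python
import json
import sqlite3
import struct
from pathlib import Path

DIM = 8  # hypothetical embedding size; real embedding models use hundreds of dimensions


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model: a toy character histogram."""
    vec = [0.0] * DIM
    for ch in text.lower():
        vec[ord(ch) % DIM] += 1.0
    return vec


def serialize_f32(vec: list[float]) -> bytes:
    """Pack floats as little-endian float32, the compact BLOB layout sqlite-vec reads."""
    return struct.pack(f"<{len(vec)}f", *vec)


def import_packages(jsonl_dir: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Load every *.jsonl file in jsonl_dir into a (hypothetical) packages table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS packages (
               name TEXT, type TEXT, status TEXT, description TEXT, embedding BLOB
           )"""
    )
    for path in sorted(Path(jsonl_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue  # skip blank lines
                pkg = json.loads(line)
                # Which fields get embedded is an assumption made for this sketch.
                text = f"{pkg['name']} {pkg['type']} {pkg.get('description', '')}"
                conn.execute(
                    "INSERT INTO packages VALUES (?, ?, ?, ?, ?)",
                    (
                        pkg["name"],
                        pkg["type"],
                        pkg.get("status", ""),
                        pkg.get("description", ""),
                        serialize_f32(embed(text)),
                    ),
                )
    conn.commit()
    return conn
```

Storing the embedding alongside the metadata in one row keeps each lookup to a single query. A real import would more likely use a sqlite-vec virtual table for indexed search.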

Select a package from the data directory and ask about it in chat, e.g. "Is malicious-crates-dummy from crates safe to use?"

You should then see:

2024-12-24T17:13:26.3dZ [debug    ] Found matching packages in sqlite-vec database matched_packages=['malicious-crates-dummy (crates)'] module=codegate pathname=/Users/lhinds/repos/stacklok/codegate-repos/codegate/src/codegate/pipeline/codegate_context_retriever/codegate.py
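The matched_packages value in that log line pairs each package name with its ecosystem in parentheses. The following minimal sketch shows how matched rows could be turned into that label format and into extra context for the prompt. It assumes rows shaped like the (name, type, status, description) columns from the query under review. The class, function names and context wording are hypothetical, not codegate's actual code.

```python
from typing import NamedTuple


class PackageRow(NamedTuple):
    """One matched row; fields mirror the columns selected in the reviewed query."""

    name: str
    type: str
    status: str
    description: str


def format_matches(rows: list[PackageRow]) -> list[str]:
    """Render matches as 'name (type)', the shape seen in the debug log."""
    return [f"{row.name} ({row.type})" for row in rows]


def augment_prompt(prompt: str, rows: list[PackageRow]) -> str:
    """Prepend one context line per matched package (hypothetical wording)."""
    if not rows:
        return prompt  # nothing matched: leave the prompt untouched
    context = "\n".join(
        f"Package {row.name} ({row.type}) is marked {row.status}: {row.description}"
        for row in rows
    )
    return f"Context:\n{context}\n\nQuery: {prompt}"
```

Returning the prompt unchanged when there are no matches keeps the "no hit" path identical to plain chat.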

Closes: #437

Luke Hinds added 2 commits December 21, 2024 11:58
This PR migrates from weaviate to sqlite-vec.

I tried to keep the logic flow the same as before. Initial tests
show prompt augmented correctly after matching a search.
Luke Hinds and others added 15 commits December 22, 2024 10:06
Bumps [click](https://github.com/pallets/click) from 8.1.7 to 8.1.8.
- [Release notes](https://github.com/pallets/click/releases)
- [Changelog](https://github.com/pallets/click/blob/main/CHANGES.rst)
- [Commits](pallets/click@8.1.7...8.1.8)

---
updated-dependencies:
- dependency-name: click
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
@lukehinds lukehinds marked this pull request as ready for review December 24, 2024 18:46
query_sql = """
WITH distances AS (
SELECT name, type, status, description,
vss_distance(embedding, ?) as distance
Contributor


We need to use cosine distance here. I could not find documentation for the vss_distance function, so I'm not sure what kind of distance it computes. There is a vec_distance_cosine function we can use instead.

https://alexgarcia.xyz/sqlite-vec/api-reference.html#vec_distance_cosine
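vec_distance_cosine computes cosine distance, 1 - (a·b)/(|a||b|): 0 means the vectors point the same way, and larger values mean less similar. The sketch below shows the shape of the corrected query without requiring the extension to be installed. It registers a plain-Python stand-in under the same name via sqlite3.Connection.create_function and reads float32 BLOBs. With sqlite-vec actually loaded, the stand-in registration would be dropped. The table and column names follow the snippet under review. The packages table, top_k and the helper names are hypothetical.

```python
import math
import sqlite3
import struct


def deserialize_f32(blob: bytes) -> list[float]:
    """Unpack a little-endian float32 BLOB into Python floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine_distance(a_blob: bytes, b_blob: bytes) -> float:
    """Pure-Python stand-in for sqlite-vec's vec_distance_cosine: 1 - cos(a, b)."""
    a, b = deserialize_f32(a_blob), deserialize_f32(b_blob)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # sketch-only choice: treat zero vectors as unrelated
    return 1.0 - dot / norm


QUERY_SQL = """
WITH distances AS (
    SELECT name, type, status, description,
           vec_distance_cosine(embedding, ?) AS distance
    FROM packages
)
SELECT name, type, status, description, distance
FROM distances
ORDER BY distance ASC
LIMIT ?
"""


def search(conn: sqlite3.Connection, query_vec: list[float], top_k: int = 3) -> list[tuple]:
    """Return the top_k rows closest to query_vec by cosine distance."""
    # Stand-in registration; remove this when the real sqlite-vec extension is loaded.
    conn.create_function("vec_distance_cosine", 2, cosine_distance)
    blob = struct.pack(f"<{len(query_vec)}f", *query_vec)
    return conn.execute(QUERY_SQL, (blob, top_k)).fetchall()
```

Cosine distance depends only on direction, so scaling a vector (e.g. [3, 0] vs [1, 0]) leaves its distance unchanged, whereas an L2 distance would grow with the scale.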

Author


Thanks, well caught! As discussed on Slack: I started out with sqlite-vss, then got a warning that it was deprecated and switched to sqlite-vec, but left the old similarity distance call in place, which weirdly still worked.

ptelang previously approved these changes Dec 24, 2024
@lukehinds lukehinds dismissed ptelang’s stale review December 24, 2024 23:21

The merge-base changed after approval.

ptelang previously approved these changes Dec 24, 2024
Contributor

@ptelang ptelang left a comment


Looks good!

@lukehinds lukehinds dismissed ptelang’s stale review December 24, 2024 23:28

The merge-base changed after approval.

@lukehinds lukehinds merged commit 3de0dbf into main Dec 24, 2024
3 checks passed
@lukehinds lukehinds deleted the sqlite-vec branch December 24, 2024 23:29
Development

Successfully merging this pull request may close these issues.

Review Weaviate as a vectorDB