feat: add embeddings service for codebase analysis and similarity search #26
base: main
…onment configuration
…mbedding generation and document retrieval
… and remove OpenAI dependency
…transformers version
… jinaai embeddings
…ings service and tests
src/services/embeddings.ts
Outdated
"node_modules",
".git",
"dist",
"build",
"coverage",
".next",
"tmp",
".cache",
"huggingface_hub",
Here we should merge the list of files we define with the files from the .gitignore.
Let's also exclude all the open runtime-specific scripts, such as build.sh, in our own list, as we don't need them indexed.
All of them exist in the helpers folder, so I added that to the list as well. I'm just worried this might be bad for actual codebases.
Also, embeddings will be generated in the artifact folder, where these files should not exist anyway.
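The merge suggested in this thread could look roughly like the sketch below. It is an illustration only, not the code in this PR: `parseGitignore`, `mergeIgnoreList`, and `isIgnored` are hypothetical names, and the matching is deliberately simplified to plain file and folder names (no glob patterns, no negated `!` rules, no nested paths such as `src/generated`).

```typescript
// Built-in entries from the diff above, plus "helpers" as mentioned in the reply.
const DEFAULT_IGNORES: string[] = [
  "node_modules", ".git", "dist", "build", "coverage",
  ".next", "tmp", ".cache", "huggingface_hub", "helpers",
];

// Hypothetical helper: turn .gitignore text into a list of plain names.
// Comments, blank lines, and negated ("!") rules are skipped in this sketch.
function parseGitignore(content: string): string[] {
  return content
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .filter((line) => line.charAt(0) !== "#" && line.charAt(0) !== "!")
    // Strip leading and trailing slashes so "dist/" and "/out" become bare names.
    .map((line) => line.replace(/^\/+|\/+$/g, ""));
}

// Hypothetical helper: combine both sources, keeping the first occurrence of each entry.
function mergeIgnoreList(defaults: string[], gitignoreContent: string): string[] {
  const all = defaults.concat(parseGitignore(gitignoreContent));
  return all.filter((entry, index) => all.indexOf(entry) === index);
}

// Hypothetical helper: a path is ignored if any of its segments is in the list.
function isIgnored(relativePath: string, ignores: string[]): boolean {
  const segments = relativePath.split("/");
  return ignores.some((entry) => segments.indexOf(entry) !== -1);
}
```

Matching on path segments means a `helpers` entry would also skip any `helpers` folder in a user's own codebase, which is the concern raised in the reply above; scoping that entry to a known root would be one way to narrow it.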
constructor(
  synapse: Synapse,
  workDir: string,
  modelName: string = "jinaai/jina-embeddings-v2-base-code",
Accuracy is really important here, and so is the quality of the embeddings. We should create an adapter system that allows us to use OpenAI embeddings or our own custom embeddings.
I'd also like to see some results with the OpenAI embeddings.
Yeah, I agree, will work on adding an adapter.
As for OpenAI embeddings, that's what I had done before, but it would require exposing the OpenAI API key in the container, which we should definitely not do as it might get exposed to the user.
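One possible shape for the adapter system discussed in this thread is sketched below. Every name here (`EmbeddingsAdapter`, `HashingAdapter`, `AdapterBackedEmbeddings`, `cosineSimilarity`) is hypothetical and not taken from this PR. `HashingAdapter` is a toy stand-in that produces deterministic vectors so the example runs without a model or an API key; it does not produce meaningful embeddings.

```typescript
// Hypothetical adapter contract: any backend turns texts into vectors.
interface EmbeddingsAdapter {
  readonly name: string;
  embed(texts: string[]): Promise<number[][]>;
}

// Toy stand-in backend (an assumption for illustration only):
// counts character codes into a fixed number of buckets.
class HashingAdapter implements EmbeddingsAdapter {
  readonly name = "hashing-demo";

  constructor(private readonly dimensions: number = 16) {}

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const vector: number[] = [];
      for (let d = 0; d < this.dimensions; d++) vector.push(0);
      for (let i = 0; i < text.length; i++) {
        vector[text.charCodeAt(i) % this.dimensions] += 1;
      }
      return vector;
    });
  }
}

// Cosine similarity between two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical service wrapper: depends only on the adapter contract,
// so the backend can be swapped without touching the calling code.
class AdapterBackedEmbeddings {
  constructor(private readonly adapter: EmbeddingsAdapter) {}

  async similarity(a: string, b: string): Promise<number> {
    const vectors = await this.adapter.embed([a, b]);
    return cosineSimilarity(vectors[0], vectors[1]);
  }
}
```

With this shape, a local model and an OpenAI-backed implementation would both satisfy `EmbeddingsAdapter`, which would make it straightforward to compare results between them. Where a hosted adapter runs, so that the API key stays out of the container, is a separate decision that this sketch does not address.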
…noring logic in embeddings service
What does this PR do?
Adds an embeddings service.
Test Plan
Related PRs and Issues
Have you read the Contributing Guidelines on issues?
yes.