This repository was archived by the owner on Jun 5, 2025. It is now read-only.

Obfuscate secrets before sending a snippet out for analysis #332

Merged
merged 3 commits into from
Dec 17, 2024

Conversation

jhrozek
Contributor

@jhrozek jhrozek commented Dec 13, 2024

Load signatures only once - This showed up in tests: every time the signatures were loaded again, they were appended to the existing list of signatures, which caused double matches.
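As an illustration of the load-once fix, here is a minimal sketch of a guard that makes repeated loads a no-op. The `SignatureRegistry` class and its method names are hypothetical and are not taken from this PR; they only show the general idea.

```python
class SignatureRegistry:
    """Hypothetical signature store; a stand-in, not the project's actual class."""

    _signatures: list = []
    _loaded: bool = False

    @classmethod
    def load(cls, patterns):
        # Guard: a second call must not append the same signatures again,
        # otherwise every secret would be matched twice.
        if cls._loaded:
            return
        cls._signatures.extend(patterns)
        cls._loaded = True

    @classmethod
    def count(cls):
        return len(cls._signatures)


patterns = ["AKIA[0-9A-Z]{16}", "ghp_[A-Za-z0-9]{36}"]
SignatureRegistry.load(patterns)
SignatureRegistry.load(patterns)  # no-op: signatures are already loaded
```

Without the `_loaded` guard, the second call would leave four entries in the list instead of two, which is the kind of duplication that would lead to double matches.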

Split out secret obfuscation into reusable classes - Instead of coding the secret encryption directly in the Pipeline step, let's split it out into a class of its own. The method that actually changes the secret is pluggable: for encryption, where we need to get the secret value back, we use the method we had used in the pipeline step. For things like extracting packages from a code snippet, where we don't need to retrieve the original value, we just replace the secret with a fixed number of asterisks.

Obfuscate secrets in the code snippet before the code extraction step - We use the previously added SecretsObfuscator to hide the secrets before passing the snippet to an LLM.
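The pluggable design described above could look roughly like the sketch below. All names here (`Obfuscator`, `AsteriskObfuscator`, `ReversibleObfuscator`, `SECRET_PATTERN`) are hypothetical, the single regex stands in for the real signature set, and the reversible variant uses a simple placeholder table rather than the encryption used in the pipeline step.

```python
import re
from abc import ABC, abstractmethod

# Hypothetical single pattern; the real code matches against loaded signatures.
SECRET_PATTERN = re.compile(r"ghp_[A-Za-z0-9]{8,}")


class Obfuscator(ABC):
    """Finds secrets in text and delegates the replacement to a subclass."""

    @abstractmethod
    def hide(self, secret: str) -> str:
        """Return the replacement for one matched secret."""

    def obfuscate(self, text: str) -> str:
        return SECRET_PATTERN.sub(lambda m: self.hide(m.group(0)), text)


class AsteriskObfuscator(Obfuscator):
    """One-way: the original value is never needed again."""

    def hide(self, secret: str) -> str:
        return "*" * 8  # fixed width, so the secret's length is not leaked


class ReversibleObfuscator(Obfuscator):
    """Reversible: a placeholder table stands in for encryption (assumption)."""

    def __init__(self):
        self._tokens = {}  # secret -> placeholder

    def hide(self, secret: str) -> str:
        if secret not in self._tokens:
            self._tokens[secret] = f"REDACTED<{len(self._tokens)}>"
        return self._tokens[secret]

    def restore(self, text: str) -> str:
        for secret, token in self._tokens.items():
            text = text.replace(token, secret)
        return text


snippet = 'token = "ghp_abcdefgh12345678"'
masked = AsteriskObfuscator().obfuscate(snippet)  # safe to send to an LLM
reversible = ReversibleObfuscator()
hidden = reversible.obfuscate(snippet)
```

Keeping the matching logic in the base class and only the replacement in subclasses means the one-way variant can be reused wherever the original value is not needed, for example before package extraction.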

@ptelang
Contributor

ptelang commented Dec 13, 2024

Good idea! We can use the same class to obfuscate the secrets before persisting them in the DB.

@jhrozek jhrozek changed the title Draft: Add reusable classes to aid in obfuscating secrets Obfuscate secrets before sending a snippet out for analysis Dec 14, 2024
@jhrozek jhrozek marked this pull request as ready for review December 14, 2024 13:30
@jhrozek
Contributor Author

jhrozek commented Dec 14, 2024

This is now ready for review.

@aponcedeleonch could you please check that I rebased correctly atop your changes to how secrets are stored in the DB and alerts? I hope I didn't break anything.

Contributor

@aponcedeleonch aponcedeleonch left a comment

Just tested and this is working. There's a conflict with main but besides that it looks good.

I like your idea of reporting the context of the file the secret came from, along with the surrounding lines. I will open an issue for it.

@jhrozek jhrozek merged commit bbdb8ae into stacklok:main Dec 17, 2024

3 participants