This repository was archived by the owner on Jun 5, 2025. It is now read-only.

CodeGate doesn't distinguish Python built-in modules and external packages #518

Closed
danbarr opened this issue Jan 8, 2025 · 8 comments

@danbarr
Contributor

danbarr commented Jan 8, 2025

Describe the issue

CodeGate isn't aware of the built-in Python modules, and may treat imports of these as references to external packages.

The specific case I've encountered is hashlib. It was once an external package, so it exists in PyPI and therefore also in our data set, but the external package was archived and the module moved into the standard library ages ago. When CodeGate encounters import hashlib in code, it finds the archived package in the vector DB and reports it as archived/deprecated.

Insight report - https://www.insight.stacklok.com/report/pypi/hashlib
PyPI entry - https://pypi.org/project/hashlib/20081119/

CodeGate behavior:
Image

Steps to Reproduce

Reference the app.py file from the codegate-demonstration repo using Copilot or Continue chat.

Operating System

macOS (Arm)

IDE and Version

VS Code 1.96.2

Extension and Version

Any

Provider

GitHub Copilot

Model

Any

Logs

2025-01-08T21:19:27.008Z [debug    ] Found matching packages in sqlite-vec database matched_packages=['hashlib (crates)', 'hashlib (pypi)', 'invokehttp (pypi)'] module=codegate pathname=/app/src/codegate/pipeline/codegate_context_retriever/codegate.py
2025-01-08T21:19:27.008Z [debug    ] Final context message          context_message=Context: hashlib is a Rust package available on Crates ecosystem.  However, this package is found to be archived and no longer maintained. For additional information refer to https://www.insight.stacklok.com/report/crates/hashlib - Package offers this functionality: Provide various hash algorithms under a same abstraction layer.
hashlib is a Python package available on PyPI ecosystem.  However, this package is found to be deprecated and no longer recommended for use. For additional information refer to https://www.insight.stacklok.com/report/pypi/hashlib - Package offers this functionality: Secure hash and message digest algorithm library

Additional Context

No response

@danbarr
Contributor Author

danbarr commented Jan 8, 2025

There's a potential secondary issue here too, where CodeGate is reporting this as both a Crates and PyPI package even though this is a Python file. Shall I open a separate issue for this?

Image
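
One way to picture a fix for this secondary issue is to filter the matched packages by the ecosystem implied by the file type. The sketch below is an assumption-laden illustration, not the actual fix: the SUFFIX_TO_ECOSYSTEM mapping and the filter_by_ecosystem function are hypothetical, and the only thing taken from this issue is the 'name (ecosystem)' label format seen in the debug log.

```python
from pathlib import PurePath

# Hypothetical mapping from file suffix to the ecosystem label used in the
# matched_packages log entries ('hashlib (pypi)', 'hashlib (crates)').
SUFFIX_TO_ECOSYSTEM = {
    ".py": "pypi",
    ".rs": "crates",
}


def filter_by_ecosystem(matched_packages: list[str], filename: str) -> list[str]:
    """Keep only matches whose ecosystem fits the file being discussed."""
    ecosystem = SUFFIX_TO_ECOSYSTEM.get(PurePath(filename).suffix.lower())
    if ecosystem is None:
        # Unknown file type: do not filter, to avoid hiding real findings.
        return list(matched_packages)
    return [m for m in matched_packages if m.endswith(f"({ecosystem})")]
```

For the app.py case from the log, this would drop 'hashlib (crates)' and keep the two PyPI matches.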

@lukehinds

@ptelang is this covered by #475?

@lukehinds

@ptelang retest

@ptelang
Contributor

ptelang commented Jan 13, 2025

> There's a potential secondary issue here too, where CodeGate is reporting this as both a Crates and PyPI package even though this is a Python file. Shall I open a separate issue for this?

This issue is fixed in the latest version by this PR.

@ptelang
Contributor

ptelang commented Jan 13, 2025

Currently, CodeGate cannot identify libraries like hashlib, which were once external packages but are now built into Python.

We can address this issue when the projects functionality is implemented. CodeGate can then read the dependency files (e.g. requirements.txt, pyproject.toml, etc.) to detect cases like hashlib and prevent the false positive.

@lukehinds

@ptelang is this fixed now?

@ptelang
Contributor

ptelang commented Feb 3, 2025

@lukehinds, part of this issue was fixed.

"hashlib" getting raised as a bad package is a corner case and is very rare. So, I think we can close this.

@jhrozek
Contributor

jhrozek commented Feb 17, 2025

closing per request by @ptelang

jhrozek closed this as completed Feb 17, 2025