This repository was archived by the owner on Jun 5, 2025. It is now read-only.

[Chore]: Monitor presidio-analyzer releases #1054

Open
aponcedeleonch opened this issue Feb 14, 2025 · 3 comments

Comments

@aponcedeleonch
Contributor

aponcedeleonch commented Feb 14, 2025

Description

We're using presidio-analyzer==2.2.357 (the latest release) for our PII pipeline step. There's a known bug when presidio-analyzer is used with numpy>=2.0.0. The workaround is to keep numpy pinned to 1.26.4. The presidio-analyzer bug seems to have been caused by a bug in thinc, which has since been fixed. We need to either contribute a patch upstream to presidio-analyzer or monitor their releases so that we can bump numpy.
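One way the "monitor their releases" option could be automated is a small script that asks the PyPI JSON API for the latest presidio-analyzer version and compares it with the pinned one. This is only a sketch: the helper names (`parse_version`, `newer_release`, `fetch_metadata`) are made up for illustration, and the version comparison assumes plain dotted numeric versions such as `2.2.357` (a pre-release suffix would need a proper version parser).

```python
import json
import urllib.request
from typing import Optional, Tuple

PINNED = "2.2.357"  # the presidio-analyzer version quoted in this issue


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '2.2.357' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def newer_release(metadata: dict, pinned: str = PINNED) -> Optional[str]:
    """Return the latest version in the PyPI metadata if it is newer than `pinned`."""
    latest = metadata["info"]["version"]
    if parse_version(latest) > parse_version(pinned):
        return latest
    return None


def fetch_metadata(package: str = "presidio-analyzer") -> dict:
    """Download package metadata from the PyPI JSON API (needs network access)."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)
```

Calling `newer_release(fetch_metadata())` from a scheduled job would return a version string once something newer than the pin is published, which is the signal to retry the numpy bump.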

Additional Context

presidio-analyzer==2.2.357 dependency tree.

presidio-analyzer 2.2.357 Presidio Analyzer package
├── phonenumbers >=8.12,<9.0.0
├── pyyaml *
├── regex *
├── spacy >=3.4.4,<3.7.0 || >3.7.0,<4.0.0
│   ├── catalogue >=2.0.6,<2.1.0
│   ├── cymem >=2.0.2,<2.1.0
│   ├── jinja2 *
│   │   └── markupsafe >=2.0
│   ├── langcodes >=3.2.0,<4.0.0
│   │   └── language-data >=1.2
│   │       └── marisa-trie >=1.1.0
│   │           └── setuptools *
│   ├── murmurhash >=0.28.0,<1.1.0
│   ├── numpy >=1.19.0
│   ├── packaging >=20.0
│   ├── preshed >=3.0.2,<3.1.0
│   │   ├── cymem >=2.0.2,<2.1.0 (circular dependency aborted here)
│   │   └── murmurhash >=0.28.0,<1.1.0 (circular dependency aborted here)
│   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<3.0.0
│   │   ├── annotated-types >=0.6.0
│   │   ├── pydantic-core 2.27.2
│   │   │   └── typing-extensions >=4.6.0,<4.7.0 || >4.7.0
│   │   └── typing-extensions >=4.12.2 (circular dependency aborted here)
│   ├── requests >=2.13.0,<3.0.0
│   │   ├── certifi >=2017.4.17
│   │   ├── charset-normalizer >=2,<4
│   │   ├── idna >=2.5,<4
│   │   └── urllib3 >=1.21.1,<3
│   ├── setuptools * (circular dependency aborted here)
│   ├── spacy-legacy >=3.0.11,<3.1.0
│   ├── spacy-loggers >=1.0.0,<2.0.0
│   ├── srsly >=2.4.3,<3.0.0
│   │   └── catalogue >=2.0.3,<2.1.0 (circular dependency aborted here)
│   ├── thinc >=8.2.2,<8.3.0
│   │   ├── blis >=0.7.8,<0.8.0
│   │   │   └── numpy >=1.19.0 (circular dependency aborted here)
│   │   ├── catalogue >=2.0.4,<2.1.0 (circular dependency aborted here)
│   │   ├── confection >=0.0.1,<1.0.0
│   │   │   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<3.0.0 (circular dependency aborted here)
│   │   │   └── srsly >=2.4.0,<3.0.0 (circular dependency aborted here)
│   │   ├── cymem >=2.0.2,<2.1.0 (circular dependency aborted here)
│   │   ├── murmurhash >=1.0.2,<1.1.0 (circular dependency aborted here)
│   │   ├── numpy >=1.19.0,<2.0.0 (circular dependency aborted here)
│   │   ├── packaging >=20.0 (circular dependency aborted here)
│   │   ├── preshed >=3.0.2,<3.1.0 (circular dependency aborted here)
│   │   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<3.0.0 (circular dependency aborted here)
│   │   ├── setuptools * (circular dependency aborted here)
│   │   ├── srsly >=2.4.0,<3.0.0 (circular dependency aborted here)
│   │   └── wasabi >=0.8.1,<1.2.0
│   │       └── colorama >=0.4.6
@aponcedeleonch aponcedeleonch changed the title [Chore]: Monitor presidio releases to be able to bump numpy [Chore]: Monitor presidio releases to bump numpy Feb 14, 2025
@aponcedeleonch aponcedeleonch changed the title [Chore]: Monitor presidio releases to bump numpy [Chore]: Monitor presidio-analyzer releases to bump numpy Feb 14, 2025
@aponcedeleonch
Contributor Author

aponcedeleonch commented Feb 14, 2025

Incidentally, spacy is also brought in by presidio-analyzer. spacy brings in thinc, which brings in blis (see the dependency tree above). There's a bug in blis==1.2.0 when building on ARM, which we hit and solved in #1047. The workaround is capping spacy<3.8.0. Whenever we bump presidio-analyzer, we need to be careful with its sub-dependencies and make sure nothing breaks. In the meantime, we won't bump spacy, so that we avoid also bumping blis.
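A minimal sketch of a guard that could run in CI after any dependency bump, to confirm that the workarounds described in this thread still hold (numpy below 2.0.0, spacy below 3.8.0, blis not at 1.2.0). The function names are hypothetical, and the parsing assumes plain dotted numeric versions.

```python
from importlib.metadata import PackageNotFoundError, version


def as_tuple(ver: str) -> tuple:
    """Convert the first three dotted components of a version into ints."""
    return tuple(int(part) for part in ver.split(".")[:3])


def check_pins(installed: dict) -> list:
    """Return a list of messages for versions that violate the workarounds."""
    problems = []
    numpy_v = installed.get("numpy")
    if numpy_v and as_tuple(numpy_v) >= (2, 0, 0):
        problems.append(f"numpy {numpy_v} is >= 2.0.0")
    spacy_v = installed.get("spacy")
    if spacy_v and as_tuple(spacy_v) >= (3, 8, 0):
        problems.append(f"spacy {spacy_v} is >= 3.8.0")
    if installed.get("blis") == "1.2.0":
        problems.append("blis 1.2.0 has the ARM build bug")
    return problems


def installed_versions(names=("numpy", "spacy", "blis")) -> dict:
    """Look up installed versions, skipping packages that are not installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass
    return found
```

Running `check_pins(installed_versions())` in the project environment and failing the build on a non-empty result would catch a sub-dependency slipping past the caps.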

@aponcedeleonch aponcedeleonch changed the title [Chore]: Monitor presidio-analyzer releases to bump numpy [Chore]: Monitor presidio-analyzer releases Feb 17, 2025
@aponcedeleonch
Contributor Author

presidio-analyzer is also preventing us from supporting Python 3.13. We pinned Python to 3.12 in #1009.

@omri374

omri374 commented Mar 26, 2025

Please check whether the issue is resolved. thinc, spacy and presidio-analyzer have been updated; see microsoft/presidio#1473
