Releases · opensanctions/yente

06 Oct 12:41

pudo

v5.0.2

96a0e61

v5.0.2 Latest

Brings in updates to symbolic matching in nomenklatura and rigour, and tries to make single-word queries less horrible by forcing fuzzy matching on them. Also clarifies multi-catalog/dataset manifest configuration and raises an error for invalid catalog specifications.
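
To illustrate the single-word change, here is a minimal sketch of how a candidate query could force fuzzy matching when the input is a single token. This is not yente's actual query builder: the function name, the `names` field, and the `fuzzy_default` flag are assumptions made for the example; only the `match` clause with `fuzziness` is standard Elasticsearch query DSL.

```python
from typing import Any, Dict


def name_match_clause(query: str, fuzzy_default: bool = False) -> Dict[str, Any]:
    """Build an Elasticsearch `match` clause for a name query.

    Hypothetical helper: the `names` field is an assumed index field.
    """
    # A query counts as "single-word" if it has exactly one whitespace-separated token.
    single_word = len(query.split()) == 1
    options: Dict[str, Any] = {"query": query}
    # Force fuzziness for single-word queries, otherwise follow the default.
    if fuzzy_default or single_word:
        options["fuzziness"] = "AUTO"
    return {"match": {"names": options}}


if __name__ == "__main__":
    print(name_match_clause("putin"))
    print(name_match_clause("vladimir putin"))
```

With these assumptions, only the first call carries a `fuzziness` option; multi-word queries keep whatever default the caller passes in.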

What's Changed

build(deps): update click requirement from ~=8.2.0 to >=8.2,<8.4 by @dependabot[bot] in #895
build(deps): bump fastapi from 0.116.2 to 0.117.1 by @dependabot[bot] in #897
build(deps): bump anyio from 4.10.0 to 4.11.0 by @dependabot[bot] in #899
build(deps): bump aiocsv from 1.3.2 to 1.4.0 by @dependabot[bot] in #898
build(deps): bump uvicorn[standard] from 0.35.0 to 0.37.0 by @dependabot[bot] in #900
build(deps): bump fastapi from 0.117.1 to 0.118.0 by @dependabot[bot] in #901
build(deps): bump cryptography from 46.0.1 to 46.0.2 by @dependabot[bot] in #903
Log HTTP error URL explicitly by @jbothma in #906
Enable fuzzy search for single-word queries by @pudo in #907
Clarify multi catalog/dataset manifest configuration by @jbothma in #904

Full Changelog: v5.0.1...v5.0.2

Contributors

pudo, jbothma, and dependabot

23 Sep 14:29

leonhandreke

v5.0.1

3250024

v5.0.1

Just a minor patch release of yente, pulling in a new nomenklatura with some fixes to the new logic-v2 scoring algorithm. Thank you to @baldurh for the in-depth feedback and bug hunting! More feedback on this new scoring system (and all other aspects of yente) is always appreciated - please get in touch here on GitHub or our Forum.

Note that this release also bumps cryptography from 45.0.7 to 46.0.1.

Full Changelog: v5.0.0...v5.0.1

Contributors

baldurh

15 Sep 15:00

leonhandreke

v5.0.0

5fc6b9a

v5.0.0

New logic-v2 matcher: We're including a new matching system, logic-v2. This new system reflects feedback from users of logic-v1 and introduces a more precise, explainable, and culturally-aware way to match names of people and companies. logic-v2 reduces false positives, is fully deterministic, improves cross-language and cross-script matching, runs fast, and provides detailed explanations of its decision-making path. Please be aware that we'll still be adjusting logic-v2 based on user feedback, so don't expect every scoring decision to be set in stone yet. We look forward to your feedback, be it as an issue report on our GitHub, or on our discussion forum or via the support team.
Field deprecations and renames:
- The matcher field in /match responses will be removed in a future version of yente. Equivalent information is available via the /algorithms endpoint.
- The /match endpoint response has gained a new explanations field that is a strict superset of features. In addition to the partial scores produced by each component of the matching system, it includes textual descriptions of the matching decisions from each of these subsystems. These descriptions are designed for display in analyst workbenches or can be passed to generative AI tools to help interpret screening alerts. The features field will be removed in a future version of yente.
- The cutoff parameter on the /match endpoint has been deprecated and will be removed in a future version of yente. If you care about low-scoring results being returned, please set the threshold parameter instead.
- The /algorithms endpoint has gained a new docs field that is a strict superset of features. In addition to a documentation of the algorithm features (yes, those featured in the explanations field in /match), it contains a list of configuration knobs to tune the algorithm to your individual needs. Currently, only logic-v2 offers these knobs.
- These algorithm configuration options are passed in a new config field on the /match request.
Locking mechanism to prevent concurrent reindexes: One of the most common issues with yente, both in our hosted deployment and for external users, was two re-index jobs stepping on each other's toes or overloading the Elastic backend, doing the same thing twice. Before we start a reindex, we now write a little lock to the Elastic backend so that other instances of yente that might be planning to do a reindex know that one is already in progress. Regardless, if you're running multiple instances of yente, you should still configure a separate re-indexing cronjob. See documentation on how to deploy yente for more information.
Audit log of index operations: A log of re-indexing and index cleanup operations is written to a special index in Elastic. This allows users to get an accurate record of what data became available when in their yente without digging through log messages. To read the audit log, simply run yente audit-log (or use another CSV viewer of your choice).
Google Cloud Logging compatible request logging: If you're running yente on Google Cloud Run, handy little badges with information about the HTTP request as well as a little button to filter logs from only this trace will show up in the Logs Explorer. Careful: the format of the log message logged for each request changed, so if you're parsing the JSON logged by yente, please update your infrastructure accordingly.
FollowTheMoney 4.0: This release pulls in an updated version of the data model underpinning yente. If you're ingesting custom data sources that you're generating yourself, please check out the release notes. The only change that may be notable for users of yente is probably the rename of CryptoWallet:managingExchange. Users relying on recent published OpenSanctions data can ignore this change.
Documentation moved to yente.followthemoney.tech: yente is part of the wider open source ecosystem around the FollowTheMoney data model, and the move of its documentation reflects this. The documentation is now part of the source repository and can be edited by anyone. Documentation is never perfect, and we welcome your PRs! The OpenSanctions website remains a good place to read about the data underpinning yente and our hosted API.
Celebrity-friendly scoring in /search: While /match is at the heart of what yente does best, the /search endpoint is what users usually hit first when they type in a search query on opensanctions.org. By deploying cutting-edge research from our search engine labs, we ensure that Putin ranks first when you type in "putin". Much wow!
The /updatez endpoint is now disabled by default. Set the UPDATE_TOKEN authentication token to a secret value to enable it.
The usual round of dependency upgrades, among them an upgrade of cryptography from 45.0.5 to 45.0.6
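
For readers updating their integrations, the /match changes listed above (threshold instead of cutoff, plus the new config field) can be sketched as a request builder. Treat this as an illustration under assumptions rather than a reference client: the base URL, the query ID `q1`, the placement of `config` at the top level of the request body, and the `example_option` key are all hypothetical. The actual configuration knobs are listed in the docs field of the /algorithms endpoint.

```python
from typing import Any, Dict, Tuple
from urllib.parse import urlencode


def build_match_request(
    base_url: str,
    dataset: str,
    name: str,
    threshold: float,
    config: Dict[str, Any],
) -> Tuple[str, Dict[str, Any]]:
    """Return the URL and JSON body for a POST to /match (sketch only)."""
    # Use `threshold` rather than the deprecated `cutoff` parameter.
    params = urlencode({"algorithm": "logic-v2", "threshold": threshold})
    url = f"{base_url}/match/{dataset}?{params}"
    body = {
        "queries": {
            "q1": {"schema": "Person", "properties": {"name": [name]}},
        },
        # Assumption: algorithm options travel in a top-level `config` object.
        "config": config,
    }
    return url, body


if __name__ == "__main__":
    # `example_option` is a placeholder, not a real logic-v2 setting.
    url, body = build_match_request(
        "http://localhost:8000", "default", "Jane Doe", 0.7, {"example_option": True}
    )
    print(url)
    print(body)
```

When reading responses, prefer the explanations field over features and matcher, since both of the latter are scheduled for removal.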

07 Aug 14:30

leonhandreke

v4.5.1

c524bc6

v4.5.1

Some minor dependency upgrades, plus a little convenience manifest file that allows OpenSanctions customers to easily supply the secret delivery token for the delivery.opensanctions.com service.

What's Changed

Bump aiohttp[speedups] from 3.12.13 to 3.12.14 by @dependabot[bot] in #791
Bump orjson from 3.10.18 to 3.11.0 by @dependabot[bot] in #797
Bump fastapi from 0.116.0 to 0.116.1 by @dependabot[bot] in #794
Bump orjson from 3.11.0 to 3.11.1 by @dependabot[bot] in #812
Bump aiohttp[speedups] from 3.12.14 to 3.12.15 by @dependabot[bot] in #813
Bump elasticsearch[async] from 8.17.2 to 8.19.0 by @dependabot[bot] in #825
Introduce a warm-up function for fetching the catalog by @pudo in #816
Add a pre-made delivery.opensanctions.com manifest by @leonhandreke in #826

Full Changelog: v4.5.0...v4.5.1

Contributors

pudo, leonhandreke, and dependabot

09 Jul 13:46

leonhandreke

v4.5.0

832160d

v4.5.0

Note: Triggers full index rebuild

I'm happy to announce the release of Yente 4.5.0 with a few new features and the usual bag of dependency upgrades. This release includes the following new features:

A new auth_token parameter that can be set on catalogs in the manifest. This token (which can also be set from an environment variable) will be sent in the Authorization header in all requests for the catalog and its dataset files.
- This mechanism replaces the DATA_TOKEN environment variable (which sets the Authentication header). DATA_TOKEN is now deprecated and will be removed in a future version of Yente. If you're using it, please migrate to auth_token instead.
The /match endpoint now has an exclude_entity_ids parameter that allows callers to exclude some entities from matching. This may be useful if you're doing periodic screening of the same entities and have decided that a match is a false positive and want it to be excluded from matching completely. Thanks to @baldurh for the idea!
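
As a rough sketch of the exclude_entity_ids parameter, the helper below builds a /match URL that repeats the parameter once per excluded entity. It assumes the parameter is passed as a repeated query-string value; the base URL, dataset name, and entity IDs are made up for the example.

```python
from typing import List
from urllib.parse import urlencode


def match_url(base_url: str, dataset: str, exclude_entity_ids: List[str]) -> str:
    """Build a /match URL that excludes known false positives (sketch only)."""
    # Assumption: the parameter is repeated, one query-string pair per entity ID.
    params = [("exclude_entity_ids", entity_id) for entity_id in exclude_entity_ids]
    query = urlencode(params)
    url = f"{base_url}/match/{dataset}"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    # Hypothetical IDs of entities previously reviewed and dismissed.
    print(match_url("http://localhost:8000", "default", ["Q7747", "NK-abc"]))
```

In a periodic screening job, the list of dismissed entity IDs would typically come from your own case management records.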

Further, the release includes the following improvements & changes worth noting:

Entities are now retrieved from the search index with stable ordering during search and match queries. This produces reproducible match results in cases where all match candidates are equally ranked in terms of relevance, often because of an overly broad query (like "LLC", "John") matching too many entities. It does not, in itself, improve the quality of candidates that will be retrieved.
The usual round of dependency upgrades, including the cryptography package from 45.0.3 to 45.0.5
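
The idea behind stable ordering can be shown with a small sketch: sort by relevance first and break ties on a unique key, so that equally ranked candidates always come back in the same order. This is an illustration of the general technique, not yente's index query; the `Candidate` type and its fields are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    entity_id: str
    score: float


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order by descending score, breaking ties on the entity ID."""
    # Without the tie-breaker, equally scored candidates could come back in
    # whatever order the backend happens to produce them.
    return sorted(candidates, key=lambda c: (-c.score, c.entity_id))


if __name__ == "__main__":
    a = [Candidate("b", 1.0), Candidate("a", 1.0), Candidate("c", 2.0)]
    b = list(reversed(a))
    # Both input orders produce the same ranking.
    print(rank(a) == rank(b))
```

As the note above says, this makes results reproducible but does not change which candidates are considered relevant.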

Full Commit Log

Bump index and app version by @jbothma in #764
Bump aiohttp[speedups] from 3.12.9 to 3.12.12 by @dependabot in #767
Bump cryptography from 45.0.3 to 45.0.4 by @dependabot in #768
Bump aiohttp[speedups] from 3.12.12 to 3.12.13 by @dependabot in #771
Gracefully handle missing entity_id field while index rebuild adds it by @jbothma in #772
Bump fastapi from 0.115.12 to 0.115.13 by @dependabot in #776
Refactor data/manifest.py to be less side-effecty by @leonhandreke in #778
Update pytest-asyncio requirement from <1.0.0,>=0.25.0 to >=0.25.0,<2.0.0 by @dependabot in #770
Bump pytest from 8.4.0 to 8.4.1 by @dependabot in #774
Allow setting auth token per catalog by @leonhandreke in #777
Bump fastapi from 0.115.13 to 0.115.14 by @dependabot in #781
Add exclude_entity_ids param to /match by @leonhandreke in #783
Bump fastapi from 0.115.14 to 0.116.0 by @dependabot in #789
Bump cryptography from 45.0.4 to 45.0.5 by @dependabot in #787
Bump uvicorn[standard] from 0.34.3 to 0.35.0 by @dependabot in #785
Implement env var expansion for auth_token in manifest by @leonhandreke in #788

Full Changelog: v4.4.0...v4.5.0

Contributors

leonhandreke, jbothma, and 2 other contributors

05 Jun 09:41

jbothma

v4.4.0

c2eb034

v4.4.0

Note: Triggers full index rebuild

The tokenised name indexing has been improved - prefixes from Person names, e.g. Mrs, Mr, etc are stripped. This should reduce the false positive rate for cases where those prefixes contributed to the score.
The environment variable YENTE_MATCH_FUZZY will take effect again. When true (default), the candidate generation stage (Elasticsearch query) of /match queries can include names with small spelling differences from the query. This does not affect the match score as indicated in the response.
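
The prefix stripping described in the first item above can be pictured with the following sketch. It is not the rigour implementation that yente actually relies on: the prefix list and the function name are assumptions chosen for illustration.

```python
# Hypothetical prefix list; the real list is maintained in the rigour library.
PREFIXES = {"mr", "mrs", "ms", "dr"}


def strip_prefixes(name: str) -> str:
    """Drop leading honorifics from a person name before tokenising it."""
    tokens = name.split()
    # Keep at least one token so a name is never reduced to nothing.
    while len(tokens) > 1 and tokens[0].rstrip(".").lower() in PREFIXES:
        tokens.pop(0)
    return " ".join(tokens)


if __name__ == "__main__":
    print(strip_prefixes("Mrs. Jane Doe"))
    print(strip_prefixes("Jane Doe"))
```

Removing these tokens before indexing means two unrelated people no longer gain score just because both names start with "Mr".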

What's Changed

adapt to new rigour text processing logic by @pudo in #749
Bump aiohttp[speedups] from 3.11.16 to 3.11.18 by @dependabot in #729
Bump cryptography from 44.0.3 to 45.0.2 by @dependabot in #750
Bump aiohttp[speedups] from 3.11.18 to 3.12.2 by @dependabot in #753
Bump rigour from 0.12.1 to 0.12.2 by @dependabot in #751
Bump cryptography from 45.0.2 to 45.0.3 by @dependabot in #752
Update multidict requirement from <6.3.0 to <6.5.0 by @dependabot in #723
Add option to use fuzzy search on names when generating candidates by @jbothma in #762
Bump pytest from 8.3.5 to 8.4.0 by @dependabot in #760
Bump rigour from 0.12.2 to 0.13.0 by @dependabot in #758
Bump structlog from 25.3.0 to 25.4.0 by @dependabot in #759
Bump uvicorn[standard] from 0.34.2 to 0.34.3 by @dependabot in #757
Bump aiohttp[speedups] from 3.12.2 to 3.12.6 by @dependabot in #756
Bump aiohttp[speedups] from 3.12.6 to 3.12.9 by @dependabot in #763

Full Changelog: v4.3.1...v4.4.0

Contributors

pudo, jbothma, and dependabot

10 Apr 08:14

jbothma

v4.3.1

2aa4e44

v4.3.1

Note: Triggers full index rebuild - Due to index schema changes this update triggers a full index rebuild even if you are up to date and use incremental index updates. The changes are backward-compatible so your yente service should remain available during rebuild, but the typical full index rebuild load will be seen on your ElasticSearch/OpenSearch deployment.

What's Changed

This fixes a couple of bugs in the (beta) adjacent entities API

Adjacent entities not found if the root is an edge
Adjacent entities not found if the property name conflicts with a property of another type (observed with directors' directorships)

#718

Other changes:

Bump nomenklatura from 3.17.1 to 3.17.2 by @dependabot in #715
Bump anyio from 4.8.0 to 4.9.0 by @dependabot in #687
Bump rigour from 0.9.9 to 0.9.10 by @dependabot in #714
Bump jellyfish from 1.1.3 to 1.2.0 by @dependabot in #707
Bump index version to force re-index with mapping fix by @jbothma in #721

Full Changelog: v4.3.0...v4.3.1

Contributors

jbothma and dependabot

04 Apr 10:33

leonhandreke

v4.3.0

2c3cc90

v4.3.0

In addition to the usual routine dependency upgrades and fixes, this release includes two new endpoints to retrieve entities adjacent to a given entity (e.g. holders of an office or assets owned by a person) in a paginated way.

Adjacent entities API

This release includes a beta version of two new adjacent entities API endpoints. These APIs may still be subject to change; a production-ready stable release will be announced in the future. We would love to get your feedback, here on GitHub, on our Discourse, or any other way.

The existing /entities/{entity_id} endpoint can nest adjacent entities (using the nested=true query parameter), but does so without limits on how many entities can be included. This can lead to large, slow responses when requesting entities with a large number of adjacent entities, such as institutions with many securities, or PEP positions with many holders. The /entities/{entity_id}/adjacent and /entities/{entity_id}/adjacent/{property_name} endpoints return a limited number of results by default with pagination parameters for retrieving more.
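
A small sketch of how a client might address the two new endpoints follows. The paths come from the description above; the `limit` and `offset` parameter names, the base URL, and the example entity ID and property name are assumptions, so check the API documentation of your deployment before relying on them.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def adjacent_url(
    base_url: str,
    entity_id: str,
    property_name: Optional[str] = None,
    limit: int = 10,
    offset: int = 0,
) -> str:
    """Build a URL for one page of adjacent entities (sketch only)."""
    path = f"/entities/{quote(entity_id, safe='')}/adjacent"
    if property_name is not None:
        # Restrict the listing to a single property of the root entity.
        path += f"/{quote(property_name, safe='')}"
    # Assumed pagination parameter names: `limit` and `offset`.
    return f"{base_url}{path}?{urlencode({'limit': limit, 'offset': offset})}"


if __name__ == "__main__":
    # Third page of five results each for a hypothetical entity and property.
    print(adjacent_url("http://localhost:8000", "Q7747", "ownershipOwner", limit=5, offset=10))
```

Paging through results this way keeps each response small, in contrast to nested=true on the existing entity endpoint.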

What's Changed

Paginated adjacencent entities API by @jbothma in #686
Bump fastapi from 0.115.11 to 0.115.12 by @dependabot in #697
Restore old behavor of match API where topics are OR'd by @leonhandreke in #704
Bump pyicu from 2.14 to 2.15 by @dependabot in #696
Bump orjson from 3.10.15 to 3.10.16 by @dependabot in #699
Bump aiohttp[speedups] from 3.11.13 to 3.11.15 by @dependabot in #708
Bump followthemoney from 3.8.1 to 3.8.2 by @dependabot in #698
Bump rigour from 0.9.6 to 0.9.7 by @dependabot in #695
Bump aiohttp[speedups] from 3.11.15 to 3.11.16 by @dependabot in #709
Bump nomenklatura from 3.16.3 to 3.17.1 by @dependabot in #701
Add add docs for adjacent entities endpoints and make them visible by @jbothma in #705
Pin leaky multidict until they release a fix by @jbothma in #711

New Contributors

@leonhandreke made their first contribution in #704

Full Changelog: v4.2.4...v4.3.0

Contributors

leonhandreke, jbothma, and dependabot

18 Mar 12:08

pudo

v4.2.3

3a44e74

v4.2.3

Undo a change to the launch mechanism for the ASGI server which caused failures to boot in some environments.

Full Changelog: v4.2.2...v4.2.3

14 Mar 12:20

pudo

v4.2.2

19ee0df

v4.2.2

What's Changed

Improved country/territory support (e.g. jurisdictions like ae-du now map to their main country)
Smaller Docker image by using multi-stage builds (thanks @legal90 !)
Expose an ASGI app that can be run by uvicorn directly (uvicorn yente.asgi:app)
Removed legacy (and confusing) fuzzy flag from /match API
Adopt pyproject.toml for Python library management
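
The country/territory change in the first item can be pictured as reducing a subdivision-style code to its leading country part. The sketch below only illustrates that idea under the assumption that such codes follow a `country-subdivision` pattern; the actual mapping in yente's dependencies may well use a curated lookup table instead.

```python
def main_country(code: str) -> str:
    """Map a jurisdiction code like `ae-du` to its main country code."""
    # Assumption: the part before the first hyphen is the country code.
    return code.strip().lower().split("-", 1)[0]


if __name__ == "__main__":
    print(main_country("ae-du"))
    print(main_country("de"))
```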

New Contributors

@faishal made their first contribution in #618

Full Changelog: v4.2.0...v4.2.2

Contributors

faishal and legal90
