Releases: opensanctions/yente

v5.0.2

06 Oct 12:41

Brings in updates to symbolic matching in nomenklatura and rigour, and tries to make single-word queries less horrible by forcing fuzzy matching on them. Also clarifies the multi-catalog/dataset manifest configuration and raises an error for invalid catalog specifications.

What's Changed

  • build(deps): update click requirement from ~=8.2.0 to >=8.2,<8.4 by @dependabot[bot] in #895
  • build(deps): bump fastapi from 0.116.2 to 0.117.1 by @dependabot[bot] in #897
  • build(deps): bump anyio from 4.10.0 to 4.11.0 by @dependabot[bot] in #899
  • build(deps): bump aiocsv from 1.3.2 to 1.4.0 by @dependabot[bot] in #898
  • build(deps): bump uvicorn[standard] from 0.35.0 to 0.37.0 by @dependabot[bot] in #900
  • build(deps): bump fastapi from 0.117.1 to 0.118.0 by @dependabot[bot] in #901
  • build(deps): bump cryptography from 46.0.1 to 46.0.2 by @dependabot[bot] in #903
  • Log HTTP error URL explicitly by @jbothma in #906
  • Enable fuzzy search for single-word queries by @pudo in #907
  • Clarify multi catalog/dataset manifest configuration by @jbothma in #904

Full Changelog: v5.0.1...v5.0.2

v5.0.1

23 Sep 14:29

Just a minor patch release of yente, pulling in a new nomenklatura with some fixes to the new logic-v2 scoring algorithm. Thank you to @baldurh for the in-depth feedback and bug hunting! More feedback on this new scoring system (and all other aspects of yente) is always appreciated - please get in touch here on GitHub or on our Forum.

Note that this release also bumps cryptography from 45.0.7 to 46.0.1.

Full Changelog: v5.0.0...v5.0.1

v5.0.0

15 Sep 15:00

  • New logic-v2 matcher: We're including a new matching system, logic-v2. This new system reflects feedback from users of logic-v1 and introduces a more precise, explainable, and culturally-aware way to match names of people and companies. logic-v2 reduces false positives, is fully deterministic, improves cross-language and cross-script matching, runs fast, and provides detailed explanations of its decision-making path. Please be aware that we'll still be adjusting logic-v2 based on user feedback, so don't expect every scoring decision to be set in stone yet. We look forward to your feedback, be it as an issue report on our GitHub, or on our discussion forum or via the support team.
  • Field deprecations and renames:
    • The matcher field in /match responses will be removed in a future version of yente. Equivalent information is available via the /algorithms endpoint.
    • The /match endpoint response has gained a new explanations field that is a strict superset of features. In addition to the partial scores produced by each component of the matching system, it includes textual descriptions of the matching decisions from each of these subsystems. These descriptions are designed for display in analyst workbenches or can be passed to generative AI tools to help interpret screening alerts. The features field will be removed in a future version of yente.
    • The cutoff parameter on the /match endpoint has been deprecated and will be removed in a future version of yente. If you care about low-scoring results being returned, please set the threshold parameter instead.
    • The /algorithms endpoint has gained a new docs field that is a strict superset of features. In addition to a documentation of the algorithm features (yes, those featured in the explanations field in /match), it contains a list of configuration knobs to tune the algorithm to your individual needs. Currently, only logic-v2 offers these knobs.
    • These algorithm configuration options are passed in a new config field on the /match request.
  • Locking mechanism to prevent concurrent reindexes: One of the most common issues with yente, both in our hosted deployment and for external users, was two re-index jobs stepping on each other's toes or overloading the Elastic backend by doing the same thing twice. Before we start a reindex, we now write a little lock to the Elastic backend so that other instances of yente that might be planning a reindex know that one is already in progress. Regardless, if you're running multiple instances of yente, you should still configure a separate re-indexing cronjob. See the documentation on how to deploy yente for more information.
  • Audit log of index operations: A log of re-indexing and index cleanup operations is written to a special index in Elastic. This allows users to get an accurate record of what data became available when in their yente without digging through log messages. To read the audit log, simply run yente audit-log (or use another CSV viewer of your choice).
  • Google Cloud Logging compatible request logging: If you're running yente on Google Cloud Run, handy little badges with information about the HTTP request as well as a little button to filter logs from only this trace will show up in the Logs Explorer. Careful: the format of the log message logged for each request changed, so if you're parsing the JSON logged by yente, please update your infrastructure accordingly.
  • FollowTheMoney 4.0: This release pulls in an updated version of the data model underpinning yente. If you're ingesting custom data sources that you're generating yourself, please check out the release notes. The only change that may be notable for users of yente is probably the rename of CryptoWallet:managingExchange. Users relying on recent published OpenSanctions data can ignore this change.
  • Documentation moved to yente.followthemoney.tech: yente is part of the wider open source ecosystem around the FollowTheMoney data model, and the move of its documentation reflects this. The documentation is now part of the source repository and can be edited by anyone. Documentation is never perfect, and we welcome your PRs! The OpenSanctions website remains a good place to read about the data underpinning yente and our hosted API.
  • Celebrity-friendly scoring in /search: While /match is at the heart of what yente does best, the /search endpoint is what users usually hit first when they type in a search query on opensanctions.org. By deploying cutting-edge research from our search engine labs, we ensure that Putin ranks first when you type in "putin". Much wow!
  • The /updatez endpoint is now disabled by default. Set the UPDATE_TOKEN authentication token to a secret value to enable it.
  • The usual round of dependency upgrades, among them an upgrade of cryptography from 45.0.5 to 45.0.6.
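
The /match changes above (threshold instead of cutoff, plus the new config field) can be sketched as a small request builder. This is only an illustration under stated assumptions: the /match/{dataset} path, the algorithm query parameter, the queries body layout, and the placement of config at the top level of the JSON body are assumptions not spelled out in these notes, and example_knob is a made-up placeholder. The actual knobs are listed in the docs field of the /algorithms endpoint.

```python
from typing import Any, Dict, Optional, Tuple


def build_match_request(
    base_url: str,
    dataset: str,
    name: str,
    threshold: float = 0.7,
    config: Optional[Dict[str, Any]] = None,
) -> Tuple[str, Dict[str, Any], Dict[str, Any]]:
    """Return (url, query_params, json_body) for a /match call."""
    # Assumption: the endpoint path is /match/{dataset}.
    url = f"{base_url.rstrip('/')}/match/{dataset}"
    # Use `threshold`; the deprecated `cutoff` parameter is never sent.
    params: Dict[str, Any] = {"algorithm": "logic-v2", "threshold": threshold}
    # Assumption: queries are keyed by a caller-chosen ID ("q1" here).
    body: Dict[str, Any] = {
        "queries": {
            "q1": {"schema": "Person", "properties": {"name": [name]}},
        }
    }
    if config is not None:
        # Assumption: `config` sits at the top level of the request body.
        body["config"] = config
    return url, params, body


if __name__ == "__main__":
    # `example_knob` is a hypothetical option name, not a real logic-v2 knob.
    print(build_match_request("http://localhost:8000", "default", "Jane Doe",
                              threshold=0.5, config={"example_knob": 1}))
```

The returned triple can be handed to any HTTP client, for example as the URL, query parameters and JSON body of a POST request.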

v4.5.1

07 Aug 14:30

Some minor dependency upgrades, plus a little convenience manifest file that allows OpenSanctions customers to easily supply the secret delivery token for the delivery.opensanctions.com service.

Full Changelog: v4.5.0...v4.5.1

v4.5.0

09 Jul 13:46

Note: Triggers full index rebuild

I'm happy to announce the release of Yente 4.5.0 with a few new features and the usual bag of dependency upgrades. This release includes the following new features:

  • a new auth_token parameter to set on catalogs in the manifest. This token (which can also be set from an environment variable) will be sent in the Authorization header in all requests for the catalog and its dataset files.
    • This mechanism replaces the DATA_TOKEN environment variable (which sets the Authentication header). DATA_TOKEN is now deprecated and will be removed in a future version of Yente. If you're using it, please migrate to auth_token instead.
  • The /match endpoint now has an exclude_entity_ids parameter that allows callers to exclude some entities from matching. This may be useful if you're doing periodic screening of the same entities, have decided that a match is a false positive, and want it excluded from matching completely. Thanks to @baldurh for the idea!
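
As a rough illustration of exclude_entity_ids, the sketch below builds a query string that drops previously reviewed false positives. It assumes the parameter is passed as a query parameter repeated once per entity ID, which these notes do not state; the entity IDs are placeholders.

```python
from typing import Iterable
from urllib.parse import urlencode


def exclusion_query_string(excluded: Iterable[str]) -> str:
    """Encode entity IDs to be skipped by /match as a query string."""
    # Assumption: the parameter is repeated once per excluded entity ID.
    # IDs are de-duplicated and sorted so the output is reproducible.
    pairs = [("exclude_entity_ids", entity_id) for entity_id in sorted(set(excluded))]
    return urlencode(pairs)


if __name__ == "__main__":
    # Placeholder IDs standing in for matches judged to be false positives.
    print(exclusion_query_string(["ex-id-2", "ex-id-1", "ex-id-2"]))
```

Keeping the reviewed IDs in your own screening database and passing them on each periodic run avoids re-reviewing the same alert.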

Further, the release includes the following improvements & changes worth noting:

  • Entities are now retrieved from the search index with stable ordering during search and match queries. This produces reproducible match results in cases where all match candidates are equally ranked in terms of relevance, often because an overly broad query (like "LLC" or "John") matches too many entities. It does not, in itself, improve the quality of the candidates that will be retrieved.
  • The usual round of dependency upgrades, including the cryptography package from 45.0.3 to 45.0.5
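
The effect of stable ordering can be shown with a toy ranking function. This is not yente's implementation (the notes do not say which tie-breaker the index query uses); it only assumes that ties in relevance are broken by some fixed key, here the entity ID.

```python
from typing import List, Tuple

# (entity_id, relevance_score); both values are illustrative.
Candidate = Tuple[str, float]


def stable_rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order by descending score, breaking ties by entity ID."""
    # Without the second key, equally scored candidates would keep
    # whatever order the backend happened to return them in.
    return sorted(candidates, key=lambda c: (-c[1], c[0]))


if __name__ == "__main__":
    hits = [("b", 1.0), ("a", 1.0), ("c", 2.0)]
    print(stable_rank(hits))
    print(stable_rank(list(reversed(hits))))  # same order as above
```

Both calls print the same list, which is the reproducibility described above; the set of candidates itself is unchanged.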

Full Changelog: v4.4.0...v4.5.0

v4.4.0

05 Jun 09:41
c2eb034

Note: Triggers full index rebuild

  • The tokenised name indexing has been improved: prefixes in Person names (e.g. Mrs, Mr) are now stripped. This should reduce the false positive rate in cases where those prefixes contributed to the score.
  • The environment variable YENTE_MATCH_FUZZY will take effect again. When true (default), the candidate generation stage (Elasticsearch query) of /match queries can include names with small spelling differences from the query. This does not affect the match score as indicated in the response.
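
To illustrate the prefix stripping described in the first item, here is a toy tokeniser. It is a sketch, not the code used by yente: the prefix set contains only the two examples named above, and the tokenisation rules (whitespace splitting, lowercasing, trimming dots and commas) are assumptions made for the example.

```python
from typing import List

# Only the two prefixes mentioned in the notes; the real list is not given here.
PREFIXES = {"mr", "mrs"}


def tokenize_person_name(name: str) -> List[str]:
    """Lowercase and split a name, dropping leading honorific prefixes."""
    tokens = [part.strip(".,").lower() for part in name.split()]
    tokens = [token for token in tokens if token]
    # Strip prefixes from the front, but never strip the last remaining token.
    while len(tokens) > 1 and tokens[0] in PREFIXES:
        tokens.pop(0)
    return tokens


if __name__ == "__main__":
    print(tokenize_person_name("Mrs. Jane Doe"))
    print(tokenize_person_name("Mr John Smith"))
```

With the prefix gone, "Mrs. Jane Doe" and "Mrs. Anna Kowalski" no longer share an indexed token, which is the kind of spurious overlap the change targets.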

Full Changelog: v4.3.1...v4.4.0

v4.3.1

10 Apr 08:14
2aa4e44

Note: Triggers full index rebuild - Due to index schema changes this update triggers a full index rebuild even if you are up to date and use incremental index updates. The changes are backward-compatible so your yente service should remain available during rebuild, but the typical full index rebuild load will be seen on your ElasticSearch/OpenSearch deployment.

What's Changed

This fixes a couple of bugs in the (beta) adjacent entities API:

  • Adjacent entities not found if the root is an edge
  • Adjacent entities not found if the property name conflicts with a property of another type (observed with directors' directorships)

#718

Full Changelog: v4.3.0...v4.3.1

v4.3.0

04 Apr 10:33

In addition to the usual routine dependency upgrades and fixes, this release includes two new endpoints to retrieve entities adjacent to a given entity (e.g. holders of an office or assets owned by a person) in a paginated way.

Adjacent entities API

This release includes a beta version of two new adjacent entities API endpoints. These APIs may still be subject to change; a production-ready stable release will be announced in the future. We would love to get your feedback, here on GitHub, on our Discourse, or any other way.

The existing /entities/{entity_id} endpoint can nest adjacent entities (using the nested=true query parameter), but does so without any limit on how many entities can be included. This can lead to large, slow responses when requesting entities with a large number of adjacent entities, such as institutions with many securities, or PEP positions with many holders. The /entities/{entity_id}/adjacent and /entities/{entity_id}/adjacent/{property_name} endpoints return a limited number of results by default, with pagination parameters for retrieving more.
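
A small URL builder shows how the two adjacent endpoints might be paged through. The endpoint paths come from the text above; the pagination parameter names limit and offset are assumptions (the notes only say that pagination parameters exist), and the entity ID and property name are placeholders.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def adjacent_url(
    base_url: str,
    entity_id: str,
    property_name: Optional[str] = None,
    limit: int = 10,
    offset: int = 0,
) -> str:
    """Build a URL for one page of entities adjacent to `entity_id`."""
    path = f"/entities/{quote(entity_id, safe='')}/adjacent"
    if property_name is not None:
        # Restrict the listing to a single property of the entity.
        path += f"/{quote(property_name, safe='')}"
    # Assumption: pagination uses `limit` and `offset` query parameters.
    query = urlencode({"limit": limit, "offset": offset})
    return f"{base_url.rstrip('/')}{path}?{query}"


if __name__ == "__main__":
    # Placeholder entity ID and property name, for illustration only.
    for page in range(3):
        print(adjacent_url("http://localhost:8000", "ex-entity",
                           "someProperty", limit=5, offset=page * 5))
```

Requesting fixed-size pages like this keeps each response small, in contrast to nested=true on /entities/{entity_id}.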

Full Changelog: v4.2.4...v4.3.0

v4.2.3

18 Mar 12:08

Undo a change to the launch mechanism for the ASGI server which caused failures to boot in some environments.

Full Changelog: v4.2.2...v4.2.3

v4.2.2

14 Mar 12:20

What's Changed

  • Improved country/territory support (e.g. jurisdictions like ae-du now map to their main country)
  • Smaller Docker image by using multi-stage builds (thanks @legal90 !)
  • Expose an ASGI app that can be run by uvicorn directly (uvicorn yente.asgi:app)
  • Removed legacy (and confusing) fuzzy flag from /match API
  • Adopt pyproject.toml for Python library management

Full Changelog: v4.2.0...v4.2.2