Skip to content

distances are buggy, should be re-exports from pynndescent #1233

@ilia-kats

Description

@ilia-kats

The umap package seems to reimplement the same distance metrices as the pynndescent package. However, the umap implementation is buggy in at least one case:

  • sparse.arr_union can return one of its input arrays
  • the return value of sparse.arr_union is used as a writable buffer by sparse.sparse_sum
  • sparse.sparse_diff is a thin wrapper around sparse.sparse_sum
  • sparse.sparse_diff is used in sparse_euclidean

This leads to two issues:

  1. The input array may be modified, which is not expected by the user
  2. If sparse_euclidean, sparse_diff, or sparse_sum is called by a custom distance metric, which gets its sparse array via keyword arguments, the keword arguments will be stored in a closure, which leads numba to treat them as readonly, leading to type inference failure (see Cannot call neighbors: Failed in nopython mode pipeline (step: nopython frontend) scverse/muon#173).

The pynndescent implementation of sparse_euclidean works without any issues. Given that both packages are maintained by you, umap depends on and uses pynndescent, and the supported distance functions appear to be identical, I think it would make sense if umap would simply re-export the distance functions from pynndescent.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions