UMAP transform yields different embeddings for the same fitted model when transforming a subset vs. transforming all then sub-selecting

When I fit a single UMAP model on the full dataset and then embed only the female samples, I expect `umap.transform(X_female)` to be identical (up to numerical noise) to `umap.transform(X_all)[female_idx]`.

Instead, I consistently get large differences between the two, even though I use the same fitted model, a fixed `random_state`, and effectively single-threaded execution (UMAP warns that setting `random_state` disables parallelism).

**Expected behavior**
`umap.transform(X_subset)` should produce the same embedding as `umap.transform(X_all)[subset_idx]`.

Result statistics (same fitted model; transforming the females only vs. transforming all samples and then sub-selecting the females):
```
max |Δ| = 2.443e+00
RMSE    = 4.176e-01
```
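The statistics above can be recomputed with a small helper like the one below. This is a minimal sketch that assumes only that the fitted model exposes a `transform(X)` method returning an `(n_samples, n_components)` array, as umap-learn's `UMAP` does. The name `compare_transform_paths` is hypothetical, and it assumes RMSE is taken over every coordinate of Δ; the issue does not say exactly how the numbers were computed.

```python
import numpy as np


def compare_transform_paths(model, X_all, subset_idx):
    """Compare two ways of embedding the same subset with one fitted model.

    Path 1: model.transform(X_all[subset_idx])
    Path 2: model.transform(X_all)[subset_idx]

    Returns (max |Δ| over all coordinates, RMSE over all coordinates,
    per-sample Euclidean norm ‖Δ‖ as a 1-D array).
    """
    X_all = np.asarray(X_all)
    subset_idx = np.asarray(subset_idx)

    emb_subset = np.asarray(model.transform(X_all[subset_idx]))  # transform subset only
    emb_full = np.asarray(model.transform(X_all))[subset_idx]    # transform all, then sub-select

    delta = emb_subset - emb_full
    max_abs = float(np.max(np.abs(delta)))
    rmse = float(np.sqrt(np.mean(delta ** 2)))  # assumption: RMSE over every coordinate
    per_sample = np.linalg.norm(delta, axis=1)  # ‖Δ‖ for each sample in the subset
    return max_abs, rmse, per_sample
```

With umap-learn (`import umap`), this might be called as `compare_transform_paths(umap.UMAP(random_state=42).fit(X_all), X_all, female_idx)`; the returned per-sample norms are the quantity a histogram like `case1_delta_hist.png` would show.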

**Actual behavior**
Significant coordinate shifts appear even with fixed seeds, single-threaded execution, and identical preprocessing.

**Attached are three figures:**
- `case1_overlay.png`: overlay of both embeddings, colored by transform path
- `case1_delta_scatter.png`: 2D scatter of the pointwise Δ
- `case1_delta_hist.png`: histogram of ‖Δ‖ per sample


These figures show that `transform(X_subset)` and `transform(X_all)[subset_idx]` yield different embeddings.
Why does UMAP's `transform` behave this way?

<img width="600" height="400" alt="Image" src="https://github.com/user-attachments/assets/2d47d540-34e7-4845-804e-5032aca8735b" />
<img width="600" height="500" alt="Image" src="https://github.com/user-attachments/assets/c4b3b769-1ead-4e84-a7ca-f72814748890" />
<img width="700" height="600" alt="Image" src="https://github.com/user-attachments/assets/4abe6e24-da2a-4935-979b-f1c74cb7b913" />
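One plausible explanation, which this issue does not confirm: umap-learn's `transform` may not map each new point independently, but instead refine all points of a call together with a stochastic optimizer whose random draws come from one shared, seeded stream. The toy below uses hypothetical `noisy_transform_*` functions (not UMAP code) to illustrate that a fixed seed makes a call reproducible for the *same* batch, yet not invariant to batch composition, whereas seeding each sample from its own id would be.

```python
import numpy as np


def noisy_transform_shared_stream(X, seed=42):
    """Hypothetical toy 'transform': every row gets noise from ONE seeded stream.

    Draws are consumed in batch order, so a row's noise depends on how many
    rows precede it in this particular call.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    return X + rng.normal(scale=0.1, size=X.shape)


def noisy_transform_per_sample(X, sample_ids, seed=42):
    """Hypothetical variant: each row is seeded from (seed, its own id),
    so its noise no longer depends on the rest of the batch."""
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    for row, sid in enumerate(sample_ids):
        rng = np.random.default_rng([seed, int(sid)])
        out[row] = X[row] + rng.normal(scale=0.1, size=X.shape[1])
    return out


X_all = np.zeros((6, 2))
subset_idx = np.array([1, 3, 5])
ids = np.arange(len(X_all))

# Shared stream: subset-only vs. all-then-subset see different random draws.
shared_sub = noisy_transform_shared_stream(X_all[subset_idx])
shared_full = noisy_transform_shared_stream(X_all)[subset_idx]

# Per-sample seeding: both paths give the same rows.
per_sub = noisy_transform_per_sample(X_all[subset_idx], ids[subset_idx])
per_full = noisy_transform_per_sample(X_all, ids)[subset_idx]

print("shared stream, paths agree:", np.allclose(shared_sub, shared_full))
print("per-sample seeds, paths agree:", np.allclose(per_sub, per_full))
```

The per-sample seeding variant is only a design illustration of what batch invariance would require; it is not a claim about how umap-learn does or should implement `transform`.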

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

UMAP transform yields different embeddings for the same fitted model when transforming a subset vs. transforming all then sub-selecting #1224