tkonopka/umap

When random_state is set automatically in config, it is not sufficient for reproducibility

Closed · 1 comment

A user reported a bug via email.

Embeddings are reproducible when a seed is set manually.

result_1 <- umap(dataset, random_state=123)
result_1$config$random_state == 123 # TRUE, the seed is recorded in config

result_2 <- umap(dataset, random_state=result_1$config$random_state)
identical(result_1, result_2) # TRUE, which is correct

When an embedding is created without a seed, the package creates and sets its own seed. The intention is to be able to recreate the same result if needed, even if the first run did not set a specific seed.

result_3 <- umap(dataset)
result_3$config$random_state > 0 # TRUE, signals intention for reproducibility

result_4 <- umap(dataset, random_state=result_3$config$random_state)
identical(result_3, result_4) # FALSE, but should actually be TRUE

The bug affects version 0.2.9 and probably older versions as well.
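
The intended behaviour can be sketched in base R without the package. Here `toy_embed` is a hypothetical stand-in, not the umap implementation, and the sketch assumes that a generated seed is passed to `set.seed()` before any random numbers are drawn, so that an automatic seed and a manual seed follow the same code path.

```r
# Hypothetical stand-in for an embedding function; NOT the umap package's code.
toy_embed <- function(n, random_state = NA) {
  if (is.na(random_state)) {
    # No seed supplied: draw one so that it can be recorded in the output.
    random_state <- sample.int(.Machine$integer.max, 1)
  }
  # Seed the generator in both cases, before any random numbers are drawn.
  set.seed(random_state)
  list(
    layout = matrix(rnorm(n * 2), ncol = 2),
    config = list(random_state = random_state)
  )
}

run_a <- toy_embed(5)
run_b <- toy_embed(5, random_state = run_a$config$random_state)
identical(run_a, run_b) # TRUE, the recorded seed recreates the first run
```

If the recorded seed were only stored in the config and never applied to the generator during the first run, the second call would start from a different generator state and the comparison would fail, which matches the symptom reported above.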

A fix is now available in v0.2.10.0.