When random_state is set automatically in config, it is not sufficient for reproducibility
Closed this issue · 1 comment
tkonopka commented
A user reported a bug via email.
Embeddings are reproducible when a seed is set manually.
```r
library(umap)
result_1 <- umap(dataset, random_state=123)
result_1$config$random_state == 123  # TRUE, the seed is recorded in config
result_2 <- umap(dataset, random_state=result_1$config$random_state)
identical(result_1, result_2)  # TRUE, which is correct
```
When an embedding is created without a seed, the package generates its own seed and records it in config$random_state. The intention is that the result can be recreated later, even if the first run did not specify a seed.
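The intended behavior can be illustrated with a small, self-contained R sketch. The function `run_with_seed()` is hypothetical and is not part of the umap package. It stands in for any routine that uses random numbers. The key point is that, when no seed is given, a seed is chosen, recorded, and then passed to `set.seed()` before any random draws happen. As a result, the recorded value alone is enough to reproduce the run.

```r
# Hypothetical sketch, not the umap package's code: pick a seed when none is
# given, record it, and seed the RNG with it before any random draws.
run_with_seed <- function(n, random_state = NA) {
  if (is.na(random_state)) {
    # Draw a fresh seed from the current RNG stream
    random_state <- sample.int(.Machine$integer.max, 1)
  }
  # Seed the RNG with the recorded value, so all later draws depend only on it
  set.seed(random_state)
  list(values = runif(n), config = list(random_state = random_state))
}

first <- run_with_seed(5)
second <- run_with_seed(5, random_state = first$config$random_state)
identical(first, second)  # TRUE
```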
```r
result_3 <- umap(dataset)
result_3$config$random_state > 0  # TRUE, signals intention for reproducibility
result_4 <- umap(dataset, random_state=result_3$config$random_state)
identical(result_3, result_4)  # FALSE, but should actually be TRUE
```
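The following R sketch shows one way this kind of failure can arise. It is an assumption for illustration only, not the actual umap 0.2.9 code. In the sketch, a seed is generated and recorded in config, but the RNG is seeded with it only when the caller supplies the seed explicitly. The first run then draws from whatever RNG state happened to be current, so replaying the recorded seed gives different numbers. The function `run_buggy()` is hypothetical.

```r
# Hypothetical buggy variant (an assumption, not the umap package's code):
# the generated seed is recorded, but set.seed() is only called when the
# caller passed random_state explicitly.
run_buggy <- function(n, random_state = NA) {
  if (is.na(random_state)) {
    random_state <- sample.int(.Machine$integer.max, 1)
    # Bug: the RNG is never seeded with the recorded value on this path
  } else {
    set.seed(random_state)
  }
  list(values = runif(n), config = list(random_state = random_state))
}

third <- run_buggy(5)
fourth <- run_buggy(5, random_state = third$config$random_state)
identical(third, fourth)  # FALSE: same recorded seed, different draws
```

In this sketch, calling `set.seed(random_state)` unconditionally, after the seed has been chosen, is enough to make the recorded value sufficient for reproducibility.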
The bug affects version 0.2.9 and probably older versions as well.