A scalable method to estimate the optimal number of clusters in a dataset.
It is based on the algorithm described in "Progeny Clustering: A Method to Identify Biological Phenotypes". Much of the code extends the prior R implementation, but with a cleaner, scikit-learn-like interface.
This implementation is still under development, and its numerical results should not be trusted under any circumstances.