cokelaer/fitter

Reproducibility across fittings

Opened this issue · 3 comments

Hello!

Thank you so much for making this tool, it is very useful!

I noticed that across multiple fittings to the same set of data, different "best" distributions are shown. Is this intended behaviour? Is there a way to ensure reproducibility across runs, like setting a random seed?

Cheers,
Nancy

Thanks for using fitter. That is interesting behaviour you have noticed; it is not intended.
It may happen that several distributions have exactly the same score, in which case the sorting based on the score may not be deterministic, although I doubt it since the sorting is performed with pandas. Would you have an example?
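For what it's worth, if the score column does contain exact ties, the resulting order depends on the sort algorithm: pandas' default quicksort is not stable, while `kind="stable"` preserves the original row order for equal values. A small illustration with made-up scores (the distribution names and error values here are hypothetical, not from an actual fit):

```python
import pandas as pd

# Hypothetical results table where two distributions are tied on the score.
df = pd.DataFrame(
    {"sumsquare_error": [0.012, 0.005, 0.005, 0.030]},
    index=["gamma", "norm", "lognorm", "expon"],
)

# A stable sort keeps 'norm' ahead of 'lognorm' because it appeared first
# in the input; the default quicksort gives no such guarantee for ties.
ranked = df.sort_values("sumsquare_error", kind="stable")
print(ranked.index.tolist())  # -> ['norm', 'lognorm', 'gamma', 'expon']
```

So even with pandas, tied scores could in principle swap places between runs unless a stable sort kind is used.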

@cokelaer thanks a lot for such a great tool!
@nchelaru, at first glance I thought I was facing the same issue. But a closer look revealed that the "best distributions" shared between different fittings actually had the same error values. Digging deeper, I saw that a few of my best-fit distributions had failed to converge within the default 30 s timeout, presumably because of run-to-run differences in how long the internal fitting takes.
The apparent issue was solved when I increased the timeout. Hope this helps. Cheers!
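To see why a timeout can change the ranking: fitter scores each candidate by the squared error between the fitted pdf and the data's histogram, and any distribution whose `scipy.stats` fit does not finish in time is dropped from the comparison. Below is a rough, self-contained sketch of that scoring loop (the seed, bin count, and candidate list are arbitrary choices for illustration, not fitter's internals):

```python
import numpy as np
from scipy import stats

# Reproducible synthetic data: a fixed seed makes every run identical.
rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=5000)

# Empirical density to compare each fitted pdf against.
hist, edges = np.histogram(data, bins=100, density=True)
centers = (edges[:-1] + edges[1:]) / 2

# Fit each candidate by MLE and score it by sum of squared errors.
# In fitter, a slow dist.fit() call is the step the timeout interrupts,
# so a short timeout can silently remove candidates from this dict.
scores = {}
for dist in (stats.norm, stats.gamma, stats.lognorm):
    params = dist.fit(data)
    pdf = dist.pdf(centers, *params)
    scores[dist.name] = float(np.sum((pdf - hist) ** 2))

best = min(scores, key=scores.get)
print(best, scores[best])
```

With a fixed seed and no timeouts, the same candidate wins every run; if one fit is sometimes cut off, the "best" label can move to the next-lowest score instead.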

@vartak16 thanks, that is probably the reason indeed. Thanks for using fitter, and for the encouragement.