cokelaer/fitter

Lack of Clarity on the Parameters of the Distribution

cschiri opened this issue · 8 comments

If I use the "get_best" method as follows:

f.get_best(method='sumsquare_error')

It returns the best-fitting distribution and its parameters, i.e., a dictionary with a single key (the distribution name) whose value is the tuple of fitted parameters.

For instance:

{'beta': (1.0900359801761663, 0.8058383063379988, -9.543996466545888, 107.5439964665459)}

Could you please clarify which value is the mean, which is the standard deviation, etc.? The package documentation does not explain this.

+1 (same question/issue as @cschiri)

I believe the values are listed in exactly the same order as the one scipy uses under the hood. It is not necessarily obvious which value is which. I am not sure I will implement this soon, though. If you are willing to help, I'll be happy to include this feature. It may have side effects when plotting the results, so a new method that does this work might be easier. Sorry for not helping more.
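A minimal sketch of the ordering described above, assuming a recent SciPy: each continuous distribution in `scipy.stats` exposes the names of its shape parameters in the `.shapes` attribute (or `None` when it has only `loc` and `scale`), and `fit()` returns those shape parameters first, followed by `loc` and `scale`. The synthetic data is only for illustration.

```python
import numpy as np
import scipy.stats as st

# Shape-parameter names for a few distributions (None means loc/scale only)
shape_names = {name: getattr(st, name).shapes for name in ["beta", "gamma", "erlang", "norm"]}
for name, shapes in shape_names.items():
    print(f"{name}: {shapes}")

# Synthetic data, used only to illustrate the return order of fit()
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)

# norm has no shape parameters, so fit() returns (loc, scale)
norm_params = st.norm.fit(data)
print("norm (loc, scale):", norm_params)

# beta returns (a, b, loc, scale); loc and scale are held fixed here to keep the fit quick
beta_params = st.beta.fit(rng.beta(2.0, 5.0, size=500), floc=0, fscale=1)
print("beta (a, b, loc, scale):", beta_params)
```

For `norm`, `loc` and `scale` coincide with the sample mean and (population) standard deviation; for most other distributions they do not.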

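A quick note for those in need: take beta, for instance; its parameters are (a, b, loc, scale). In scipy.stats, loc and scale are always the last two values, and any shape parameters (here a and b) come first. loc and scale are the mean and standard deviation only for some distributions, such as the normal distribution, which has just those two parameters (loc, scale). For beta, loc is the lower bound of the support and scale is its width. In the above example, we have {a: 1.0900359801761663, b: 0.8058383063379988, loc: -9.543996466545888, scale: 107.5439964665459}.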
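To see why loc and scale are not the mean and standard deviation of a beta fit, here is a small sketch assuming SciPy's standard parameterization: with shapes a and b the distribution lives on [loc, loc + scale], and its moments are the usual beta moments shifted by loc and stretched by scale. The helper `beta_mean_std` is hypothetical, not part of fitter or scipy.

```python
import math


def beta_mean_std(a, b, loc=0.0, scale=1.0):
    """Mean and standard deviation of beta(a, b) shifted by loc and stretched by scale."""
    mean = loc + scale * a / (a + b)
    variance = scale**2 * a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)


# Parameters reported by get_best() in the issue, in scipy's order: (a, b, loc, scale)
params = (1.0900359801761663, 0.8058383063379988, -9.543996466545888, 107.5439964665459)
mean, std = beta_mean_std(*params)
print(f"support: [{params[2]:.3f}, {params[2] + params[3]:.3f}]")
print(f"mean = {mean:.3f}, std = {std:.3f}")
```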

I have the same question about the Erlang distribution: I cannot work out what its parameters mean.
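For Erlang, assuming SciPy's `erlang` uses its gamma-style parameterization (one shape `a`, the number of stages, followed by `loc` and `scale`, where `scale` is the inverse of the rate λ), a fitted tuple can be read as in the sketch below. The helper `describe_erlang` is hypothetical; note that a fitted `a` may come back as a non-integer float.

```python
def describe_erlang(a, loc, scale):
    """Interpret an Erlang parameter tuple (a, loc, scale) in scipy's order."""
    return {
        "stages": a,                 # shape parameter k
        "rate": 1.0 / scale,         # lambda = 1 / scale
        "mean": loc + a * scale,     # shifted by loc
        "variance": a * scale**2,    # unaffected by loc
    }


print(describe_erlang(3, 0.0, 2.0))
```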

The Fitter package is very useful. Thanks to all contributors. To make it more accessible to students and researchers, I wrote this blog post (I have recently added the Streamlit app link to it):

Medium Blog Link

Yes, this problem exists in the get_best function. Even the SciPy documentation is sometimes unclear. I tried to implement a Streamlit app using the Fitter library and ran into the same issue. To resolve it, I scraped all distribution-related data and built a dictionary whose keys are the parameter names and whose values are the best-fit parameter values.
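A minimal sketch of the dictionary approach described above. `PARAM_NAMES` is a hypothetical, hand-filled table standing in for the scraped data (only three entries here, in scipy's parameter order), and `name_params` is a hypothetical helper that pairs each value returned by `get_best()` with its name, assuming the `{name: tuple}` return shape shown in the issue.

```python
# Hypothetical lookup table: distribution name -> parameter names in scipy's order
PARAM_NAMES = {
    "beta": ["a", "b", "loc", "scale"],
    "gamma": ["a", "loc", "scale"],
    "norm": ["loc", "scale"],
}


def name_params(best):
    """Turn {name: (p1, p2, ...)} into {name: {param_name: value}}."""
    (name, values), = best.items()  # get_best() returns a single key
    names = PARAM_NAMES[name]
    if len(names) != len(values):
        raise ValueError(f"{name}: expected {len(names)} values, got {len(values)}")
    return {name: dict(zip(names, values))}


best = {"beta": (1.0900359801761663, 0.8058383063379988, -9.543996466545888, 107.5439964665459)}
print(name_params(best))
```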

The problem is that a few entries (which are not actually distributions) produce an error when I try to retrieve the best parameters using the dictionary approach.

# Removing these three entries from the dictionary resolved the issue:
#    "rv_continuous"
#    "rv_histogram"
#    "trapz"
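One way to keep such entries out of the dictionary, sketched here assuming a recent SciPy, is to build the list of names from `scipy.stats` itself and keep only objects that are instances of `rv_continuous`. `rv_continuous` and `rv_histogram` are classes rather than distribution instances, so they drop out automatically; this filter does not by itself address the `trapz` scraping problem.

```python
import warnings

import scipy.stats as st

with warnings.catch_warnings():
    # Accessing some deprecated names in scipy.stats may emit warnings
    warnings.simplefilter("ignore")
    continuous_names = sorted(
        name for name in dir(st)
        if isinstance(getattr(st, name, None), st.rv_continuous)
    )

print(len(continuous_names), continuous_names[:5])
```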

Even scraping is inconsistent, because some of the distribution pages follow the old URL scheme of the SciPy documentation.

For example, these four URLs point to old versions of the SciPy documentation:

# https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.frechet_l.html
# https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.frechet_r.html
# https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.reciprocal.html
# https://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.integrate.trapz.html

The current URL scheme looks like this:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html

I hope this issue is resolved soon.


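import scipy.stats as st
from fitter import Fitter

f = Fitter(data)  # data: your 1-D array of observations
f.fit()
best = f.get_best(method='sumsquare_error')  # assumed shape: {name: (p1, p2, ...)}
name, params = next(iter(best.items()))
distribution = getattr(st, name)
# scipy lists shape parameters (if any) in `.shapes`; loc and scale always come last
param_names = (distribution.shapes + ', loc, scale').split(', ') if distribution.shapes else ['loc', 'scale']

param_dict = dict(zip(param_names, params))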
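Once the parameters carry names, they can be passed back to SciPy as keyword arguments to freeze the distribution, which answers the original question directly: the frozen object's `mean()` and `std()` give the actual mean and standard deviation. This sketch hard-codes the `param_dict` for the beta example in the issue instead of running `Fitter`.

```python
import scipy.stats as st

# Named parameters for the beta example in the issue (as built by the code above)
param_dict = {
    "a": 1.0900359801761663,
    "b": 0.8058383063379988,
    "loc": -9.543996466545888,
    "scale": 107.5439964665459,
}

frozen = st.beta(**param_dict)  # freeze the distribution with these parameters
print("mean:", frozen.mean())
print("std:", frozen.std())
```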

These few lines of code build a dictionary of named parameters. @cokelaer, perhaps you could integrate this code so that users can also see the parameter names.

@cokelaer I am not an experienced coder; this is my first contribution to a public repository. The pull request is open. There seems to be a problem on Linux related to the gamma distribution, but it has nothing to do with the function I edited. Still, please check it and decide whether the pull request can be accepted.

@kabirmdasraful thanks again for your contribution. I will release a new version of fitter on PyPI (1.4.0) and will update the documentation accordingly.