MatthewReid854/reliability

Make probability plots accessible from distribution objects?

Closed this issue · 3 comments

It would be convenient to be able to generate probability plots from a fitted distribution after the fitting has been done once. My specific use case is after using Fit_Everything. Here's some code that kind of works:

from reliability import Probability_plotting
from reliability.Fitters import Fit_Everything
fe = Fit_Everything(x)
best = fe.best_distribution
plotter = getattr(Probability_plotting, f'{best.name}_probability_plot')
plotter(failures=x)

Not only does that redo the fitting, it also fails if Fit_Everything winds up choosing a Weibull_3P, because there is no way to know to pass fit_gamma=True to the plotting function without checking name2 in a bunch of ifs.

Here's what a possible implementation could look like:

from reliability.Fitters import Fit_Everything
fe = Fit_Everything(x)
best = fe.best_distribution
best.probplot(failures=x)

This is certainly possible, though I'm not sure if it is really an "issue" or just a matter of convenience and computational efficiency (not fitting things twice and not writing lots of if statements).

Your proposed implementation implies that every distribution object would need a probplot method. There is no reason this can't be done, but I don't understand why you can't just refit the best distribution individually to obtain the probability plot. The other problem with giving every distribution a probplot method is that you need to supply the failure and right censored data if you want more than just the straight line of the CDF. The way the functions in Probability_plotting are built, it is essential to provide the failure (and right_censored) data even if you give it __fitted_dist_params (an internal argument that I use to make a probability plot skip the fitting step and just take what it is given).

If I was doing this myself, I would use Fit_Everything to tell me the best fit, then I would do the fit again (either with the right function from Fitters or with the right function from Probability_plotting) to obtain the probability plot. As you say, it is necessary to know the distribution you are fitting in order to achieve this so you've either got to do it manually or write in a lot of if statements. I can understand that it may be problematic for someone trying to automate something so it just spits out the correct probability plot every time without user input.
Can you tell me your use case and why you believe it is necessary to include a probplot method inside each distribution object rather than just obtain the probplot separately?
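The chain of if statements mentioned above could be reduced to a small lookup. What follows is only a sketch: plot_call_for and SHIFTED are hypothetical names, not part of the reliability package, and the set of name2 values that need fit_gamma=True is an assumption that should be checked against the installed version.

```python
# Hypothetical helper, not part of the reliability package.
# Assumption: these are the name2 values of distributions with a location
# shift (gamma), i.e. the ones whose plotting function needs fit_gamma=True.
SHIFTED = {"Weibull_3P", "Gamma_3P", "Lognormal_3P", "Loglogistic_3P", "Exponential_2P"}


def plot_call_for(name, name2):
    """Return (plotting function name, extra keyword arguments) for a fitted distribution."""
    kwargs = {"fit_gamma": True} if name2 in SHIFTED else {}
    return f"{name}_probability_plot", kwargs


# Intended use with the objects from this thread (requires reliability):
#   func_name, kwargs = plot_call_for(best.name, best.name2)
#   getattr(Probability_plotting, func_name)(failures=x, **kwargs)
print(plot_call_for("Weibull", "Weibull_3P"))
print(plot_call_for("Normal", "Normal_2P"))
```

This keeps the distribution-specific knowledge in one place, but it still refits the data when the plotting function is called.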

I haven't seen the use of getattr before, but using what you showed me and my knowledge of the hidden variables inside Fit_Everything, I can provide you with this somewhat hacky solution:

from reliability.Distributions import Weibull_Distribution
from reliability import Probability_plotting
from reliability.Fitters import Fit_Everything
import matplotlib.pyplot as plt

data = Weibull_Distribution(alpha=500, beta=2).random_samples(500)  # make some data

results = Fit_Everything(failures=data, show_histogram_plot=False, show_probability_plot=False, show_PP_plot=False)
# Obtain the correct plotting function (e.g. Weibull_probability_plot) by name.
plotter = getattr(Probability_plotting, f'{results.best_distribution.name}_probability_plot')
# Return the parameters object from within Fit_Everything. This is a hidden
# variable with a name of the form _Fit_Everything__Weibull_2P_params.
params = getattr(results, f'_Fit_Everything__{results.best_distribution.name2}_params')
# Pass the fitted distribution's parameters to the plotting function, which
# prevents fitting from being done a second time. We still need to provide the
# failures, as all probability plots need these to generate the scatter plot.
plotter(failures=data, __fitted_dist_params=params)
plt.show()
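The _Fit_Everything__ prefix used above is consistent with Python's private name mangling: an attribute written with two leading underscores inside a class body is stored under _ClassName__attribute. The sketch below illustrates this with a hypothetical stand-in class called Fitter; it does not reproduce the internals of Fit_Everything, and since such attributes are private they may change between versions.

```python
class Fitter:
    """Hypothetical stand-in for a fitter that keeps its results in hidden attributes."""

    def __init__(self):
        # Two leading underscores inside a class body trigger name mangling:
        # this attribute is actually stored as _Fitter__Weibull_2P_params.
        self.__Weibull_2P_params = {"alpha": 500, "beta": 2}


results = Fitter()
name2 = "Weibull_2P"

# Outside the class, the mangled name has to be spelled out in full.
params = getattr(results, f"_Fitter__{name2}_params")
print(params)

# The unmangled name does not exist on the instance.
has_plain = hasattr(results, "__Weibull_2P_params")
print(has_plain)
```

Mangling is applied to identifiers in the class body at compile time, not to strings, which is why building the full name with an f-string and getattr works from outside the class.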

Yes, this is completely about convenience and user-facing simplicity, not something that needs to be fixed. The computational efficiency isn't really a big deal - I'm not fitting things millions of times or anything.

My use case is an automated process that uses Fit_Everything and then uses the resulting distribution for other work. Along the way the probability plots of all the distributions fitted are saved for future reference, but it would be nice to save the one for the selected distribution separately.

I agree that using getattr() to call things by concatenating strings feels hacky, but the code you included does accomplish what I need.

I have decided to incorporate your suggestion. From v0.5.6 onward, Fitters.Fit_Everything includes the input show_best_distribution_probability_plot. This will default to True so there will be 4 figures that are returned from Fit_Everything.

I am currently working on rewriting the Accelerated life testing (ALT) section and I will be introducing an ALT_Fit_Everything function that will fit all the ALT models. I think your idea to also return the probability plot of the best fitting distribution is worth including there as well.
Thanks for your suggestion.