Use hypothesis tests for testing distributions instead of matching moments of the distribution
envp opened this issue · 7 comments
Pearson's chi-squared test is a more reliable way of checking whether a sequence of numbers follows a given distribution or pattern. A test that only checks for the correct mean and variance is easy to fool: dummy values can be inserted to adjust a sequence until it matches the moments of any distribution.
However, we should not ignore that the mean and variance must be reproduced correctly. The suggestion here is to use Pearson's chi-squared test and refactor the test cases into the following structure:
- should pass the chi-squared test for a specific distribution, maybe by calling a test helper like this (e.g. for testing the uniform distribution):
pearson_chi_squared(candidate: Distribution::Uniform.rng(0.1, 1), target: :uniform, samples: 1000)
which returns the p-value (observed significance level) of the test as a double.
- should return correct metadata and moments of the distribution, say via a function that simulates the distribution for a specified confidence or sample size:
metadata_for(candidate: Distribution::Normal.rng(0.1, 1), target: :normal, confidence: 0.99, samples: 100)
which returns { mean: 0.1, variance: 0.96, skewness: 0.15, ... }
- Alternatively, the return value can just be an array where entry i is moment i of the sequence
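As a rough illustration of the two ideas above, here is a minimal Ruby sketch. The helper names and signatures (`pearson_chi_squared_statistic`, `raw_moments`) are hypothetical and not part of the gem. It only handles a uniform target, and it returns the raw test statistic rather than a p-value, on the assumption that the caller compares it against a tabulated critical value.

```ruby
# Hypothetical helper, not part of the gem: bins samples that are supposed to
# be uniform on [low, high) and returns Pearson's chi-squared statistic,
# sum((observed - expected)^2 / expected) over all bins.
def pearson_chi_squared_statistic(samples, low:, high:, bins: 10)
  width = (high - low).to_f / bins
  observed = Array.new(bins, 0)
  samples.each do |x|
    index = ((x - low) / width).floor
    index = [[index, 0].max, bins - 1].min # clamp out-of-range values into the edge bins
    observed[index] += 1
  end
  expected = samples.size.to_f / bins
  observed.sum { |count| (count - expected)**2 / expected }
end

# Hypothetical helper: entry k of the result is the k-th raw sample moment,
# so entry 0 is always 1.0 and entry 1 is the sample mean.
def raw_moments(samples, upto: 4)
  n = samples.size.to_f
  (0..upto).map { |k| samples.sum { |x| x**k } / n }
end

samples = Array.new(1000) { rand }
statistic = pearson_chi_squared_statistic(samples, low: 0.0, high: 1.0)
moments = raw_moments(samples, upto: 2)
```

With 10 bins there are 9 degrees of freedom, so at the 5% level the statistic would be compared against the tabulated critical value of about 16.919.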
Let me know what you think about this. Right now I feel a lot of test cases are repeated. This issue would of course require that all the methods in README.md are already implemented, so that there is something to compare against.
References for statistical tests of significance:
- http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
- http://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm
Existing libraries that already perform A/B testing:
- https://github.com/bmuller/abanalyzer (uses statistics2 gem)
I have something basic created here. @agarie @MohawkJohn @clbustos can you please have a look at this and let me know if we can try something similar? (depending on how well this is able to predict things)
This would be really, really good, but also a lot of work. If you do have the time to work on implementing this, please, go for it!
I looked into your gist, and it seems OK—we'll probably have to make some minor adjustments to style, but that shouldn't be a problem after the problem is solved.
Thanks. I'm currently finalizing the binomial rng since the first principles one is too slow to be practical beyond a small sample size. I'll start on this once I finish binomial.
Would this be better implemented as a separate module under lib (maybe other tests can be added here later), or just added to spec_helper.rb?
Add it to spec_helper.rb, as it is still small enough and I don't think we have a lot of certainty on the "best" way to structure it yet. Put an example in the documentation as well.
Just a random thought: this should be included in statsample later, because it is a statistical test after all. Kolmogorov-Smirnov and homogeneity chi-square tests are already there.
Thanks for pointing me to that. I found that there is already an implementation in place in statsample/test/chisquare.rb
I think we can replace the mean tests if the goodness-of-fit tests give better performance for a similar sample size.
To decide whether to make the replacement, we can measure performance using:
- Precision & Recall
- Benchmarking existing vs this spec
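One possible way to estimate the first item, sketched in Ruby under stated assumptions: run the test repeatedly against a generator known to be correct (to estimate the false-alarm rate) and against a deliberately broken one (to estimate the detection rate). The helper names are hypothetical, the samples are assumed to lie in [0, 1), and the critical value 16.919 is the tabulated 5% value for 9 degrees of freedom.

```ruby
# Hypothetical helpers; none of these names come from the gem or from statsample.

# Pearson's chi-squared statistic for samples expected to be uniform on [0, 1).
def uniform_chi_squared(samples, bins: 10)
  observed = Array.new(bins, 0)
  samples.each { |x| observed[[(x * bins).floor, bins - 1].min] += 1 }
  expected = samples.size.to_f / bins
  observed.sum { |count| (count - expected)**2 / expected }
end

# Fraction of trials in which the statistic exceeds the critical value.
# The block supplies one sample per call.
def rejection_rate(trials:, samples:, critical_value:)
  rejections = trials.times.count {
    uniform_chi_squared(Array.new(samples) { yield }) > critical_value
  }
  rejections.to_f / trials
end

critical_value = 16.919 # chi-squared, 9 degrees of freedom, 5% significance
rng = Random.new(42)

# A correct uniform generator should be rejected in roughly 5% of trials.
false_alarm_rate = rejection_rate(trials: 200, samples: 1000, critical_value: critical_value) { rng.rand }

# Squaring skews the samples towards 0, so this one should almost always be rejected.
detection_rate = rejection_rate(trials: 200, samples: 1000, critical_value: critical_value) { rng.rand**2 }
```

The same loop could be wrapped around the existing mean and variance specs to compare the two approaches at the same sample size.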
I see some problems with using statsample to test Distribution::ChiSquare, since that is what statsample uses. A good way around this seems to be a Bayesian test or a binary likelihood ratio test.
Let me know what you guys think.
Edit: Updated title to reflect the idea of using statistical hypothesis tests and not just the chi-squared test.