This project implements some common statistical data analysis methods, including:
Given two samples, perform a t-test to determine whether there is a significant difference between the means of these two samples. This test is appropriate mainly when the data sets follow a normal distribution.
Formula:
t = (mean1 - mean2) / sqrt(sem1^2 + sem2^2)
mean1, mean2: means of the two samples
sem1, sem2: standard errors of the means of the two samples
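As a sketch, SciPy's ttest_ind with equal_var=False computes Welch's version of this statistic; the sample data below are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100)  # sample 1
y = rng.normal(loc=0.5, scale=1.0, size=100)  # sample 2

# equal_var=False gives Welch's t-test, which uses the
# t = (mean1 - mean2) / sqrt(sem1^2 + sem2^2) statistic above.
t_stat, p_value = stats.ttest_ind(x, y, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```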
The Mann-Whitney U test is used to test the hypothesis that two samples come from the same distribution when the form of the distributions is unknown (a non-parametric alternative to the t-test).
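A corresponding sketch with SciPy's mannwhitneyu; the skewed samples here are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=80)  # skewed, non-normal sample
y = rng.exponential(scale=1.5, size=80)

# Rank-based test; makes no normality assumption about x or y.
u_stat, p_value = stats.mannwhitneyu(x, y, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.3f}")
```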
Kullback-Leibler divergence (KLD), also called relative entropy, measures how one probability distribution diverges from a second.
Formula:
KLD(P || Q) = sum(P * log(P / Q))
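As a sketch, scipy.stats.entropy computes this divergence when given two distributions; p and q below are made-up discrete distributions over the same support:

```python
import numpy as np
from scipy import stats

# Two discrete probability distributions over the same support.
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.2, 0.3, 0.5])

# With a second argument, scipy.stats.entropy returns
# KLD(p || q) = sum(p * log(p / q)) in nats.
kld = stats.entropy(p, q)
print(f"KLD(p || q) = {kld:.4f}")

# Equivalent manual computation:
assert np.isclose(kld, np.sum(p * np.log(p / q)))
```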
Empirical CDF distance: a non-parametric distance between the empirical CDFs of two samples.
Formula:
D = sum((CDF(X) - CDF(Y))**2)
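A minimal sketch of this distance, assuming the empirical CDFs are compared at the pooled sample points; the helper name ecdf_distance and that choice of evaluation grid are assumptions, not necessarily the project's actual implementation:

```python
import numpy as np

def ecdf_distance(x, y):
    """Sum of squared differences between the empirical CDFs of x and y,
    evaluated on the pooled sample: D = sum((CDF(X) - CDF(Y))**2)."""
    grid = np.sort(np.concatenate([x, y]))
    # searchsorted(..., side="right") / n is the empirical CDF at each grid point.
    cdf_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    cdf_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return np.sum((cdf_x - cdf_y) ** 2)

rng = np.random.default_rng(2)
print(ecdf_distance(rng.normal(size=100), rng.normal(0.3, 1.0, size=100)))
```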
Calculate entropy by binning. The number of bins is determined by different rules depending on the variable's distribution.
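A sketch under the assumption that NumPy's bins="auto" heuristic (which picks between the Sturges and Freedman-Diaconis rules) stands in for the project's actual binning rules:

```python
import numpy as np
from scipy import stats

def binned_entropy(x, bins="auto"):
    """Estimate the entropy of x by histogram binning.

    The bin rule is an assumption here; numpy's "auto" chooses between
    the Sturges and Freedman-Diaconis rules based on the data.
    """
    counts, _ = np.histogram(x, bins=bins)
    # scipy.stats.entropy normalizes the counts to probabilities.
    return stats.entropy(counts)

rng = np.random.default_rng(3)
print(binned_entropy(rng.normal(size=1000)))
```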
Kolmogorov-Smirnov (KS) test: a non-parametric test for comparing two samples.
Formula:
D = max(abs(CDF(X) - CDF(Y)))
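A sketch using SciPy's two-sample KS test, which returns exactly this D statistic along with a p-value; the samples are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=200)
y = rng.uniform(-2, 2, size=200)

# ks_2samp returns D = max(abs(CDF(X) - CDF(Y))) and its p-value.
d_stat, p_value = stats.ks_2samp(x, y)
print(f"D = {d_stat:.3f}, p = {p_value:.3g}")
```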
Calculate the mutual information between X and Y from the single and joint entropies (see https://en.wikipedia.org/wiki/Mutual_information#Relation_to_conditional_and_joint_entropy).
Formula:
I(X, Y) = H(X) + H(Y) - H(X, Y)
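A sketch of this identity using histogram-based entropy estimates; the bin count of 20 is an arbitrary choice for the example, not the project's setting:

```python
import numpy as np
from scipy import stats

def mutual_information(x, y, bins=20):
    """I(X, Y) = H(X) + H(Y) - H(X, Y), estimated via 2D histogram binning."""
    joint_counts, _, _ = np.histogram2d(x, y, bins=bins)
    h_xy = stats.entropy(joint_counts.ravel())     # joint entropy H(X, Y)
    h_x = stats.entropy(joint_counts.sum(axis=1))  # marginal entropy H(X)
    h_y = stats.entropy(joint_counts.sum(axis=0))  # marginal entropy H(Y)
    return h_x + h_y - h_xy

rng = np.random.default_rng(5)
x = rng.normal(size=1000)
y = x + rng.normal(scale=0.5, size=1000)  # correlated with x
print(f"I(X, Y) ≈ {mutual_information(x, y):.3f} nats")
```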