This project runs statistical tests and produces visualizations on the Iris dataset. The tests and their results are summarized below.
**sepal_length**
Test Statistic: 0.9761
p-value: 0.0102
Explanation: This test assesses whether the data follow a normal distribution. Because the p-value (0.0102) is below 0.05, the null hypothesis of normality is rejected at the 5% level: the data deviate significantly from a normal distribution.
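
The test is not named here; a statistic close to 1 paired with a p-value is consistent with Shapiro-Wilk. As a minimal sketch under that assumption, the check could be run with `scipy.stats.shapiro`. The helper name `shapiro_report` and the synthetic sample are illustrative stand-ins, not the project's code or the Iris data.

```python
import numpy as np
from scipy import stats


def shapiro_report(values, alpha=0.05):
    """Run Shapiro-Wilk and flag whether normality is rejected at `alpha`."""
    statistic, p_value = stats.shapiro(values)
    return {
        "statistic": float(statistic),
        "p_value": float(p_value),
        "reject_normality": bool(p_value < alpha),
    }


# Synthetic, clearly non-normal stand-in sample (NOT the Iris measurements).
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=150)

shapiro_result = shapiro_report(sample)
print(shapiro_result)
```

To apply it to the real data, the `sample` array would be replaced with the sepal_length column.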
**sepal_length**
Test Statistic: 5.7356
p-value: 0.0568
Explanation: This test evaluates whether the data deviate from a normal distribution. The p-value (0.0568) is slightly above 0.05, so normality is not rejected at the 5% level, although the result is borderline.
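
The test behind these numbers is not named. One plausible candidate is the D'Agostino-Pearson K² test, available as `scipy.stats.normaltest`, which combines a skewness test and a kurtosis test into one statistic. The sketch below assumes that test; `dagostino_report` is a hypothetical helper and the data are synthetic, not the Iris measurements.

```python
import numpy as np
from scipy import stats


def dagostino_report(values, alpha=0.05):
    """Run the D'Agostino-Pearson K^2 test and flag rejection at `alpha`."""
    statistic, p_value = stats.normaltest(values)
    return {
        "statistic": float(statistic),
        "p_value": float(p_value),
        "reject_normality": bool(p_value < alpha),
    }


# Synthetic, skewed stand-in sample (NOT the Iris measurements).
rng = np.random.default_rng(1)
sample = rng.exponential(scale=1.0, size=150)

k2_result = dagostino_report(sample)

# K^2 is the sum of the squared skewness and kurtosis z-scores.
z_skew = stats.skewtest(sample).statistic
z_kurt = stats.kurtosistest(sample).statistic
k2_from_parts = float(z_skew**2 + z_kurt**2)

print(k2_result, k2_from_parts)
```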
**sepal_length**
Test Statistic: 0.8892
Critical Values: 0.562 (15%), 0.64 (10%), 0.767 (5%), 0.895 (2.5%), 1.065 (1%)
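Explanation: This test checks how well the data fit a normal distribution by comparing the statistic with a critical value at each significance level. The statistic (0.8892) exceeds the 5% critical value (0.767) but falls just below the 2.5% critical value (0.895), so normality is rejected at the 5% level but not at the 2.5% level.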
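
The comparison against the critical values can be spelled out directly. This sketch uses only the statistic and critical values reported above; the decision rule (reject normality when the statistic exceeds the critical value) is the usual one for tests reported in this form, such as `scipy.stats.anderson`, which is an assumption about the test used.

```python
# Values copied from the results above.
statistic = 0.8892
critical_values = {15.0: 0.562, 10.0: 0.64, 5.0: 0.767, 2.5: 0.895, 1.0: 1.065}

# Normality is rejected at a given level when the statistic exceeds
# the critical value for that level.
rejected = {level: statistic > cv for level, cv in critical_values.items()}

for level, is_rejected in rejected.items():
    verdict = "reject" if is_rejected else "do not reject"
    print(f"{level:>4}% level: critical value {critical_values[level]:.3f} -> {verdict}")
```

Normality is rejected at the 15%, 10%, and 5% levels, and not at 2.5% or 1%.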
**species - sepal_length**
Sum of Squares: species 63.212, Residual 38.956
Degrees of Freedom: species 2, Residual 147
F Statistic: 119.265
p-value: 1.67e-31
Explanation: ANOVA tests whether the means of the groups differ. The very small p-value indicates that at least one species has a mean sepal_length that differs significantly from the others; a difference this large would be very unlikely under random variation alone.
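
The F statistic can be re-derived from the sums of squares and degrees of freedom listed above, which is a quick consistency check on the table. This sketch uses only those reported values; the p-value is taken from the upper tail of the F distribution via `scipy.stats.f.sf`.

```python
from scipy import stats

# Values copied from the ANOVA results above.
ss_species, ss_residual = 63.212, 38.956
df_species, df_residual = 2, 147

# Mean squares: sum of squares divided by degrees of freedom.
ms_species = ss_species / df_species
ms_residual = ss_residual / df_residual

# F is the ratio of between-group to within-group mean squares.
f_statistic = ms_species / ms_residual

# Upper-tail probability of the F distribution gives the p-value.
p_value = stats.f.sf(f_statistic, df_species, df_residual)

print(f"F = {f_statistic:.3f}, p = {p_value:.3g}")
```

Because the sums of squares are rounded to three decimals, the recomputed values may differ slightly from the reported ones in the last digits.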
**sepal_length - setosa**
p-value: 8.9852e-18
Explanation: The t-test assesses whether the means of two groups differ significantly. The very small p-value indicates a significant difference in means that is unlikely to arise from random variation alone.
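
A two-sample t-test of this kind could be run with `scipy.stats.ttest_ind`. The sketch below is illustrative only: which groups were compared and whether equal variances were assumed are not stated here, so it uses SciPy's default (equal variances) on two synthetic groups with arbitrary parameters, not the Iris data.

```python
import numpy as np
from scipy import stats

# Two synthetic stand-in groups (NOT the Iris measurements).
rng = np.random.default_rng(2)
group_a = rng.normal(loc=5.0, scale=0.4, size=50)
group_b = rng.normal(loc=6.0, scale=0.4, size=50)

# Student's t-test for independent samples (equal_var=True is the default).
# Pass equal_var=False for Welch's t-test if the variances may differ.
t_statistic, t_p_value = stats.ttest_ind(group_a, group_b)

significant = bool(t_p_value < 0.05)
print(f"t = {t_statistic:.3f}, p = {t_p_value:.3g}, significant: {significant}")
```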
Histograms: Used to understand the distribution of data for each species.
Box Plots: Visualize the median, spread (interquartile range), and outliers across species.
Heatmap: Shows relationships between the different variables and species.
pandas
numpy
scipy
matplotlib
seaborn