Iris Dataset Analysis

This project applies various statistical tests and visualizations to the Iris dataset. Below is a summary of the tests conducted and their results.

Primary language: Jupyter Notebook

Statistical Tests

Shapiro-Wilk Test:

**sepal_length**

Test Statistic: 0.9761

p-value: 0.0102

Explanation: This test assesses whether the data come from a normal distribution. Because the p-value (0.0102) is below 0.05, normality is rejected at the 5% level: sepal_length deviates significantly from a normal distribution.
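The README does not include the notebook code. A minimal sketch of how a Shapiro-Wilk result like the one above could be produced with SciPy's `scipy.stats.shapiro`, assuming the values are available as a 1-D array. The helper name `shapiro_wilk_report` and the 0.05 threshold are illustrative choices, and the synthetic sample only stands in for the real sepal_length column.

```python
import numpy as np
from scipy import stats


def shapiro_wilk_report(values, alpha=0.05):
    """Run the Shapiro-Wilk normality test and state the decision at level alpha."""
    data = np.asarray(values, dtype=float)
    statistic, p_value = stats.shapiro(data)
    return {
        "statistic": float(statistic),
        "p_value": float(p_value),
        # H0: the data come from a normal distribution; reject it when p < alpha
        "reject_normality": bool(p_value < alpha),
    }


# Hypothetical usage on a synthetic sample (stand-in for iris["sepal_length"])
rng = np.random.default_rng(0)
print(shapiro_wilk_report(rng.normal(5.8, 0.8, size=150)))
```

One way to get the real column, assuming seaborn and network access, is `sns.load_dataset("iris")["sepal_length"]`; the notebook may load the data differently.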

D’Agostino’s K^2 Test:

**sepal_length**

Test Statistic: 5.7356

p-value: 0.0568

Explanation: This test combines the sample skewness and kurtosis into a single measure of deviation from normality. The p-value (0.0568) is slightly above 0.05, so normality is not rejected at the 5% level; the result is borderline and hints at some deviation without being conclusive.
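D'Agostino's K² statistic is the sum of the squared z-scores from SciPy's skewness and kurtosis tests, and under normality it is approximately chi-squared with 2 degrees of freedom. The sketch below rebuilds it from `scipy.stats.skewtest` and `scipy.stats.kurtosistest` (which `scipy.stats.normaltest` also does) and checks that the reported statistic maps to the reported p-value. The helper name `dagostino_k2` is hypothetical.

```python
import numpy as np
from scipy import stats


def dagostino_k2(values):
    """D'Agostino-Pearson K^2: squared skewness z-score plus squared kurtosis z-score."""
    data = np.asarray(values, dtype=float)
    z_skew, _ = stats.skewtest(data)
    z_kurt, _ = stats.kurtosistest(data)
    k2 = z_skew**2 + z_kurt**2
    # Under H0 (normality), K^2 is approximately chi-squared with 2 degrees of freedom
    p_value = stats.chi2.sf(k2, df=2)
    return float(k2), float(p_value)


# The reported statistic converts back to the reported p-value via chi2(2)
print(stats.chi2.sf(5.7356, df=2))  # approximately 0.0568
```

On real data, `stats.normaltest(values)` returns the same pair in one call.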

Anderson-Darling Test:

**sepal_length**

Test Statistic: 0.8892

Critical Values: 0.562 (15%), 0.64 (10%), 0.767 (5%), 0.895 (2.5%), 1.065 (1%)

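Explanation: This test checks how well the data fit a normal distribution; normality is rejected at a given significance level when the test statistic exceeds the corresponding critical value. For sepal_length, the statistic (0.8892) exceeds the 5% critical value (0.767) but falls just below the 2.5% critical value (0.895), so normality is rejected at the 5% level but not at the 2.5% or 1% levels.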
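The Anderson-Darling decision rule can be applied directly to the numbers reported above: compare the statistic with each critical value and reject normality wherever the statistic is larger. This sketch uses only the reported values. On raw data, `scipy.stats.anderson(values, dist="norm")` returns the statistic together with critical values for these five significance levels. The helper name `anderson_decisions` is hypothetical.

```python
# Statistic and critical values as reported above for sepal_length
statistic = 0.8892
critical_values = {15.0: 0.562, 10.0: 0.64, 5.0: 0.767, 2.5: 0.895, 1.0: 1.065}


def anderson_decisions(stat, crit):
    """Map each significance level (in %) to True if normality is rejected there."""
    return {level: stat > cv for level, cv in crit.items()}


# Prints "reject" for 15%, 10% and 5%, "do not reject" for 2.5% and 1%
for level, rejected in anderson_decisions(statistic, critical_values).items():
    print(f"{level:>4}%: {'reject' if rejected else 'do not reject'} normality")
```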

ANOVA (Analysis of Variance)

**sepal_length by species**

Sum of Squares: species 63.212, Residual 38.956

Degrees of Freedom: species 2, Residual 147

F Statistic: 119.265

p-value: 1.67e-31

Explanation: One-way ANOVA tests whether at least one group mean differs from the others. The very small p-value indicates that mean sepal_length differs significantly among the three species; differences this large would be extremely unlikely if all species shared the same mean.
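The ANOVA table can be checked by hand. Each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the between-group to the within-group mean square, and the p-value is the upper tail of the F(2, 147) distribution. This sketch uses the rounded values from the table with `scipy.stats.f.sf`. On raw data, `scipy.stats.f_oneway(*groups)` runs the whole test.

```python
from scipy import stats

# Values from the ANOVA table above (rounded as reported)
ss_between, df_between = 63.212, 2
ss_within, df_within = 38.956, 147

ms_between = ss_between / df_between  # mean square for species
ms_within = ss_within / df_within     # residual mean square
f_stat = ms_between / ms_within       # about 119.265, matching the table
p_value = stats.f.sf(f_stat, df_between, df_within)  # upper-tail probability

print(f"F = {f_stat:.3f}, p = {p_value:.3g}")
```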

t-Test

**sepal_length (setosa)**

p-value: 8.9852e-18

Explanation: The t-test assesses whether the means of two groups differ significantly. The very small p-value indicates that the difference in mean sepal_length is statistically significant and very unlikely to be due to random variation alone.
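The README does not say which species setosa is compared with, or whether Student's or Welch's variant was used. A minimal sketch of an independent two-sample t-test with `scipy.stats.ttest_ind`, assuming independent samples. `compare_means` is a hypothetical helper, and the commented pairing with versicolor is only an example, not necessarily the comparison behind the reported p-value.

```python
import numpy as np
from scipy import stats


def compare_means(group_a, group_b, equal_var=True, alpha=0.05):
    """Independent two-sample t-test; equal_var=False switches to Welch's t-test."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    t_stat, p_value = stats.ttest_ind(a, b, equal_var=equal_var)
    return float(t_stat), float(p_value), bool(p_value < alpha)


# Hypothetical usage with a DataFrame `iris` that has species/sepal_length columns:
# setosa = iris.loc[iris["species"] == "setosa", "sepal_length"]
# versicolor = iris.loc[iris["species"] == "versicolor", "sepal_length"]
# t_stat, p_value, significant = compare_means(setosa, versicolor)
```

Student's test (`equal_var=True`, SciPy's default) pools the two variances; Welch's test does not, and is the safer choice when group spreads differ.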

Visualizations

Histograms: Show the distribution of each variable for each species.

Box Plots: Show the median, interquartile range, and outliers for each species.

Heatmap: Shows the relationships and interactions between the variables and species.
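The notebook's plotting code is not included here. A minimal sketch of the three plot types, assuming a DataFrame with a `species` column and numeric measurement columns such as `sepal_length` (column names are assumptions). It uses matplotlib's object-oriented `Figure` API so it runs without a display. The heatmap here shows Pearson correlations between numeric columns, which is an assumption about what the project's heatmap contains. `plot_overview` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from matplotlib.figure import Figure


def plot_overview(df, value_col="sepal_length", group_col="species"):
    """Per-group histogram and box plot of one column, plus a correlation heatmap."""
    fig = Figure(figsize=(15, 4))
    ax_hist, ax_box, ax_heat = fig.subplots(1, 3)

    groups = [(name, g[value_col].to_numpy()) for name, g in df.groupby(group_col)]

    # Overlaid histograms: one distribution per group
    for name, values in groups:
        ax_hist.hist(values, bins=15, alpha=0.5, label=str(name))
    ax_hist.set_xlabel(value_col)
    ax_hist.legend()

    # Box plots: median, interquartile range, and outliers per group
    ax_box.boxplot([values for _, values in groups])
    ax_box.set_xticks(range(1, len(groups) + 1))
    ax_box.set_xticklabels([str(name) for name, _ in groups])
    ax_box.set_ylabel(value_col)

    # Heatmap of Pearson correlations between the numeric columns
    corr = df.select_dtypes("number").corr()
    im = ax_heat.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    ax_heat.set_xticks(range(len(corr.columns)))
    ax_heat.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax_heat.set_yticks(range(len(corr.columns)))
    ax_heat.set_yticklabels(corr.columns)
    fig.colorbar(im, ax=ax_heat)

    return fig
```

Usage might look like `plot_overview(iris).savefig("overview.png", bbox_inches="tight")`. Since seaborn is listed among the libraries, `sns.histplot`, `sns.boxplot`, and `sns.heatmap` are the seaborn counterparts of these panels.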

[Images: three screenshots of the plots, dated 2024-07-30]

Libraries Used

pandas

numpy

scipy

matplotlib

seaborn