data_analysis: A Jupyter Notebook repository from e181337

e181337/data_analysis

In this notebook, I apply statistical methods to the analysis of an imbalanced data set. It starts with the basics: a null check, descriptive statistics, and handling of missing values. The numerical columns are right-skewed, and the Shapiro-Wilk and Anderson-Darling tests are applied to show that they are not normally distributed. Outliers in the numerical columns are detected with the interquartile range (IQR). For categorical columns, a chi-square test is applied to check whether their distributions differ with respect to the target column. Correlation analysis for the imbalanced data set is carried out after undersampling.
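The tests named above could be run roughly as in the following sketch. It is not taken from the notebook: the data is synthetic, the variable and column names (`values`, `category`, `target`) are hypothetical, and the 5% significance level is an assumption.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic right-skewed sample (log-normal); stands in for a numerical column.
rng = np.random.default_rng(0)
values = rng.lognormal(mean=0.0, sigma=1.0, size=500)

# Sample skewness: a positive value indicates a longer right tail.
skewness = stats.skew(values)

# Shapiro-Wilk: the null hypothesis is that the sample is normally distributed.
shapiro_stat, shapiro_p = stats.shapiro(values)

# Anderson-Darling: compare the statistic with the critical value at 5%.
ad = stats.anderson(values, dist="norm")
idx_5 = int(np.argmin(np.abs(np.asarray(ad.significance_level) - 5.0)))
ad_rejects_normality = ad.statistic > ad.critical_values[idx_5]

# Chi-square test of independence between a categorical column and an
# imbalanced binary target (900 vs. 100 rows, made-up counts).
df = pd.DataFrame({
    "category": ["a"] * 810 + ["b"] * 90 + ["a"] * 40 + ["b"] * 60,
    "target": [0] * 900 + [1] * 100,
})
table = pd.crosstab(df["category"], df["target"])
chi2, chi2_p, dof, expected = stats.chi2_contingency(table)

print(f"skewness={skewness:.2f}, Shapiro-Wilk p={shapiro_p:.3g}")
print(f"Anderson-Darling rejects normality at 5%: {ad_rejects_normality}")
print(f"chi-square p={chi2_p:.3g}, dof={dof}")
```

A small p-value rejects the null hypothesis (normality for Shapiro-Wilk, independence for chi-square); it is evidence against the null rather than a proof.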

Jupyter Notebook


data_analysis for imbalanced data

Application of Shapiro-Wilk, Anderson-Darling, and chi-square tests

Correlation analysis for imbalanced data

Outlier detection
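As a rough illustration of the last two items, the sketch below flags IQR outliers and computes a correlation after random undersampling. The notebook's actual undersampling method is not specified here, so random undersampling of the majority class is an assumption, as are the 1.5 multiplier, the synthetic data, and the names `iqr_outlier_mask`, `amount`, and `target`.

```python
import numpy as np
import pandas as pd

# Synthetic imbalanced data: 950 majority rows, 50 minority rows.
rng = np.random.default_rng(42)
n_major, n_minor = 950, 50
df = pd.DataFrame({
    "amount": np.concatenate([
        rng.exponential(1.0, n_major),
        rng.exponential(3.0, n_minor),
    ]),
    "target": [0] * n_major + [1] * n_minor,
})


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


outliers = iqr_outlier_mask(df["amount"])

# Random undersampling: keep every minority row and an equally sized
# random subset of the majority class.
minority = df[df["target"] == 1]
majority = df[df["target"] == 0].sample(n=len(minority), random_state=0)
balanced = pd.concat([minority, majority])

# Pearson correlation between the feature and the target on the balanced subset.
corr = balanced["amount"].corr(balanced["target"])

print(f"outliers flagged: {int(outliers.sum())} of {len(df)}")
print(f"correlation after undersampling: {corr:.3f}")
```

Balancing the classes first keeps the majority class from dominating the correlation estimate, at the cost of discarding most majority rows.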