Covered fundamental case studies from Statistical analysis, EDA, feature engineering to machine learning and deep learning models..
Statistical analysis using graphs and distributions
Dataset: Insurance (1000 x 7)
Data preprocessing
missing value handling
EDA
5-point summary of numerical attributes
Univariate analysis
distplot for distribution for numerical attributes
stripplot for distribution for categorical attributes
Bivariate analysis
pairplot for bi-variate analysis
Classify using supervised learning models on training dataset and evaluate performances on test dataset
Dataset: users (48k x 15)
Data preprocessing
concatenate 2 datasets
missing value handling
EDA
Univariate analysis
distplot
value counts for categorical attributes
Bivariate analysis
Pairplot
Feature engineering
Drop attributes with most values 0, and so not useful in model training
Model training
Prepare training set and test set
Train models
Logistic regression
Naive Bayes
KNN classifier
SVC
linear kernel
rbf kernel
poly kernel
sigmoid kernel
Performance evaluation on test data
Model score
confusion matrix
Accuracy
76% (Naive Bayes) to 83% (SVC)
Precision
class 0
class 1
Recall
class 0
class 1
Apply ensemble methods on classification models and evaluate performances
Dataset: bank users (45k x 17)
EDA
5-point summary of numerical attributes
Univariate analysis
distplot
boxplot
Bivariate analysis
pairplot
Data preprocessing
outlier handling
Encoding categorical columns
Model training
Standard classification models
Decision tree classifier
Regularized decision tree classifier
Naive Bayes
Ensemble methods
Bagging classifier on regularized decision tree model
Random forest classifier
Bagging classifier on Naive bayes model
Adaboost classifier on Naive bayes model
Performance evaluation on test data
Confusion matrix
Accuracy score
62% to 87%
Precision
class 0
class 1
Recall
class 0
class 1
Perform classification on an unsupervised dataset and evaluate performance
Dataset: vehicles (~1000 x 19)
Data preprocessing
missing value handling
EDA
pairplot
boxplot
Data preprocessing
z-score on independent attributes to scale the values
Feature engineering
PCA
features reduced to 6
Model Training
SVC model without PCA
SVC model with PCA
Performance evaluation
Accuracy score
92% without PCA
78% with PCA