1. Foundation for AIML - MovieLens Data Exploration
- Covers descriptive statistics and exploratory data analysis, including visualizations
- Project link: MovieLens Data Exploration
- The GroupLens Research Project is a research group in the Department of Computer Science and Engineering at the University of Minnesota. The MovieLens data it publishes is widely used for collaborative filtering and other filtering solutions. Here, however, the data serves as a means to demonstrate skill in using Python to “play” with data.
- Learning Outcomes:
- Exploratory Data Analysis
- Visualization using Python
- Pandas – groupby, merging
2. Supervised Machine Learning - Thera Bank Personal loan campaign
-
Context: This case is about a bank (Thera Bank) whose management wants to explore ways of converting its liability customers to personal loan customers (while retaining them as depositors). A campaign that the bank ran last year for liability customers showed a healthy conversion rate of over 9%. This has encouraged the retail marketing department to devise better-targeted campaigns that increase the success ratio with a minimal budget.
-
Goal: The classification goal is to predict the likelihood of a liability customer buying personal loans.
- Project link: Thera Bank Personal Loan Campaign
-
Identified potential loan customers for Thera Bank using classification techniques. Compared three models (Logistic Regression, KNN, Naive Bayes). Of the three, KNN gave the best overall performance and the highest recall.
-
Because Class 1 (the minority class) makes up only a small share of the dataset, applied SMOTE to the minority class. This gave better recall than using the original data, in which the minority class is only about 10%.
-
Important metric: The focus should be on recall. The target variable is 'Personal Loan', i.e. whether or not the customer accepts the personal loan. The bank wants more people to accept personal loans, so the model should produce as few false negatives as possible and the bank doesn't lose real customers who want to take a loan. Hence the focus is on increasing recall.
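
A minimal sketch of why false negatives drive recall. The labels below are made up for illustration (1 = customer accepted the personal loan) and are not taken from the Thera Bank data; the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary classifier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1
        elif actual == positive:
            fn += 1  # a real loan taker the model missed
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    """Recall = TP / (TP + FN): share of actual positives that were found."""
    actual_positives = tp + fn
    return tp / actual_positives if actual_positives else 0.0


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn} recall={recall(tp, fn):.2f}")
```

Each additional false negative lowers recall, which is why it is the metric to watch when missing a willing customer is the costly error.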
-
Learning Outcomes:
- Exploratory Data Analysis
- Preparing the data to train models
- Training and making predictions using classification models (LR, KNN, NB)
- Model evaluation (Confusion Matrix, ROC-AUC, Classification report, Classification Metrics)
- Class Imbalance Handling (using SMOTE)
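
The idea behind SMOTE can be sketched in a few lines: each synthetic sample is placed on the line segment between a minority-class point and one of its nearest minority-class neighbours. This is a simplified pure-Python illustration, not the library implementation used in the project; `smote_sketch`, its parameters and the sample points are all hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical, already-scaled 2-D minority-class points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points), "synthetic minority samples")
```

Oversampling of this kind should be applied to the training split only, so that the test set keeps the real class distribution.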
3. Ensemble Techniques - Term Deposit Subscription
- Goal: To build a model that will help the marketing team identify potential customers who are relatively more likely to subscribe to term deposits and thus increase their hit ratio.
-
Project link: Term Deposit Subscription
-
Covers LR, Decision Trees, Bagging, Random Forests, Boosting Algorithms.
-
Because of class imbalance, applied SMOTE
-
Classification report and confusion matrix analysis
-
Feature Selection / Elimination - Used Recursive Feature Elimination and compared performance across various models
- Best model is Gradient Boosting with 89% F1, 89% precision and 90% recall
4. Feature selection, model selection and Tuning - Concrete Strength Prediction
- Goal: To predict the concrete strength. Apply feature engineering and model tuning to obtain a score above 85%
-
Project link: Concrete Strength Prediction
-
Predicting concrete compressive strength is a regression problem, and compressive strength is a highly nonlinear function of age and ingredients
-
Applied EDA, feature engineering and feature transformations
-
Tried multiple ML algorithms (OLS, LR, Lasso, Ridge, Polynomial, KNN, SVR, Decision Tree Regressor, Random Forest Regressor, XGBoost, Gradient Boosting Regressor)
-
Used K-fold cross-validation to evaluate model performance and tuned the models' hyperparameters
-
Performance metrics used to select the best model are R2, RMSE and MAE
- Best model is Gradient Boosting with 90% R2, test MAE of 3.24 and test RMSE of 4.89
- Computed 95% confidence interval for test RMSE
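
A minimal sketch of the three selection metrics and one way to put a 95% confidence interval around the test RMSE. The notes above do not say how the interval was computed; the percentile bootstrap used here is an assumption, and the strength values are hypothetical rather than taken from the concrete dataset.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def bootstrap_rmse_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for RMSE (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample test rows with replacement and recompute the metric.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical compressive strengths, for illustration only.
y_true = [30.0, 45.0, 25.0, 60.0, 38.0, 52.0, 41.0, 33.0]
y_pred = [32.0, 43.0, 28.0, 57.0, 38.0, 50.0, 44.0, 31.0]

lower, upper = bootstrap_rmse_ci(y_true, y_pred)
print(f"MAE={mae(y_true, y_pred):.3f} RMSE={rmse(y_true, y_pred):.3f} "
      f"R2={r2(y_true, y_pred):.3f} 95% CI for RMSE=({lower:.3f}, {upper:.3f})")
```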
5. Unsupervised Machine Learning - AllLife Credit Card Customer Segmentation
- Covers K-means clustering, Hierarchical clustering techniques with different linkages and PCA
-
Project link: AllLife Credit Card Customer Segmentation
-
Objective: To identify different segments in the existing customer base, based on customers' spending patterns as well as their past interactions with the bank
-
Key questions to be answered:
- How many different segments of customers are there?
- How are these segments different from each other?
- What are your recommendations to the bank on how to better market to and service these customers?
-
Approach:
- Identified different customer segments by applying K-means and several hierarchical clustering techniques
- Chose the value of K (number of clusters) using the elbow method, silhouette diagrams and silhouette scores
- Hierarchical clustering using SciPy linkage, with the cophenetic correlation computed to check how faithfully each linkage preserves the original pairwise distances
- Interesting observations made during EDA were confirmed by the clustering results
- Used PCA in order to reduce dimensionality
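
The silhouette score used for choosing K can be sketched in pure Python. For each point, `a` is the mean distance to the other members of its own cluster and `b` is the smallest mean distance to any other cluster; the coefficient is `(b - a) / max(a, b)`. The function below is an illustrative sketch that assumes at least two clusters and Euclidean distance, and the points are hypothetical rather than AllLife customer data.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself does not affect the sum)
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_score = silhouette_score(points, good_labels)
bad_score = silhouette_score(points, bad_labels)
print(f"good={good_score:.3f} bad={bad_score:.3f}")
```

Scores close to 1 indicate compact, well-separated clusters, so the K with the highest mean score is a reasonable candidate alongside the elbow plot.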
6. Computer Vision
-
Covers Introduction to Convolutional Neural Networks, Convolution, Pooling, Padding & its mechanisms, Forward propagation & Backpropagation for CNNs
-
Project link: Plant Seedlings Image Classification using CNNs in Keras
- Recognize, identify and classify plant images using CNN and image recognition algorithms. The goal of the project is to create a classifier capable of determining a plant's species from a photo.
-
Learning Outcomes:
- Pre-processing of image data.
- Visualization of images
- Building a CNN and evaluating the model
- The motive of the project is to make learners capable of handling image classification problems. In the process, you should also become capable of handling real image files, not just NumPy arrays of image pixels
-
Details:
- Applied image preprocessing techniques (resize, Gaussian blur, masking, grayscale and Laplacian edge detection)
- A CNN with batch normalization, max pooling, dropout and dense layers is a good combination for image classification
- CNN Model Architecture
- Convolutional input layer, 32 feature maps with a size of 3×3 and a rectifier activation function
- Batch Normalization
- Max Pool layer with size 2×2 and a stride of 2
- Convolutional layer, 64 feature maps with a size of 3×3 and a rectifier activation function
- Batch Normalization
- Max Pool layer with size 2×2 and a stride of 2
- Convolutional layer, 64 feature maps with a size of 3×3 and a rectifier activation function
- Batch Normalization
- Max Pool layer with size 2×2 and a stride of 2
- Flatten layer
- Fully connected (Dense) layers with 512 and 128 neurons and ReLU activation
- Dropout layer for regularization, to reduce overfitting
- Output layer with softmax function to predict multiple categories
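
To see how the layer stack above shrinks the feature maps, the sketch below traces spatial sizes through the three convolution and pooling stages. Several details are assumptions not stated in these notes: a square 128×128 input, convolution stride 1 and no padding (the Keras `Conv2D` default of `padding="valid"`). Batch normalization is omitted because it does not change shapes.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output width/height of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, pool=2, stride=2):
    """Output width/height of a max-pooling layer."""
    return (size - pool) // stride + 1


def trace_shapes(input_size, filters=(32, 64, 64)):
    """Trace (layer, height, width, channels) through the conv/pool stages."""
    shapes = []
    size = input_size
    for n_filters in filters:
        size = conv_out(size)  # 3x3 convolution, assumed stride 1, no padding
        shapes.append(("conv3x3", size, size, n_filters))
        size = pool_out(size)  # 2x2 max pool with stride 2
        shapes.append(("maxpool2x2", size, size, n_filters))
    # The flatten layer turns the last feature maps into one vector.
    shapes.append(("flatten", size * size * filters[-1]))
    return shapes


# 128 is an assumed input size, chosen only for illustration.
trace = trace_shapes(128)
for layer in trace:
    print(layer)
# Spatial size goes 128 -> 126 -> 63 -> 61 -> 30 -> 28 -> 14,
# so the flatten layer feeds 14 * 14 * 64 = 12544 values to the dense layers.
```

With a different input size or `padding="same"`, the flattened length changes, which directly affects the parameter count of the first dense layer.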