UTAustin_PGP_Projects

Projects:

1. Foundation for AIML - MovieLens Data Exploration

  • Covers descriptive statistics and exploratory data analysis, including visualizations
    • Project link: MovieLens Data Exploration
      • The GroupLens Research Project is a research group in the Department of Computer Science and Engineering at the University of Minnesota. Its data is widely used for collaborative filtering and other filtering solutions. Here, however, the data serves as a means to demonstrate skill in using Python to “play” with data.
      • Learning Outcomes:
        • Exploratory Data Analysis
        • Visualization using Python
        • Pandas – groupby, merging
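
A minimal sketch of the groupby and merging pattern named in the learning outcomes. The two toy tables and their column names (`user_id`, `movie_id`, `rating`, `title`) are hypothetical stand-ins, not the actual MovieLens files or the project's code.

```python
import pandas as pd

# Hypothetical toy tables; the real MovieLens files have more rows and columns.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "movie_id": [10, 20, 10, 20],
    "rating": [4, 5, 3, 4],
})
movies = pd.DataFrame({
    "movie_id": [10, 20],
    "title": ["Movie A", "Movie B"],
})

# Merging: attach each movie's title to its ratings via the shared key.
merged = ratings.merge(movies, on="movie_id", how="left")

# Groupby: mean rating and number of ratings per title.
summary = (
    merged.groupby("title")["rating"]
    .agg(mean_rating="mean", n_ratings="count")
    .reset_index()
)

print(summary)
```

A left merge keeps every rating row even if a movie is missing from the lookup table, which makes gaps in the data visible instead of silently dropping them.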

2. Supervised Machine Learning - Thera Bank Personal loan campaign

  • Context: This case is about a bank (Thera Bank) whose management wants to explore ways of converting its liability customers to personal loan customers (while retaining them as depositors). A campaign that the bank ran last year for liability customers showed a healthy conversion rate of over 9%. This has encouraged the retail marketing department to devise campaigns with better target marketing to increase the success ratio with a minimal budget.

  • Goal: The classification goal is to predict the likelihood of a liability customer buying personal loans.

    • Project link: Thera Bank Personal Loan Campaign
      • Identified potential loan customers for Thera Bank using classification techniques. Compared three models (Logistic Regression, KNN, Naive Bayes). Of the three, KNN gave the best overall performance and the highest Recall score.

      • Because Class 1 examples are scarce in the dataset, applied SMOTE to the minority class; this gave better Recall than just taking 10% of the minority class.

      • Important Metric: The focus should be on Recall, because the target variable is 'Personal Loan', i.e. whether the customer accepts the personal loan or not. The bank wants more people to accept personal loans, which means fewer False Negatives, so that the bank doesn't lose real customers who want to take a loan.

        Figures: model comparison; KNN ROC curve

      • Learning Outcomes:

        • Exploratory Data Analysis
        • Preparing the data to train models
        • Training and making predictions using classification models (LR, KNN, NB)
        • Model evaluation (Confusion Matrix, ROC-AUC, Classification report, Classification Metrics)
        • Class Imbalance Handling (using SMOTE)
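
To make the Recall focus concrete, here is a small sketch that counts confusion-matrix cells and derives Recall from them. The labels are made-up toy values and the helper names are hypothetical; this is not the project's evaluation code.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy labels: 1 = customer accepted the loan, 0 = did not (imbalanced on purpose).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)

print("recall:", recall(tp, fn))
print("accuracy:", accuracy)
```

On imbalanced data like this, accuracy can look good while a missed loan-taker (a False Negative) still pulls Recall down, which is why Recall is the metric to watch here.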

3. Ensemble Techniques - Term Deposit Subscription

  • Goal: To build a model that will help the marketing team identify potential customers who are relatively more likely to subscribe to term deposits and thus increase their hit ratio.

4. Feature selection, model selection and Tuning - Concrete Strength Prediction

  • Goal: To predict the concrete strength. Apply feature engineering and model tuning to obtain a score above 85%
    • Project link: Concrete Strength Prediction

      • Predicting concrete compressive strength is a regression problem; compressive strength is a highly nonlinear function of age and ingredients

      • Applied EDA, feature engineering, and feature transformations

      • Tried multiple ML algorithms (OLS, LR, Lasso, Ridge, Polynomial, KNN, SVR, DT Regressor, RF Regressor, XGBoost, Gradient Boosting Regressor)

      • Used K-Fold cross-validation to evaluate model performance and tuned hyperparameters

      • Performance metrics used to select the best model: R2, RMSE, MAE

        • Best model is Gradient Boosting, with R2 of 90%, test MAE of 3.24, and test RMSE of 4.89
        • Computed 95% confidence interval for test RMSE
      • Figures: Strength vs. other features; Models comparison; Predicted vs. actual
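
The write-up does not say how the 95% confidence interval for test RMSE was computed. One common approach is a percentile bootstrap over the test-set predictions; the sketch below assumes that method. The toy values, function names, and seed are all hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def bootstrap_rmse_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for RMSE (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample test rows with replacement and recompute RMSE.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up strengths and predictions, only to exercise the functions.
y_true = [30.0, 42.5, 25.1, 55.3, 38.0, 47.2, 33.3, 60.1]
y_pred = [33.1, 40.0, 27.0, 50.2, 39.5, 52.0, 30.9, 57.3]

point = rmse(y_true, y_pred)
lower, upper = bootstrap_rmse_ci(y_true, y_pred)
print(f"RMSE = {point:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

With a real test set the interval narrows as the number of test rows grows; on a handful of rows like this toy example it is wide and mainly illustrates the mechanics.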

5. Unsupervised Machine Learning - AllLife Credit Card Customer Segmentation

6. Computer Vision

  • Covers Introduction to Convolutional Neural Networks, Convolution, Pooling, Padding & its mechanisms, Forward propagation & Backpropagation for CNNs

    • Project link: Plant Seedlings Image Classification using CNNs in Keras

      • Recognize, identify and classify plant images using CNN and image recognition algorithms. The goal of the project is to create a classifier capable of determining a plant's species from a photo.
    • Learning Outcomes:

      • Pre-processing of image data.
      • Visualization of images
      • Building a CNN and evaluating the model
      • The motive of the project is to make learners capable of handling image classification problems. In the process, you should also become capable of handling real image files, not just numpy arrays of image pixels
    • Details:

      • Applied image preprocessing techniques (resize, Gaussian blur, masking, grayscale and Laplacian edge detection)
      • CNN with Batch Normalization, max pooling, dropout and Dense layers is a good combination for image classification
      • CNN Model Architecture
        • Convolutional input layer, 32 feature maps with a size of 3×3 and a rectifier activation function
        • Batch Normalization
        • Max Pool layer with size 2×2 and a stride of 2
        • Convolutional layer, 64 feature maps with a size of 3×3 and a rectifier activation function
        • Batch Normalization
        • Max Pool layer with size 2×2 and a stride of 2
        • Convolutional layer, 64 feature maps with a size of 3×3 and a rectifier activation function
        • Batch Normalization
        • Max Pool layer with size 2×2 and a stride of 2
        • Flatten layer
        • Fully connected (Dense) layers with 512 and 128 neurons and ReLU activation
        • Dropout layer for regularization, to reduce overfitting
        • Output layer with Softmax function to classify multiple categories
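
To see how the feature-map size shrinks through the three conv/pool stages listed above, the following sketch traces the shapes with plain arithmetic. The 128×128 input size, convolution stride of 1, and no padding ("valid") are assumptions for illustration; the write-up does not state them.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, pool=2, stride=2):
    """Output size of max pooling along one spatial dimension."""
    return (size - pool) // stride + 1


def trace_shapes(height, width, filters=(32, 64, 64)):
    """Follow (height, width, channels) through each conv + pool stage."""
    shapes = []
    h, w = height, width
    for f in filters:
        h, w = conv_out(h), conv_out(w)      # 3x3 convolution, f feature maps
        shapes.append(("conv3x3", (h, w, f)))
        # Batch normalization does not change the shape, so it is skipped here.
        h, w = pool_out(h), pool_out(w)      # 2x2 max pool, stride 2
        shapes.append(("maxpool2x2", (h, w, f)))
    shapes.append(("flatten", (h * w * filters[-1],)))
    return shapes


# Hypothetical 128x128 input image.
for name, shape in trace_shapes(128, 128):
    print(name, shape)
```

The flattened length is what feeds the first Dense layer, so the choice of input size and padding directly sets how many weights that 512-neuron layer carries.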
    • Figures: different plant types; resize, Gaussian blur and masking preprocessing; grayscale; Laplacian edge detection; classification report; confusion matrix (CM); sample