/sriskid.github.io

Primary Language: Jupyter Notebook

Welcome

Technical Skills: Python, SQL, R

Education

  • M.S. Computational Biology | Carnegie Mellon University (May 2023)
  • B.S. Biology | The University of Texas at Dallas (May 2020)

Projects

Tags: Data Science, Exploratory Data Analysis, Data Exploration, Feature Engineering, Preprocessing, Scikit-Learn

  • Created models to detect fraudulent credit card transactions based on 28 features as well as transaction amounts
  • Performed Exploratory Data Analysis (EDA) on the dataset to investigate the distribution of features by transaction type and feature correlations
  • Engineered features by scaling feature values while accounting for outliers
  • Trained Logistic Regression, Decision Tree, Random Forest, and XGBoost classifiers and compared their performance and speed (see the sketch after this list)
  • Investigated feature reduction using the top 5 Random Forest feature importances and PCA, and compared performance
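
A minimal sketch of the scaling and model-comparison step, assuming the usual credit card fraud layout (feature columns plus an Amount column and a 0/1 Class label); the file name, the RobustScaler choice, and the F1 metric are illustrative assumptions rather than the exact notebook code:

```python
import time
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Assumed layout: 28 feature columns, an Amount column, and a 0/1 Class label
df = pd.read_csv("creditcard.csv")
X, y = df.drop(columns=["Class"]), df["Class"]

# Scale the transaction amount with a scaler that is robust to outliers
X["Amount"] = RobustScaler().fit_transform(X[["Amount"]]).ravel()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
}

# Compare each classifier on training time and F1 (accuracy alone is misleading on imbalanced data)
for name, model in models.items():
    start = time.time()
    model.fit(X_train, y_train)
    elapsed = time.time() - start
    f1 = f1_score(y_test, model.predict(X_test))
    print(f"{name}: F1 = {f1:.3f}, training time = {elapsed:.1f}s")
```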

Tags: Data Science, Exploratory Data Analysis, Time Series Analysis, Correlation Analysis, Lag Analysis, Seasonal Analysis, Scikit-Learn

  • Performed time series analysis on stock price data to predict stock closing prices
  • Performed Lag Analysis through autocorrelation plots
  • Engineered features to include lag data up to five days (see the sketch after this list)
  • Trained Random Forest and XGBoost Models to predict on the test data
  • Extrapolated future stock closing prices
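
A rough sketch of the lag-feature step feeding a Random Forest, assuming a CSV with Date and Close columns; the file name and the time-ordered 80/20 split are assumptions:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Assumed: a CSV of daily prices with Date and Close columns
prices = pd.read_csv("stock_prices.csv", index_col="Date", parse_dates=True)

# Lag analysis: autocorrelation of the closing price
pd.plotting.autocorrelation_plot(prices["Close"])

# Engineer lag features for the previous five trading days
for lag in range(1, 6):
    prices[f"Close_lag_{lag}"] = prices["Close"].shift(lag)
prices = prices.dropna()

# Time-ordered split: train on the earlier 80%, evaluate on the most recent 20%
split = int(len(prices) * 0.8)
feature_cols = [f"Close_lag_{lag}" for lag in range(1, 6)]
X_train, y_train = prices[feature_cols][:split], prices["Close"][:split]
X_test, y_test = prices[feature_cols][split:], prices["Close"][split:]

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("Test R^2:", model.score(X_test, y_test))
```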

Tags: Deep Learning, Reinforcement Learning, PyTorch, Deep-Q Learning

  • Created a custom environment to simulate the blackjack card game, returning state and reward values
  • Created a deep learning model that would take the game state as input and output an action ("Hit", "Stand")
  • Experimented with the exploration vs. exploitation trade-off so the model could maximize rewards while still discovering new strategies through random actions (see the sketch after this list)
  • Plotted the results in real time to show the model's score over the course of thousands of games
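
A minimal sketch of epsilon-greedy action selection with a small PyTorch Q-network; the three-value state encoding and the network sizes are assumptions, not the project's exact setup:

```python
import random
import torch
import torch.nn as nn

# Small Q-network: game state in, one Q-value per action ("Hit", "Stand") out
class QNetwork(nn.Module):
    def __init__(self, state_dim=3, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)

def select_action(q_net, state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(2)          # random action to discover new strategies
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
        return int(q_values.argmax())       # greedy action that maximizes expected reward

q_net = QNetwork()
state = [15, 10, 0]   # e.g. (player sum, dealer upcard, usable ace) -- assumed encoding
epsilon = 0.9         # typically decayed toward a small value over training
print(select_action(q_net, state, epsilon))
```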

Tags: Deep Learning, Computer Vision, PyTorch, Multiclass Image Classification, Data Visualization, Transfer Learning

  • Utilized EfficientNetB7 to classify the severity of car damage (minor, moderate, severe)
  • Fine-tuned the pretrained model for this image classification task (see the sketch after this list)
  • Utilized Data Augmentation to increase the number of images and address overfitting
  • Achieved 0.75 validation accuracy
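
A hedged sketch of the transfer learning setup, assuming torchvision's EfficientNet-B7 weights; freezing the backbone and the particular augmentations shown are illustrative choices, not the project's exact configuration:

```python
import torch.nn as nn
from torchvision import models, transforms

NUM_CLASSES = 3  # minor, moderate, severe

# Data augmentation to expand the effective training set and reduce overfitting
train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# Pretrained EfficientNet-B7 backbone (torchvision naming assumed here)
model = models.efficientnet_b7(weights=models.EfficientNet_B7_Weights.IMAGENET1K_V1)

# Freeze the pretrained feature extractor so only the new head is trained at first
for param in model.features.parameters():
    param.requires_grad = False

# Swap the final classification layer for a 3-class severity head
in_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(in_features, NUM_CLASSES)
```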

Tags: Deep Learning, Computer Vision, Tensorflow, Multiclass Image Classification, OpenCV, Data Visualization, Transfer Learning

  • Utilized a pretrained VGG16 model and fine-tuned it to classify objects in drone images (buildings, forest, glacier, mountain, sea, street)
  • Utilized an Image Data Generator to conserve memory and perform data augmentation to prevent overfitting
  • Utilized Early Stopping and a Learning Rate Scheduler to prevent overfitting (see the sketch after this list)
  • Achieved 0.93 accuracy on validation dataset
  • Evaluated the model by plotting the loss, accuracy, and confusion matrix
  • Tested the model on sample images
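
A condensed sketch of the data generator, backbone, and callbacks, assuming a directory layout like data/train and data/val with one subfolder per class; the augmentation parameters and the classification head are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

NUM_CLASSES = 6  # buildings, forest, glacier, mountain, sea, street

# Stream images from disk in batches (conserving memory) and augment on the fly
train_gen = ImageDataGenerator(
    rescale=1.0 / 255, rotation_range=15, horizontal_flip=True, zoom_range=0.1
).flow_from_directory("data/train", target_size=(224, 224), class_mode="categorical")
val_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/val", target_size=(224, 224), class_mode="categorical"
)

# Pretrained VGG16 backbone with a new classification head
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
base.trainable = False
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Early stopping and learning-rate reduction to curb overfitting
callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=2),
]
model.fit(train_gen, validation_data=val_gen, epochs=30, callbacks=callbacks)
```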

Tags: Deep Learning, Computer Vision, Data Visualization, OpenCV, PyTorch, Multiclass Image Classification, Transfer Learning

  • Utilized a pretrained VGG16 model and fine-tuned it to classify types of car damage from images (crack, scratch, flat tire, dent, shattered glass, broken lamp); a training-loop sketch follows this list
  • Utilized Early Stopping and a Learning Rate Scheduler to prevent overfitting
  • Achieved 0.95 accuracy on validation dataset
  • Evaluated the model by plotting the loss, accuracy, and confusion matrix
  • Made predictions on sample images
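
A rough PyTorch training-loop sketch showing the learning-rate scheduler and a simple early-stopping check, assuming ImageFolder-style data/train and data/val directories; the hyperparameters are illustrative:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

NUM_CLASSES = 6  # crack, scratch, flat tire, dent, shattered glass, broken lamp

tfms = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_loader = DataLoader(datasets.ImageFolder("data/train", tfms), batch_size=32, shuffle=True)
val_loader = DataLoader(datasets.ImageFolder("data/val", tfms), batch_size=32)

# Pretrained VGG16 with the final classifier layer swapped for the six damage classes
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Lower the learning rate when the validation loss plateaus
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=2)

best_loss, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(50):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Validation pass
    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for images, labels in val_loader:
            val_loss += criterion(model(images), labels).item()
    val_loss /= len(val_loader)
    scheduler.step(val_loss)

    # Simple early stopping: quit after `patience` epochs without improvement
    if val_loss < best_loss:
        best_loss, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best_model.pt")
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```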

Tags: Deep Learning, Computer Vision, PyTorch, Binary Image Segmentation, Albumentations, Data Visualization

  • Created a model to segment skin cancers from images and create a mask to show the area of the cancer
  • Built a U-Net model from scratch that takes an RGB image as input and outputs a black-and-white mask (cancer region in white)
  • Utilized Data Augmentation on the images and corresponding masks using Albumentations to prevent overfitting (an example pipeline follows this list)
  • Utilized Early Stopping and a Learning Rate Scheduler to prevent overfitting
  • Achieved 0.96 pixel accuracy on the validation dataset
  • Evaluated the model by loss and pixel accuracy
  • Plotted predicted masks on the test images and compared to the ground truth
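
An example Albumentations pipeline applied jointly to an image and its mask; the file names and the particular transforms chosen here are assumptions:

```python
import albumentations as A
import cv2
from albumentations.pytorch import ToTensorV2

# Augmentations applied jointly to each RGB image and its binary mask
train_transform = A.Compose([
    A.Resize(256, 256),
    A.HorizontalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
    A.Normalize(),
    ToTensorV2(),
])

image = cv2.cvtColor(cv2.imread("lesion.jpg"), cv2.COLOR_BGR2RGB)
mask = cv2.imread("lesion_mask.png", cv2.IMREAD_GRAYSCALE)

# Passing image and mask together keeps the spatial transforms in sync
augmented = train_transform(image=image, mask=mask)
image_tensor, mask_tensor = augmented["image"], augmented["mask"]
```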

Tags: Deep Learning, Computer Vision, PyTorch, Multiclass Image Segmentation, Transfer Learning

  • Utilized a pretrained U-Net model and fine-tuned it to segment a road image into background, road signs, cars, markings, and road surface
  • Created a dictionary to map RGB pixel values to class indices for training and to convert each pixel's class back to RGB values for plotting (see the sketch after this list)
  • Utilized DiceLoss for the loss function, a Learning Rate Scheduler to get better results, and Early Stopping to prevent overfitting
  • Evaluated the model on loss and Intersection over Union (IOU)
  • Achieved 0.97 IOU on the validation dataset
  • Plotted predictions on the sample images with the color mapping and compared results to ground truth
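
A small sketch of the color mapping in both directions; the specific RGB values assigned to each class here are assumptions, not the dataset's actual palette:

```python
import numpy as np

# Assumed color coding -- the actual dataset palette may differ
COLOR_TO_CLASS = {
    (0, 0, 0): 0,        # background
    (255, 0, 0): 1,      # road signs
    (0, 0, 255): 2,      # cars
    (255, 255, 0): 3,    # markings
    (128, 128, 128): 4,  # road surface
}
CLASS_TO_COLOR = {v: k for k, v in COLOR_TO_CLASS.items()}

def rgb_mask_to_classes(mask_rgb):
    """Convert an (H, W, 3) RGB mask into an (H, W) array of class indices for training."""
    class_mask = np.zeros(mask_rgb.shape[:2], dtype=np.int64)
    for color, cls in COLOR_TO_CLASS.items():
        class_mask[np.all(mask_rgb == color, axis=-1)] = cls
    return class_mask

def classes_to_rgb(class_mask):
    """Convert an (H, W) class-index mask back to RGB for plotting predictions."""
    rgb = np.zeros((*class_mask.shape, 3), dtype=np.uint8)
    for cls, color in CLASS_TO_COLOR.items():
        rgb[class_mask == cls] = color
    return rgb
```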

Tags: Machine Learning, Deep Learning, Natural Language Processing, Scikit-Learn, Tensorflow

  • Created three models to classify descriptions into four labels (Household, Books, Electronics, and Clothing & Accessories)
  • Vectorized the descriptions using a Bag-of-Words approach
  • Utilized two classical machine learning models (Naive Bayes and SVM) to classify the descriptions and compared their speed and accuracy (see the sketch after this list)
  • For a deep learning approach, the TensorFlow tokenizer was used along with padding to process the text, followed by a custom model to output the label
  • The final results showed that the Naive Bayes model took the least time with 0.94 validation accuracy, while the deep learning model took the most time with 0.98 validation accuracy
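
A minimal scikit-learn sketch of the Bag-of-Words pipeline for the two classical models; the CSV name and column layout are assumptions:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Assumed layout: one column of labels (Household, Books, Electronics, Clothing & Accessories)
# and one column of product descriptions
df = pd.read_csv("ecommerce.csv", names=["label", "description"]).dropna()
X_train, X_val, y_train, y_val = train_test_split(
    df["description"], df["label"], test_size=0.2, random_state=42
)

# Bag-of-Words vectorization feeding each classical classifier
for name, clf in [("Naive Bayes", MultinomialNB()), ("SVM", LinearSVC())]:
    pipeline = make_pipeline(CountVectorizer(stop_words="english"), clf)
    pipeline.fit(X_train, y_train)
    print(name, "validation accuracy:", pipeline.score(X_val, y_val))
```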

Tags: Machine Learning, Deep Learning, Natural Language Processing, Scikit-Learn, Tensorflow

  • Created six models to classify tweets by sentiment (Irrelevant, Negative, Neutral, Positive)
  • Vectorized the tweets using a Bag-of-Words approach for the classical ML models
  • Utilized five classical ML models (Naive Bayes, SVM, Decision Tree, Random Forest, and XGBoost) and a Deep Learning model to compare speed and accuracy
  • For the deep learning approach, the TensorFlow tokenizer was used along with padding to process the text, followed by a custom model to output the label, which was evaluated by loss and accuracy (a sketch follows this list)
  • A final comparison showed that the Naive Bayes model was the fastest with 0.83 validation accuracy, the SVM model took the longest with 0.94 validation accuracy, the XGBoost model performed the worst with 0.76 validation accuracy, and the deep learning model performed the best with 0.99 validation accuracy while taking the third least amount of time
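
A rough sketch of the tokenize-pad-classify path using the classic Keras Tokenizer and pad_sequences utilities; the CSV name, column layout, vocabulary size, and network shape are all assumptions:

```python
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

VOCAB_SIZE, MAX_LEN, NUM_CLASSES = 10000, 50, 4  # Irrelevant, Negative, Neutral, Positive

# Assumed column layout for the tweet sentiment CSV
df = pd.read_csv("tweets.csv", names=["id", "topic", "sentiment", "text"]).dropna()
labels = df["sentiment"].astype("category").cat.codes.values
train_texts, val_texts, y_train, y_val = train_test_split(
    df["text"].astype(str), labels, test_size=0.2, random_state=42
)

# Tokenize the tweets and pad every sequence to a fixed length
tokenizer = Tokenizer(num_words=VOCAB_SIZE, oov_token="<OOV>")
tokenizer.fit_on_texts(train_texts)
X_train = pad_sequences(tokenizer.texts_to_sequences(train_texts), maxlen=MAX_LEN)
X_val = pad_sequences(tokenizer.texts_to_sequences(val_texts), maxlen=MAX_LEN)

# Small custom network: embedding -> pooling -> dense softmax over the four labels
model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 64),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10)
```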

Tags: Data Science, Recommendation System, Natural Language Processing, Scikit-Learn, Exploratory Data Analysis

  • Created a simple content-based recommendation system to recommend games from the Steam game library
  • Performed Exploratory Data Analysis to view trends in prices, reviews, operating systems, and ratings
  • Vectorized the descriptions using CountVectorizer and calculated cosine similarities among the vectorized descriptions (see the sketch after this list)
  • The output returns the five games whose descriptions have the highest cosine similarity to the description of one's favorite game
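
A compact sketch of the content-based recommender; the CSV name, column names, and example game title are assumptions:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Assumed layout: a Steam catalog CSV with game names and text descriptions
games = pd.read_csv("steam_games.csv")[["name", "description"]].dropna().reset_index(drop=True)

# Vectorize the descriptions and compute pairwise cosine similarities
vectors = CountVectorizer(stop_words="english").fit_transform(games["description"])
similarity = cosine_similarity(vectors)

def recommend(favorite, n=5):
    """Return the n games whose descriptions are most similar to the favorite's."""
    idx = games.index[games["name"] == favorite][0]
    top = similarity[idx].argsort()[::-1][1 : n + 1]  # skip the game itself at position 0
    return games["name"].iloc[top].tolist()

print(recommend("Portal 2"))  # hypothetical favorite game
```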

Tags: Deep Learning, Generative AI, GANs, PyTorch

  • Created a GAN to generate images of minifigurines
  • Utilized a generator to produce 256 x 256 x 3 images from a 128-dimensional latent vector and a discriminator to determine whether an image is real or generated (a sketch of both networks follows this list)
  • Trained for 3000 epochs and balanced the loss values of both adversarial models
  • Plotted generated images from random latent vectors
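
A structural sketch of the two adversarial networks at those dimensions; only the latent size and output resolution come from the project description, while the layer counts and channel widths are assumptions:

```python
import torch
import torch.nn as nn

LATENT_DIM = 128

class Generator(nn.Module):
    """Maps a 128-dim latent vector to a 3 x 256 x 256 image."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(LATENT_DIM, 512 * 4 * 4)
        def up(c_in, c_out):
            # Each block doubles the spatial resolution
            return nn.Sequential(
                nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            )
        self.net = nn.Sequential(
            up(512, 256),   # 4 -> 8
            up(256, 128),   # 8 -> 16
            up(128, 64),    # 16 -> 32
            up(64, 32),     # 32 -> 64
            up(32, 16),     # 64 -> 128
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),  # 128 -> 256
            nn.Tanh(),      # pixel values in [-1, 1]
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 512, 4, 4)
        return self.net(x)

class Discriminator(nn.Module):
    """Classifies a 3 x 256 x 256 image as real or generated."""
    def __init__(self):
        super().__init__()
        def down(c_in, c_out):
            # Each block halves the spatial resolution
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
            )
        self.net = nn.Sequential(
            down(3, 16), down(16, 32), down(32, 64),
            down(64, 128), down(128, 256), down(256, 512),  # 256 -> 4
            nn.Flatten(),
            nn.Linear(512 * 4 * 4, 1),   # raw logit: real vs. fake
        )

    def forward(self, x):
        return self.net(x)

# Quick shape check with a batch of random latent vectors
z = torch.randn(8, LATENT_DIM)
fake = Generator()(z)
print(fake.shape, Discriminator()(fake).shape)   # (8, 3, 256, 256), (8, 1)
```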