
A collection of various ML and analytics mini projects.


Shumayl Asmawi - ML/Analytics Portfolio

To view this as a webpage, click here.

About me

My name is Shumayl Asmawi, and I am a Systems Engineer at HP Inc, where I develop data-driven software systems in the manufacturing sector.

This mini portfolio was created in 2021, and may not accurately reflect my current skills. You can visit my GitHub profile and my blog to view my latest works.

About the portfolio

The depth and scope of each project vary; they include (but are not limited to):

  • Data cleaning
  • Data preprocessing
  • Exploratory data analysis
  • Feature engineering
  • Data visualization
  • ML/artificial neural network model training
  • Hyperparameter tuning
  • Predictive analytics
  • Model evaluation

Mini projects

Click any project title to view its code on GitHub. If GitHub fails to load an .ipynb file, you can view it with nbviewer by clicking here.

  • Implemented an ensemble of gradient boosting, random forests, and linear regression to predict traffic congestion (a rough sketch of the ensemble idea follows this list).
  • Implemented an XGBoost model to predict the pass/fail yield for in-house line testing of over 1,500 production entities using signals from over 500 sensors and process measurement points.
  • Achieved a ROC-AUC of 0.876 (a minimal sketch of this classifier appears after the ROC curve figure below).
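The traffic-congestion ensemble can be expressed as an averaging ensemble of the three listed regressors; scikit-learn's VotingRegressor is one way to write it down. This is only a sketch under assumptions: the data file, column names, and the exact way the models were combined are not taken from the original project.

```python
# Sketch of an averaging ensemble of gradient boosting, random forest, and
# linear regression for a congestion target. Data loading is a placeholder.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("traffic_data.csv")              # hypothetical file and columns
X = df.drop(columns=["congestion"])
y = df["congestion"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Averages the predictions of the three base regressors.
ensemble = VotingRegressor([
    ("gbr", GradientBoostingRegressor(random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("lr", LinearRegression()),
])
ensemble.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, ensemble.predict(X_test)))
```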

Receiver Operating Characteristic (ROC) Curve
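The pass/fail yield model reduces to a binary classifier scored with ROC-AUC. A minimal sketch of that setup, assuming a hypothetical sensor_readings.csv with one column per sensor or measurement point and a pass_fail label (names not from the original project):

```python
# Minimal sketch of a binary pass/fail classifier scored with ROC-AUC.
# The CSV path and column names are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

df = pd.read_csv("sensor_readings.csv")          # ~500 sensor/measurement columns
X = df.drop(columns=["pass_fail"])
y = df["pass_fail"]                              # 1 = pass, 0 = fail

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = XGBClassifier(
    n_estimators=500, learning_rate=0.05, max_depth=6,
    subsample=0.8, colsample_bytree=0.8
)
model.fit(X_train, y_train)

# ROC-AUC is computed from the predicted probability of the positive class.
probs = model.predict_proba(X_test)[:, 1]
print("ROC-AUC:", roc_auc_score(y_test, probs))
```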

Vaccination Dashboard

  • Conducted an exploratory data analysis (EDA) to surface key analytics and diagnostics for a dataset of two solar power plants.
  • Constructed a simple linear regression model to predict the power output of the solar plants, with a root mean squared error of around 700 kW (a minimal sketch of this model appears after the figures below).

Variance Between Actual Values and Predicted Values

Feature Correlation Matrix

AC Power Output Throughout the Day
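The solar output model is a plain linear regression judged by RMSE. A minimal sketch under assumptions: the generation_data.csv file and its irradiation/temperature/ac_power columns are hypothetical stand-ins for the plant data.

```python
# Sketch of a simple linear regression for plant power output, scored by RMSE.
# "generation_data.csv" and its columns are hypothetical placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

df = pd.read_csv("generation_data.csv")
X = df[["irradiation", "ambient_temp", "module_temp"]]   # assumed predictors
y = df["ac_power"]                                        # target output in kW

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

reg = LinearRegression().fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, reg.predict(X_test)))
print(f"RMSE: {rmse:.1f} kW")
```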

  • Designed a multilayer perceptron (MLP) neural network model and an XGBoost model to predict the category of 100,000 different products, given over 70 mystery features of over 200,000 existing products, for Kaggle's June Tabular Playground competition.
  • Scored a multiclass log loss of 1.77852 with the XGBoost model's predictions (the competition winner scored 1.74370; lower is better). A sketch of the XGBoost side appears after the architecture figure below.

MLP Model Architecture
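At its core, the Kaggle submission is a multiclass model evaluated with log loss on predicted probabilities. A hedged sketch of the XGBoost side, assuming the competition's usual train.csv layout with an id column and a string target column:

```python
# Sketch of a multiclass XGBoost model evaluated with log loss.
# The file layout and column names are assumptions, not the original notebook.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import log_loss
from xgboost import XGBClassifier

train = pd.read_csv("train.csv")
X = train.drop(columns=["id", "target"])
y = LabelEncoder().fit_transform(train["target"])   # class labels -> 0..n_classes-1

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)

model = XGBClassifier(
    objective="multi:softprob", n_estimators=400,
    learning_rate=0.05, max_depth=7
)
model.fit(X_tr, y_tr)

# Log loss is computed on the full probability matrix, one column per class.
val_probs = model.predict_proba(X_val)
print("Multiclass log loss:", log_loss(y_val, val_probs))
```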

  • Designed a multilayer perceptron (MLP) neural network model with tf.keras to predict house prices from location and various other attributes of over 20,000 houses, achieving an explained variance score of 0.767 (a minimal sketch of this regressor appears after the figures below).

MLP Model Architecture

Coordinate Plot of House Prices

Variance Between Actual Values and Predicted Values

Correlation Between Living Space and House Price
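The house-price model is an MLP regressor built with tf.keras and judged by an explained variance score. A minimal sketch under assumptions: the house_data.csv file, the layer sizes, and the training settings are illustrative, not the original architecture.

```python
# Sketch of a tf.keras MLP regressor scored with explained variance.
# Layer sizes, epochs, and the input file are illustrative assumptions.
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import explained_variance_score

df = pd.read_csv("house_data.csv")                 # hypothetical file
X = df.drop(columns=["price"]).values
y = df["price"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),                      # single linear output for price
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)

preds = model.predict(X_test).ravel()
print("Explained variance:", explained_variance_score(y_test, preds))
```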

  • Conducted a comprehensive exploratory data analysis, guided by prompts from a lecture by Kevin Markham, on a dataset of over 2,000 recorded TED Talks to visualize underlying trends in the popularity, sentiment, and ratings of TED events (a few representative EDA steps are sketched after the figures below).

Views Analytics

Engagement Analytics

Sentiment Analytics

Number of TED Talks per Year
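An EDA of this kind is mostly pandas group-bys and plots. A small sketch of two representative steps, assuming columns such as views, comments, and a Unix-timestamp film_date similar to the public TED dataset (the project's actual columns may differ):

```python
# Sketch of a few EDA steps on a TED Talks dataset.
# Column names mirror the public Kaggle TED dataset but are assumptions here.
import pandas as pd
import matplotlib.pyplot as plt

ted = pd.read_csv("ted_main.csv")

# Engagement: comments normalized by views, to rank talks by discussion per view.
ted["comments_per_view"] = ted["comments"] / ted["views"]
print(ted.sort_values("comments_per_view", ascending=False)
         [["title", "comments_per_view"]].head())

# Talks per year, derived from the Unix timestamp of the filming date.
ted["year"] = pd.to_datetime(ted["film_date"], unit="s").dt.year
ted["year"].value_counts().sort_index().plot(kind="bar")
plt.title("Number of TED Talks per Year")
plt.show()
```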

  • Implemented the K-Nearest Neighbors algorithm to predict an object's category from 10 mystery features across 1,000 different objects, with a precision of 83% (a short sketch appears after the figure below).

KNN Decision Boundary
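The KNN project comes down to fitting KNeighborsClassifier on scaled features and reporting precision. A rough sketch, with the classified_data.csv file, its target column, and a binary label all assumed:

```python
# Sketch of a K-Nearest Neighbors classifier evaluated with precision.
# The CSV and its "target" column are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import precision_score

df = pd.read_csv("classified_data.csv")
X = StandardScaler().fit_transform(df.drop(columns=["target"]))  # KNN needs scaled features
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
# Assumes a binary target; pass average="weighted" for a multiclass target.
print("Precision:", precision_score(y_test, knn.predict(X_test)))
```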

  • Trained a decision tree model and a random forest model to predict loan repayment from an applicant's past information; they achieved 75% and 78% accuracy respectively (a sketch of the comparison appears after the figure below).

Decision Tree Model
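The loan-repayment comparison trains a single decision tree and a random forest on the same split and compares their accuracies. A hedged sketch, with loan_data.csv and its not_fully_paid label assumed:

```python
# Sketch comparing a decision tree and a random forest on loan-repayment data.
# "loan_data.csv" and the "not_fully_paid" label are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.get_dummies(pd.read_csv("loan_data.csv"), drop_first=True)  # one-hot categoricals
X = df.drop(columns=["not_fully_paid"])
y = df["not_fully_paid"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)

print("Decision tree accuracy:", accuracy_score(y_test, tree.predict(X_test)))
print("Random forest accuracy:", accuracy_score(y_test, forest.predict(X_test)))
```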

  • Implemented the Support Vector Machine (SVM) algorithm to classify flowers among three species from sepal and petal dimensions, with an average accuracy of 95% (a minimal sketch appears after the figures below).

SVM Model Performance

Multivariate Plot

Features Correlation
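The flower-species project matches the classic iris setup: an SVM on sepal and petal measurements, with accuracy averaged over folds. A minimal sketch using scikit-learn's bundled iris data; the original project's data source and model settings may differ.

```python
# Sketch of an SVM classifier on sepal/petal measurements, with mean CV accuracy.
# Uses scikit-learn's bundled iris data; the original data source is assumed.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)          # 3 species, 4 sepal/petal features

# Scale features before the SVM; RBF kernels are sensitive to feature ranges.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("Mean accuracy: {:.2f}".format(scores.mean()))
```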

  • Created a logistic regression model that processed 1,000 rows of user data to predict whether a person will click an advertisement, achieving 93% accuracy (a short sketch appears after the figures below).

Model Performance

Multivariate Plot

Features Correlation
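The ad-click model is a straightforward logistic regression on user features. A short sketch, with advertising.csv and its column names treated as hypothetical placeholders:

```python
# Sketch of a logistic regression for ad-click prediction, scored by accuracy.
# "advertising.csv" and its column names are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

df = pd.read_csv("advertising.csv")
features = ["daily_time_spent_on_site", "age", "area_income", "daily_internet_usage"]
X = df[features]
y = df["clicked_on_ad"]                     # 1 = clicked, 0 = did not click

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, logreg.predict(X_test)))
```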