
Collections of data science, machine learning, and data visualization projects using pandas, scikit-learn, Matplotlib, TensorFlow 2, Keras, and various ML algorithms such as random forest classifiers, boosting, etc.

Data Science, Machine Learning & Visualization Dojo

Collections of Data Science & ML projects and dojo where I practice Data Science, Machine Learning, Deep Learning and Data Visualization related skills, theories, probability, statistics, etc.

Built with

Machine Learning, Deep Learning, Data Science libraries

  • NumPy - package for scientific computing with Python
  • Pandas - fast, powerful, flexible and easy to use open source data analysis and manipulation tool
  • Pandas Profiling - generate reports from dataframe
  • GeoPandas - support for geographic data in pandas objects
  • Scikit-learn - Simple and efficient tools for predictive data analysis
  • TensorFlow - An end-to-end open source machine learning platform
  • Keras - Deep Learning framework
  • NLTK - Natural Language Toolkit

Data Visualization libraries

  • Matplotlib - a comprehensive library for creating static, animated, and interactive visualizations in Python
  • Seaborn - statistical data visualization
  • Bokeh - interactive visualization library for modern web browsers
  • Plotly - The front-end for ML and data science models
  • Cufflinks - Productivity Tools for Plotly + Pandas

Turning into Web applications

  • Streamlit - The fastest way to build and share data apps
  • Flask - a micro web framework written in Python


  • Apache Spark - a unified analytics engine for large-scale data processing.
  • PySpark - the Python API for Apache Spark
  • Databricks - Unified Data Analytics Platform - One cloud platform for massive scale data engineering and collaborative data science.

Tools and Datasources


Data Analysis and Visualization Capstone project from the Machine Learning and Data Science Masterclass course.

  • This is the data behind the story Be Suspicious Of Online Movie Ratings, Especially Fandango’s
  • using data from 538
  • If you are planning on going out to see a movie, how well can you trust online reviews and ratings? Especially if the same company showing the rating also makes money by selling movie tickets.
  • Do they have a bias towards rating movies higher than they should be rated?
  • etc..
  • This project builds a machine learning model to predict whether a customer will churn.
  • Includes cohort analysis based on Telco subscribers' contract type, etc.
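The cohort analysis mentioned above can be illustrated with a minimal pandas sketch. The column names `Contract` and `Churn`, the toy rows, and the helper `churn_rate_by` are assumptions made for illustration and are not taken from the project's notebook.

```python
import pandas as pd

# Hypothetical toy data: column names and values are assumptions,
# not the project's actual Telco dataset.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "One year"],
    "Churn": ["Yes", "No", "No", "No", "Yes", "Yes"],
})


def churn_rate_by(frame, column):
    """Return the share of churned customers per value of `column`."""
    churned = frame["Churn"].eq("Yes")  # boolean Series: True if churned
    # The mean of a boolean Series is the fraction of True values per group.
    return churned.groupby(frame[column]).mean().sort_values(ascending=False)


rates = churn_rate_by(df, "Contract")
print(rates)
```

Grouping a boolean flag and taking the mean gives the churn rate per cohort directly, which is often a useful first look before training a classifier.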

Machine Learning & Data Science Masterclass Projects

Deep Learning Projects

Project from Complete Machine Learning and Data Science - Zero to Mastery course.

Data Analysis and Visualization Capstone project from Data Science and Machine Learning Bootcamp Course.

  • analyzing 911 calls data from Kaggle
  • top 5 zip codes for 911 calls
  • top 5 townships for 911 calls
  • most common reason for a 911 call
  • different types of visualizations based on the findings
  • etc.
  • Machine learning app using streamlit, for building a regression model using the Random Forest algorithm.
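The "top 5" questions in the 911 calls analysis above can be sketched with `value_counts`. The column names (`zip`, `twp`, `title`) and the "Reason: description" layout of `title` are assumptions about the dataset, and the rows below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows: column names and values are assumptions.
calls = pd.DataFrame({
    "zip": [19401, 19401, 19446, 19464, 19401, 19446],
    "twp": ["NORRISTOWN", "NORRISTOWN", "LANSDALE",
            "POTTSTOWN", "NORRISTOWN", "LANSDALE"],
    "title": ["EMS: BACK PAINS", "Fire: GAS-ODOR", "EMS: CARDIAC",
              "Traffic: VEHICLE ACCIDENT", "EMS: FALL", "EMS: HEAD INJURY"],
})

# Derive the call reason from the text before the colon in the title.
calls["reason"] = calls["title"].str.split(":").str[0]

# value_counts sorts by frequency, so head(5) yields the top 5 entries.
top_zips = calls["zip"].value_counts().head(5)
top_townships = calls["twp"].value_counts().head(5)
top_reason = calls["reason"].value_counts().idxmax()

print(top_zips)
print(top_townships)
print("Most common reason:", top_reason)
```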

Data Analysis and Visualization

  • Data Visualization with Python - Project: Data analysis and data visualization using Pandas and Matplotlib for countries' GDP, life expectancy comparison across continents, GDP per capita relative growth, population relative growth comparison, etc.
  • Fuel Economy Case Study - Project: Analyzing fuel economy data provided by the EPA for distributions of greenhouse gas score and combined mpg in 2008 and 2018, and for the correlations between displacement and combined mpg and between greenhouse gas score and combined mpg. Are more unique models using alternative fuels in 2018 compared to 2008? By how much? How much have vehicle classes improved in fuel economy (increased in mpg)? What are the characteristics of SmartWay vehicles? Have they changed over time (mpg, greenhouse gas)? What features are associated with better fuel economy (mpg)? Which vehicle improved the most in terms of combined mpg from 2008 to 2018?
  • Wine Quality Case Study - Project: Analyzing wine data for the following points for wine businesses to model better wine. Is a certain type of wine (red or white) associated with higher quality? What level of acidity (pH value) receives the highest average rating? Do wines with higher alcoholic content receive better ratings? Do sweeter wines (more residual sugar) receive better ratings? White Vs Red Wine Proportions by Color & Quality
  • TV, Halftime Shows, and the Big Game - Project: Analyzing Super Bowl data and answering questions like: What are the most extreme game outcomes? How does the game affect television viewership? How have viewership, TV ratings, and ad cost evolved over time? Who are the most prolific musicians in terms of halftime show performances?
  • Weather Trend - Project: Analyzing global weather trends and Singapore weather trends, and comparing global vs Singapore 10-year moving average trends
  • Real-time Insights from Social Media Data - Project: Analyzing Twitter data and answering questions like: What are the global and local trends? Which trends are common to both?
  • frequency analysis on tweets and hashtags, etc.
  • Statistics From Stock Data - Project: Analyzing Google, Apple, and Amazon stock prices and checking the rolling mean.
  • Android Play Store App Data Analysis - Project: Analyzing Android Play Store data and answering questions like: How many apps are paid? How much money are they making? When were these apps released?
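The rolling mean and moving average mentioned in the stock and weather projects above can be sketched as follows. The prices, the dates, and the 3-day window are made-up values for illustration; the projects themselves may use different windows and data.

```python
import pandas as pd

# Hypothetical daily closing prices, not real stock data.
prices = pd.Series(
    [10.0, 12.0, 11.0, 13.0, 15.0],
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
    name="close",
)

# Each value is the mean of the current and the two preceding observations.
# The first two entries are NaN because a full 3-day window is not yet available.
rolling = prices.rolling(window=3).mean()

print(pd.DataFrame({"close": prices, "rolling_mean_3d": rolling}))
```

A longer window smooths the series more but reacts more slowly to changes, which is why a 10-year window suits long-term weather trends while short windows suit daily prices.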

Mini ML Projects


  • Week 01 - Exploring a Larger Dataset
  • Week 02 - Augmentation: A technique to avoid overfitting
  • Week 03 - Transfer Learning
  • Week 04 - Multiclass Classification
  • Week 01 - A New Programming Paradigm
  • Week 02 - Introduction to Computer Vision
  • Week 03 - Enhancing Vision with CNN
  • Week 04 - Using Real-world images
  • The Fundamentals of Machine Learning
  • The Machine Learning Landscape
  • End-to-End Machine Learning Project

Advancing Machine Learning & Data Science Journey - (In Progress)

To build up my ML and DS skills in specific areas and topics:

  • 01.ML Basic
  • 02.Intro to Feature Engineering
  • 03.Explore Data
  • 04.Create and Clean Features
  • 05.Prepare Features for Modelling
  • 06.Compare and Evaluate Models
  • 01.Review of Foundation
  • 02.Logistic Regression
  • 03.Support Vector Machine
  • 04.Multi-layer Perceptron
  • 05.Random Forest
  • 06.Boosting
  • 07.Final Model Selection and Evaluation
  • 01.ML Basic
  • 02.Exploratory Data Analysis and Data Cleaning
  • 03.Evaluation - Measuring Success
  • 04.Optimizing a Model
  • 05.End to End Pipeline
  • Assuming Data is good to go
  • Neglecting to consult subject matter experts
  • Overfitting your models
  • Not standardizing your data
  • Focusing on Wrong Factors
  • Data Leakage
  • Forgetting traditional statistics tools
  • Assuming Deployment is a breeze
  • Assuming Machine Learning is the answer
  • Developing in a silo
  • Not treating for imbalanced sampling
  • Interpreting your coefficients without properly treating for multicollinearity
  • Evaluating by accuracy alone
  • Giving overly technical presentations
  • Python
  • Pandas
  • Data Cleaning
  • Introduction to Machine Learning
  • Machine Learning Intermediate
  • Feature Engineering
  • Machine Learning Explainability
  • Data Visualization
  • Intro to Deep Learning
  • Intro to Game AI and Reinforcement Learning
  • Natural Language Processing
  • Micro-challenges
  • Computer Vision
  • Intro to SQL
  • Advanced SQL
  • ML Crash Course
  • Problem Framing
  • Data Prep
  • Clustering
  • Recommendation
  • Testing and Debugging
  • GANs

Deep Learning, Machine Learning, AI & Data Science

Apache Spark & PySpark

Data Analysis, Manipulation & Data Visualization

  • 01.Plotly Foundation
  • 02.Using Scenarios with Plotly
  • 03.Creating Visualization with Plotly
    • Basic Charts (Line Graph, Bar Graph, Scatter Plot, Bubble Chart)
    • Statistical Charts (Histogram, Distribution, Scatter Matrix, Correlation Matrix with Imshow, Color Scales and Sequences)
    • 3D Plotting (3D Scatter Plot, 3D Bubble Chart, 3D Line Chart, 3D Surface Plot)
    • Mapping (Scatter Plot on Map, Choropleth, Mapbox and Geopandas)
    • Sunburst Chart
    • Sankey Chart
  • 04.Adding Interactivity with Plotly
  • Linear Regression Analysis
  • Multi Regression Analysis
  • Practical Statistics
  • Excel Data Manipulation, Analysis and Visualization

Topics include:

  • Set theory, including Venn diagrams
  • Properties of the real number line
  • etc


This project is licensed under the MIT License - see the LICENSE.md file for details