Primary language: Jupyter Notebook. License: MIT.

100-Days-of-ML-Pt2

Daily log to track my progress on the 100 days of ML code challenge.

Description

A 100-day ML challenge to learn and develop machine learning products. Since this is my second time taking on this challenge, I will focus more on the production environment than on the concepts and theory behind ML/DL models. I will place heavy emphasis on the ML pipeline and the process of taking an ML model and applying it to a real-world application.

Challenge Goals

  • Learn how to deploy ML models into production
  • Learn about scripting for MLOps
  • Learn about the ML Pipeline
    • TFX Components
    • Orchestration tools
  • Learn more about privacy & protection for ML applications
  • Learn CI/CD for ML
  • Learn about CUDA and get hands-on experience
  • Learn more about deep reinforcement learning
    • Markov Chains
    • Stationary distribution probabilities
    • Markov Decision Processes
  • Learn more about generative learning
  • Learn about AWS microservices
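
For the Markov chain goals above, a stationary distribution is a probability vector pi that satisfies pi = pi P for a transition matrix P. The following is a minimal sketch in plain Python, assuming a small, hypothetical two-state chain; the matrix values and the function name `stationary_distribution` are illustrative and not taken from the challenge material. Repeated multiplication like this is only expected to converge for chains that are irreducible and aperiodic.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate pi with pi = pi P by repeated multiplication (power iteration)."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the chain: nxt[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical two-state chain: rows are current states, columns are next states.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
print(pi)
```

Solving pi = pi P by hand for this matrix gives pi = (5/6, 1/6), which is a useful check on the iteration.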

[Additional]

  • Multi-modal Systems
  • Apache Flume
  • Apache Beam
  • Apache Airflow
  • Using Amdahl's Law
  • MySQL/Apache Spark
  • PostgreSQL
  • Optimizing/architecting software/hardware solutions for ML

Resources

Books

Deploy Machine Learning Models to Production With Flask, Streamlit, Docker, and Kubernetes on Google Cloud Platform

Building Machine Learning Powered Applications Going from Idea to Products

Building Machine Learning Pipelines Automating Model Life Cycles with TensorFlow

Hands-On GPU Programming with Python and CUDA: Explore High-performance Parallel Computing with CUDA

Web Resources

@article{madewithml,
    title  = "MLOps - Made With ML",
    author = "Goku Mohandas",
    url    = "https://madewithml.com/courses/mlops/organization/",
    year   = "2021",
}

YouTube Log

This whole challenge will be documented on YouTube during live streams. The link to the playlist: 100 Days of ML

Daily Goals

Day 1: Deploy a Linear Regression model using Flask

  • Develop a web application using Flask
  • Code a linear regression model
  • Deploy the trained model as a REST service
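
The model-coding step of Day 1 can be sketched without any framework. This is a minimal example of simple (one-feature) linear regression fitted by ordinary least squares; the function names `fit_line` and `predict` and the toy data are hypothetical, and the Flask serving layer is deliberately left out.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical toy data lying exactly on y = 2x + 1.
model = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(model, predict(model, 10.0))
```

A REST endpoint would typically wrap `predict` and return its result as JSON.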

Day 2: Deploy a Linear Regression model using Streamlit

  • Read the section about Streamlit
  • Create UI using Streamlit
  • Deploy the trained model as a REST service
  • Code an LSTM model

Day 3: Deploy a Deep Learning model using Streamlit

  • Train the LSTM model
  • Create UI using Streamlit

Day 4: Deploy a Deep Learning model using Streamlit

  • Deploy the trained model as a REST service
  • Read Chapter 4: ML Deployment using Docker

Day 5: Deploy ML model using Docker

  • Create a Dockerfile for Flask App
  • Create Docker image
  • Push our Docker image to DockerHub

Day 6: ML Deployment using Kubernetes

  • Read chapter 5: ML Deployment using Kubernetes
  • Create GCP Project
  • Enable and utilize the Kubernetes Engine API on GCP

Day 7: Scripting

  • Read topics under scripting from MadeWithML
  • Apply learning by adding it to current projects

Day 8: Reading Building ML Pipelines

  • Read Ch1: Introduction
  • Read Ch2: Introduction to TensorFlow Extended

Day 9: Reading Building ML Pipelines

  • Read Ch2: Introduction to TensorFlow Extended
  • Read Ch3: Data Ingestion

Day 10: TFX Setup

  • Download and set up TFX
  • Execute TFX Data Ingestion examples

Day 11: TFX Data Ingestion

Day 12: TFX Data Ingestion

Day 13: Reading Building ML Pipelines

  • Read Ch4: Data Validation
  • Read online resources on TFX Data Validation

Day 14: Data Validation

  • Read online resources on TFX Data Validation
  • Execute example code for TFDV

Day 15: Reading Building ML Pipelines

  • Read Ch5: Data Preprocessing
  • Read online resources on TFX Data Preprocessing

Day 16: Data Preprocessing

  • Read more about feature engineering
  • Feature engineering vs ML engineering
  • Execute example code for TF Transform

Day 17: MLOps Production

Day 18: Reading Building ML Pipelines

  • Read Ch 6: Model Training
  • Read online resources on TFX Trainer Component

Day 19: Reading Building ML Pipelines

  • Read Ch 7: Model Analysis and Validation
  • Read online resources on TF Model Analysis

Day 20: Reading Building ML Pipelines

  • Read Ch 8: Model Deployment with TensorFlow Serving

Day 21: Reading Building ML Pipelines

  • Continue reading Ch8: Model Deployment with TensorFlow Serving
  • Read online resources on TF Serving

Day 22: Plan out an ML Product to Build

  • Look into simple ML models to deploy
  • Create the architecture for the pipeline
  • Set up/decide on GitHub project organization
  • Choose how to build front-end
  • What orchestration tool to use?

Day 23: Plan out an ML Product to Build

  • How to integrate CI/CD into project
  • Flask web deployment vs Model Server
  • Create order of events
  • Look into setting up GitHub Project

Day 24: Build Dockerfile for ML Project

  • List out dependencies
  • Create Dockerfile
  • Build Docker Image
  • Run Docker container
  • Check if GPU is being used

Day 25: Build Dockerfile for ML Project

  • Troubleshoot Dockerfile
  • Build Docker Image
  • Update README.md

Day 26: Set up TFX Pipeline Architecture

  • Set up TFX pipeline
  • Design TFX architecture

Day 27: TFX Pipeline

  • Understand the template code
  • Clear out the template files and rewrite the pipeline

Day 28: Import Data

  • Change permissions for files in /ml
  • Add the IMDB dataset into /data
  • Convert dataset to desired format

Day 29: Add Formatting/Styling

  • Set up formatting/styling
  • Black & flake8
  • Set up GitHub Actions

Day 30: Data Validation

  • Fix features.py
  • Generate statistics
  • Visualize statistics
  • Infer schema

Day 31: Troubleshooting Data Ingestion/Validation

  • Build updated Docker image
  • Figure out why ExampleGen is not loading

Day 32: Data Preprocessing

  • Change preprocessing.py
  • Test Preprocessing

Day 33: Continue Data Preprocessing

  • Change preprocessing.py
  • Test Preprocessing

Day 34: Continue Data Preprocessing

  • Test Preprocessing
  • Add the TFX Transform component to the TFX pipeline

Day 35: Model Training

  • Prototype the TFX Trainer component in Jupyter
  • Modify model.py

Day 36: Continue Model Training

  • Modify model.py
  • Add the TFX Trainer component to the pipeline
  • Test pipeline

Day 37: Read Building ML Pipelines

  • Read Ch 12 of Building ML Pipelines - Kubeflow Pipelines

Day 38: Reading Building ML Pipelines

Day 39: Debug Model Training

  • Figure out how the TFX IMDB example in the TensorFlow tutorial works
  • Understand the use of custom TFX components in the tutorial
  • Understand the preprocessing required for the model using the IMDB dataset

Day 40: Find resources for learning CUDA

  • Find a book to read for CUDA
  • Search for alternative resources online

Day 41: Start reading GPU Programming with CUDA

  • Download a digital version of the book
  • Start the introductory chapters
  • Using Amdahl's Law
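
Amdahl's Law bounds the speedup of a program when only a fraction p of its work can be parallelized over n processors: speedup = 1 / ((1 - p) + p / n). The following is a minimal sketch; the function name `amdahl_speedup` and the example values are hypothetical, chosen only to illustrate the formula.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Theoretical speedup when a fraction p of the work is spread over n processors."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


# Hypothetical workload: 90% parallelizable, run on 8 and then 1024 processors.
for n in (8, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

However many processors are added, the speedup stays below 1 / (1 - p), which is 10 for p = 0.9. This is why the serial portion of a program matters so much when moving work to a GPU.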

Day 42: Start reading GPU Programming with CUDA

  • Learn about Mandelbrot set
  • What are profilers? cProfile module
  • Setting up GPU programming environment in Linux
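
A profiler records where a program spends its time, which helps decide which parts are worth moving to the GPU. The following is a minimal sketch using the standard library's `cProfile` and `pstats` modules; the function `slow_sum_of_squares` is a hypothetical stand-in for real work.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately simple CPU-bound loop to give the profiler something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum_of_squares(100_000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

For a whole script, the same information is available from the command line with `python -m cProfile -s cumulative script.py`.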

Day 43: Scouting Internships

  • Search for research internships
  • Create a spreadsheet with all the important dates listed
  • Create and organize a small Google document with all relevant links and information

Day 44: Set up development environment for CUDA

  • Test on local environment
  • Figure out dependencies
  • Create Dockerfile
  • Build Docker image
  • Test Docker container with all the dependencies for programming CUDA using Python

Day 45: Read GPU Programming with CUDA

  • Read chapter 3: Getting Started with PyCUDA
  • CPU vs GPU timing
  • Parallelizing the Mandelbrot set

Day 46: Troubleshoot TFX Transform component

  • Re-implement the tokenization
  • Use the Keras TextVectorization layer
  • Build sentence sequences

Day 47: Mandelbrot Set

  • Code a sequential version that runs on the CPU
  • Parallelize the code to run on GPU
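
The sequential CPU version can be sketched in plain Python as below. This is an illustrative sketch, not the code from the book or from this repository; the names `escape_time` and `mandelbrot_grid`, the grid size, and the iteration limit are assumptions. Each grid point is computed independently of the others, which is what makes the problem a natural fit for a GPU kernel.

```python
def escape_time(c, max_iter=50):
    """Number of iterations of z = z*z + c before |z| exceeds 2 (max_iter if it never does)."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def mandelbrot_grid(width, height, x_range=(-2.0, 1.0), y_range=(-1.0, 1.0), max_iter=50):
    """Evaluate escape_time on a width x height grid, one point at a time (width, height >= 2)."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = []
    for row in range(height):
        y = y_min + (y_max - y_min) * row / (height - 1)
        grid.append([
            escape_time(complex(x_min + (x_max - x_min) * col / (width - 1), y), max_iter)
            for col in range(width)
        ])
    return grid


grid = mandelbrot_grid(31, 11)
# Crude text rendering: '#' marks points that never escaped within max_iter.
for line in grid:
    print("".join("#" if n == 50 else "." for n in line))
```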

Day 48: Read GPU Programming with CUDA

  • Functional programming
  • Parallel scan and reduction kernel basics
  • Kernels, Threads, Blocks, and Grids

Day 49: Write article on 100 Days of ML Pt2

  • Talk about all of the knowledge I've gained so far
  • Current objectives
  • Future tasks

Day 50: Review Medium article on 100 Days of ML Pt2

  • Finish up future tasks
  • Include GitHub
  • Link website

Day 51: Read ML Applications

  • Start with introduction
  • Read chapter 1

Day 52: Read ML Applications

  • Continue chapter 1 - From Product Goal to ML Framing
  • Start chapter 2 - Create a Plan
  • Start reading an ML paper every week

Day 53: Read ML Applications

  • Finish chapter 2 - Create a Plan
  • Review Part I. Find the Correct ML Approach
  • Read more about model metrics

Day 54: Implement learning from chapters 1 and 2

  • Find an existing problem
  • Determine if it can be solved using ML
  • Look for datasets and determine what model would work

Day 55: Read ML Applications

  • Read Chapter 3 - Build Your First End-to-End Pipeline
  • Read Chapter 4 - Acquire an Initial Dataset

Day 56: Reading ML Applications

  • Continue Chapter 4
  • Review Part II. Build a Working Pipeline

Day 57: Setting up Paper-a-Week Repo

  • Set up a GitHub repo for the paper-a-week challenge
  • Decide on a list of papers to read
  • Start reading the first paper

Day 58: Reading ML Applications

  • Read Part III - Iterate on Models
  • Read Chapter 5 - Train and Evaluate Your Model

Day 59: Reading ML Applications

  • Finish reading Ch 5

Day 60: Reading ML Applications

  • Read Chapter 6 - Debug Your ML Problems

Day 61: Reading ML Applications

  • Finish reading Chapter 6
  • Think of ways to test preprocessing in ML-Pipelines project

Day 62: Prototype ML Pipelines Project

  • Build the model in a Jupyter notebook
  • Finish Data Ingestion

Day 63: ML Project pt 17 - Prototype

  • Finish filtering HTML tags from string
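
One way to filter HTML tags from a review string is the standard library's `html.parser`. This is a minimal sketch and not necessarily the approach used in the project; the class `_TextExtractor`, the function `strip_html`, and the sample review are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content and drops the tags themselves."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Treat line breaks as whitespace so adjacent sentences do not fuse.
        if tag == "br":
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Return text with HTML tags removed and runs of whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    return " ".join("".join(parser.parts).split())


# Hypothetical review text containing break tags, inline markup, and an entity.
sample = "Great film.<br /><br />Loved the <b>ending</b> &amp; the cast."
cleaned = strip_html(sample)
print(cleaned)
```

A regular expression such as `<[^>]+>` is a common shortcut for the same task, but a parser also decodes entities like `&amp;`.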

Day 64: ML Project pt 18 - Prototype

  • Vectorize the filtered string
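
Vectorizing here means mapping each token to an integer id and padding to a fixed length, which is roughly what a text vectorization layer does before an embedding layer. The following is a minimal sketch in plain Python under simple assumptions: whitespace tokenization, lowercasing, and hypothetical reserved ids `PAD = 0` and `UNK = 1`. The function names and sample texts are illustrative only.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids for padding and out-of-vocabulary tokens (assumed convention)


def build_vocab(texts, max_tokens=10_000):
    """Map the most frequent tokens to ids starting at 2."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tok: idx for idx, (tok, _) in enumerate(ordered[: max_tokens - 2], start=2)}


def vectorize(text, vocab, seq_len=8):
    """Convert text to a fixed-length list of token ids, truncating or padding as needed."""
    ids = [vocab.get(tok, UNK) for tok in text.lower().split()][:seq_len]
    return ids + [PAD] * (seq_len - len(ids))


# Hypothetical two-document corpus.
vocab = build_vocab(["good movie", "bad movie"])
encoded = vectorize("good movie tonight", vocab, seq_len=5)
print(vocab, encoded)
```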

Day 65: ML Project pt 19 - Prototype

  • Figure out how to create embeddings
  • Create embeddings

Day 66: ML Project pt 20 - Prototype

  • Analyze processed data
  • What is the model doing?

Day 67: ML Project pt 21 - Prototype

  • How to build the DL model?
  • Build the DL model
  • Run tests

Day 68: Reading Data Science paper

  • Read and take notes on Data preprocessing - Tidy data - by Hadley Wickham

Day 69: Reading Tidy Data

  • Continue with section 3
  • Revise sections 1-3

Day 70: Reading Tidy Data

  • Continue with section 4
  • Finished reading Tidy Data

Day 71: Reading ML Applications

  • Start Chapter 7 - Using Classifiers for Writing Recommendations

Day 72: Reading ML Applications

  • Finish Chapter 7
  • Review Part III - Iterate on Models

Day 73: Paper a Week

  • Read Statistical Modeling: The Two Cultures - by Leo Breiman

Day 74: Paper a Week

  • Finished Section 3 and 4

Day 75: Paper a Week

  • Read section 5 - the use of data models

Day 76: Paper a Week

  • Start section 6 - the limitations of data models

Day 77: Paper a Week

  • Start section 8 - Rashomon and the Multiplicity of Good Models
  • Continue reading

Day 78: Paper a Week

  • Read section 9

Day 79: Prepare for ML Conference

  • Fix info on the slides
  • Make up a presentation
  • Add/subtract information

Day 80: Paper a Week

  • Read section 10

Day 81: Paper a Week

  • Read section 11
  • Read section 12
  • Finished Statistical Modeling: The Two Cultures by Leo Breiman

Day 82: Paper a Week

  • Start reading A Study in Rashomon Curves and Volumes: A New Perspective on Generalization and Model Simplicity in Machine Learning (Semenova et al.)
  • Update the Paper a Week repository with annotated papers
  • Checked out Ishan Misra and Yann LeCun's blog post on Self-supervised learning

Day 83: Self-supervised learning

Day 84: Paper a Week

  • Read Paper a Week: Rashomon Curves and Volumes

Day 85: Paper a Week

  • Explore the different statistical concepts introduced in Rashomon Curves & Volumes
  • Continue reading Rashomon Curves and Volumes

Day 86: AI/ML Research

Day 87: EfficientNet

Day 88: YOLO

Day 89: Implementing EfficientNet

  • Lay out the architecture of EfficientNet
  • Figure out if it's possible to code it using TensorFlow
  • Start implementation

Day 90: Read MobileNetV2 & MnasNet

  • Understand what MBConv blocks are
  • Learn about inverted residual convolution

Day 91: Read MobileNetV2

  • Get an understanding of the Bottleneck residual block
  • What are dwise layers?

Day 92: Implement EfficientNet

  • Start by implementing a simple convblock
  • Design and code the MBConv block

Day 93: Read PointNet

  • Learn about the architecture of PointNet
  • Continue with the Berkeley reading list

Day 94: Read Yann LeCun's paper on Deep Learning

  • Continue reading the paper

Day 95: Semi Supervised Learning

Day 96: Co-Training Algorithm

Day 97: Multi-Modal

  • Learn more about the functionality of multi-modal models

Day 98: SSL and Weakly Supervised Learning

  • Learn about the difference between SSL and WSL

Day 99: Write article about 100 Days of ML

  • Start on the Medium article

Day 100: Challenge Completed

  • Finish the Medium article