machine-learning-engineer-roadmap

Advanced Targets

  • Kernel Programming (C/C++/CUDA) for optimization
  • Optimization techniques for model performance, including quantization, distillation, sparsity, streaming, and caching.
  • Integrate and tailor frameworks and libraries such as PyTorch, TensorFlow, DeepSpeed, and FSDP to speed up model training and inference.
  • Advance the deployment infrastructure with MLOps and infrastructure tools such as Kubeflow, MosaicML, Anyscale, and Terraform, ensuring robust development and deployment cycles.
  • AI training job scheduling, orchestration, and management via SLURM and Kubeflow.
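
As a small illustration of the quantization target above, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. The function names (`quantize_int8`, `dequantize`) and the sample weights are hypothetical; production code would normally rely on a framework's own quantization tooling rather than this.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 0.8]  # illustrative values only
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The trade-off this shows is the usual one: each weight shrinks to 8 bits at the cost of a reconstruction error of at most `scale / 2`.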

Real-world ML is about software engineering and applied maths.

[1] Basic SWE

  • Programming Languages (Python)
  • Object Oriented Programming
  • Data Structures and Algorithms
  • Databases (PostgreSQL, MongoDB, S3/blob stores)
  • Libraries and Frameworks (pandas, NumPy, scikit-learn, PyTorch, PySpark)
  • Scripting and CLI
  • Code Style and Testing
  • CI/CD
  • Automation Jobs (Airflow)
  • Services (APIs, Container Deployments)
  • Cloud (AWS/GCP/Azure)

[2.1] Basic ML

  • Fundamentals

    • What is ML?
      • Input and Output
      • ML Categories
    • ML Essentials
      • Linear Algebra
      • Probability & Statistics
      • Calculus (Gradient Descent & Backpropagation)
      • Feature Engineering
      • Loss Functions
      • Evaluation Metrics
    • Model Evaluation and Selection
      • Bias-Variance Trade-Off
      • Model Complexity and Overfitting
      • Regularization
      • Interpretability & Explainability
    • Model Training
      • Cross Validation
      • Bootstrapping and Bagging
      • Hyperparameter Tuning
      • Training Times and Learning Curves
  • Supervised ML

    • Naive Bayes
    • K-Nearest Neighbors
    • Decision Tree
    • Linear Regression
    • Logistic Regression
    • Support Vector Machine
  • Unsupervised ML

    • KMeans Clustering
    • Singular Value Decomposition
    • Gaussian Mixture Model
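
Several of the essentials above (loss functions, gradient descent, linear regression) fit into one short example. The following is a minimal sketch, assuming a mean-squared-error loss and full-batch gradient descent; the function name, learning rate, step count, and toy data are all illustrative choices.

```python
def fit_linear_regression(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every training point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1, no noise
w, b = fit_linear_regression(xs, ys)
```

On this noise-free data the fitted `w` and `b` approach 2 and 1. With real data, the learning rate usually needs tuning and the features usually need scaling.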

[2.2] Deep Learning

  • Neural Networks
  • Convolutional Neural Networks
  • Generative Adversarial Networks
  • Recurrent Neural Networks
  • Large Language Models (Transformers)
  • Diffusion Models
  • Graph Neural Networks

[2.3] Recommendation Systems

  • Collaborative and Content-Based Filtering
  • Learning to Rank

[2.4] Reinforcement Learning

  • Value-Based Methods
  • Policy-Based Methods
  • Actor-Critic Methods
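
To make the value-based family concrete, here is a minimal sketch of tabular Q-learning on a hypothetical five-state corridor where only reaching the rightmost state is rewarded. The environment, the uniformly random behaviour policy, and the hyperparameters are all assumptions made for illustration.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic corridor: reward 1.0 only on entering the goal state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Q-learning is off-policy, so a uniformly random behaviour policy works here.
            action = rng.choice(ACTIONS)
            next_state, reward = step(state, action)
            # Bootstrapped target: reward plus discounted best value of the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# Greedy policy for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

The greedy policy read from the learned table moves right in every state. Policy-based and actor-critic methods differ in that they parameterise and update the policy directly instead of deriving it from a value table.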

[3] Large-Scale Practical ML

  • Data Support / Data Engineering
    • Data Ingestion
    • Data Storage
    • Data Processing
    • Processing Orchestration
  • Model Development
    • Workspaces
    • Distributed Model Training
    • Model Validation (Offline)
  • Model Production
    • Serving and Hosting
    • Deployment and Online Evaluation
      • A/B Testing (Frequentist & Bayesian)
      • Multi-Armed Bandit
      • Impact Estimation
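
For the frequentist side of A/B testing, one common choice is a two-proportion z-test on conversion rates. The sketch below assumes that test and uses made-up conversion counts. The function name is hypothetical, and a real experiment would also need power analysis and guardrails against peeking.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative counts: 100/1000 conversions for control, 130/1000 for treatment.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
```

With these counts, `z` is a little above 2 and the p-value falls below 0.05, so the difference would be called significant at the conventional 5% level.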

[4] ML System Design

  • Design an ML Platform
  • Design a Facebook Photo Tagging System
  • Design Amazon Alexa
  • Design Pinterest Visual Search System
  • Design Google Street View Blurring System
  • Design YouTube Video Search
  • Design Harmful Content Detection
  • Design a Fake News Detector
  • Design Video Recommendation System
  • Design Event Recommendation System
  • Design Ad Click Prediction on Social Platforms
  • Design Similar Listings on Vacation Rental Platforms
  • Design a Personalized News Feed
  • Design LinkedIn's People You May Know
  • Design FedEx's Damaged Shipment Prediction
  • Design a General Conversation (ChatGPT) System
  • Design a Handwritten OCR System
  • Design a Real-Time Financial Advisor System
  • Design Zomato's Kitchen Preparation Time System