/mlops

Repository for DCA0305, an undergraduate course about Machine Learning Workflows and Pipelines

Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

Federal University of Rio Grande do Norte

Technology Center

Department of Computer Engineering and Automation

Machine Learning Based Systems Design

References

  • 📚 Noah Gift, Alfredo Deza. Practical MLOps: Operationalizing Machine Learning Models. [Link]
  • 📚 Chip Huyen. Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications. [Link]
  • 📚 Hannes Hapke, Catherine Nelson. Building Machine Learning Pipelines. [Link]
  • 📚 Mariano Anaya. Clean Code in Python. [Link]
  • 📚 Aurélien Géron. Hands on Machine Learning with Scikit-Learn, Keras and TensorFlow. [Link]
  • 🤜 Dataquest Academic Program [Link]
  • 😃 CS329S - ML Systems Design [Link]
  • 🎯 Machine Learning Operations [Link]

Lessons

Week 01: Course Outline Open in PDF

  • Git and Version Control Open in Dataquest
    • You'll learn how to: a) organize your code using version control, b) resolve conflicts in version control, c) employ Git and GitHub to collaborate with others.
    • 👊 U1T1: guided project + getting a git repository.

Week 02: CLI fundamentals

  • Elements of the Command Line Open in Dataquest
    • You'll learn how to: a) employ the command line for Data Science, b) modify the behavior of commands with options, c) employ glob patterns and wildcards, d) define important command line concepts, e) navigate the filesystem, f) manage users and permissions.
  • Text Processing in the Command Line Open in Dataquest
    • You'll learn how to: a) read and explore documentation, b) perform basic text processing, c) redirect and pipe output, d) inspect files, e) define different kinds of output, f) employ streams and file descriptors.
  • 🔠 U1T2: working with command line.

Week 03 - Clean Code Principles for Data Science and Machine Learning Open in PDF

  • Outline Open in Loom
  • Coding Best Practices Open in Loom
  • Writing Clean Code Open in Loom
  • Refactoring Code Open in Loom
  • Efficient Code Open in Loom
  • Documentation Open in Loom
  • Python Code Quality Authority (PCQA) - pycodestyle Open in Loom
  • PCQA - pylint Open in Loom
  • PCQA - autopep8 Open in Loom
  • PCQA - nbQA Open in Loom
  • ▶️ Hands on
    • 💾 Datasets [Link]
    • Writing Clean Code Jupyter
    • Exercise 01 Jupyter
    • Exercise 02 Jupyter
    • Exercise 03 Jupyter
    • Using pycodestyle Jupyter
    • Using pylint: script (Python), refactored script (Python)
    • Functions: Advanced - Best practices for writing functions Open in Dataquest
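
As one possible illustration of the clean-code topics listed above (descriptive names, small functions, docstrings), here is a minimal sketch. The function `mean_of_positive` and its behavior on empty input are assumptions made for this example; they are not taken from the course notebooks.

```python
def mean_of_positive(values):
    """Return the mean of the strictly positive numbers in `values`.

    Returns 0.0 when there are no positive numbers, so callers do not
    have to guard against division by zero. (Hypothetical example.)
    """
    positives = [value for value in values if value > 0]
    if not positives:
        return 0.0
    return sum(positives) / len(positives)


print(mean_of_positive([1, -2, 3]))  # 2.0
```

The name says what the function computes, the docstring states the edge case, and the body stays short enough to read in one pass. Tools such as pycodestyle and pylint check style and common defects, not whether the names are meaningful.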

Week 04 Production Ready Code Open in PDF

  • Outline Open in Loom
  • Catching Errors Open in Loom
  • Testing and Data Science Open in Loom
  • A brief introduction about pytest Open in Loom
  • Logging Open in Loom
  • Case study: testing and logging Open in Loom
  • Model Drift Open in Loom
  • Hands on
    • Production ready code Jupyter
    • Data Visualization Fundamentals Open in Dataquest
      • You will learn how to: a) use data visualization to explore data and b) choose among the most common plots and know when to use each.
    • Storytelling Data Visualization and Information Design Open in Dataquest
      • You will learn how to: a) create graphs using information design principles, b) create narrative data visualizations using Matplotlib, c) create visual patterns using Gestalt principles, d) control attention using pre-attentive attributes and e) employ Matplotlib's built-in styles.
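
The Week 04 topics of catching errors, logging, and testing fit together in a few lines. The sketch below is only an illustration under stated assumptions: `safe_ratio` and `test_safe_ratio` are hypothetical names, and the test is written in the plain `assert` style that pytest collects from functions prefixed with `test_`.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the division is undefined."""
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        # Catch only the error we expect and leave a trace in the log.
        logger.error("denominator is zero; returning None")
        return None
    logger.info("ratio computed: %s", result)
    return result


def test_safe_ratio():
    # pytest would discover and run this function automatically.
    assert safe_ratio(6, 3) == 2.0
    assert safe_ratio(1, 0) is None
```

Saving this as a file whose name starts with `test_` and running `pytest` in the same directory should execute the test; the log messages show which branch each call took.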

Week 05 Building a Data Pipeline Open in Dataquest

  • Define functional programming
  • Write a robust data pipeline with a scheduler in Python
  • Define advanced Python concepts such as closures and decorators
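
The three objectives above can be sketched together: a decorator built from a closure, and a small pipeline built by composing functions. All names here (`count_calls`, `compose`, `drop_missing`, `square_all`) are hypothetical and chosen for illustration; the scheduler part of the lesson is not covered by this sketch.

```python
import functools


def count_calls(func):
    """Decorator: wrap `func` in a closure that counts how often it is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


def compose(*steps):
    """Build a pipeline that applies `steps` from left to right."""
    def pipeline(data):
        # `steps` is remembered by the closure after compose() returns.
        return functools.reduce(lambda acc, step: step(acc), steps, data)
    return pipeline


@count_calls
def drop_missing(rows):
    return [row for row in rows if row is not None]


def square_all(rows):
    return [row * row for row in rows]


clean_and_square = compose(drop_missing, square_all)
print(clean_and_square([1, None, 3]))  # [1, 9]
print(drop_missing.calls)              # 1
```

Each step is a pure function of its input, so steps can be tested in isolation and reordered without touching the pipeline code.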

Week 06 Building a Reproducible Model Workflow Open in PDF

  • Outline Open in Loom
  • Business Reflections Open in Loom
  • Introduction to MLOps Open in Loom
  • A brief history of MLOps and Tools Open in Loom
  • Tools and environment installation Open in Loom Jupyter
  • Tools and environment installation cont. Open in Loom Script
  • Machine Learning Pipelines Open in Loom
  • Machine Learning Pipelines - Command Line Interface Open in Loom Python
  • Versioning Data and Artifacts Open in Loom Jupyter
  • Guided Exercise - CLI + Weights and Biases Open in Loom Zip
  • MLflow Projects Open in Loom
  • Introduction to YAML Open in Loom Jupyter
  • Guided Exercise - Build an MLflow component Open in Loom Zip
  • Linking together the components MLflow + Hydra Open in Loom
  • Guided Exercise - Multiple MLflow components + Hydra Open in Loom Zip
  • Additional Material Open in Dataquest
    • Context Managers
    • Introduction to Decorators
    • Decorators: advanced
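
A pipeline step exposed through a command line interface, as in the "Machine Learning Pipelines - Command Line Interface" lesson, can be sketched with the standard library's `argparse`. The argument names `--input_artifact` and `--test_size` and the file name `raw_data.csv` are assumptions for illustration only, not the course's actual parameters.

```python
import argparse


def build_parser():
    """Describe the parameters of a hypothetical pipeline step."""
    parser = argparse.ArgumentParser(description="Hypothetical pipeline step")
    parser.add_argument("--input_artifact", type=str, required=True,
                        help="Name of the input file or artifact")
    parser.add_argument("--test_size", type=float, default=0.3,
                        help="Fraction of rows reserved for testing")
    return parser


def main(argv=None):
    """Parse `argv` (or sys.argv when None) and return the step configuration."""
    args = build_parser().parse_args(argv)
    return {"input_artifact": args.input_artifact, "test_size": args.test_size}


# Passing the arguments as a list mimics a command line call.
print(main(["--input_artifact", "raw_data.csv"]))
```

Keeping every parameter on the command line is what lets an orchestration layer call each component with different values without editing its code.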

Week 07 Building a Reproducible Model Workflow Cont. - Introduction to Machine Learning Open in PDF

  • Outline Open in Loom
  • What is Machine Learning? Open in Loom
  • Machine Learning Types Open in Loom
  • Variables, Pipeline, Controlling Chaos Open in Loom
  • Data Segregation - train, dev and test sets Open in Loom
  • Bias vs Variance Open in Loom
  • Machine Learning Fundamentals Open in Dataquest
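
The train, dev, and test segregation mentioned above can be sketched with the standard library alone. The function name, the 70/15/15 proportions, and the seed are assumptions for this example; real projects often use a library helper and may stratify by label.

```python
import random


def train_dev_test_split(rows, dev_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle `rows` reproducibly and cut them into train, dev, and test lists."""
    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_dev = round(len(shuffled) * dev_fraction)
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test


train, dev, test = train_dev_test_split(range(100))
print(len(train), len(dev), len(test))  # 70 15 15
```

Fixing the seed makes the split repeatable across runs, so a change in a metric can be attributed to the model instead of a different random partition.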

Week 08 Building a Reproducible Model Workflow Cont. - ETL, Data Checks, Data Segregation Open in PDF

  • Outline Open in Loom
  • Extract, Transform, Load (ETL)
    • Exploratory Data Analysis (EDA) Video
      • Part I Open in Loom
      • Part II Open in Loom
      • Part III Open in Loom
      • Part IV Open in Loom
      • Part V Open in Loom
      • Hands on Jupyter
    • Preprocessing Open in Loom Jupyter
  • Data Segregation Open in Loom Jupyter
  • Data Checks
    • Data Validation Open in Loom Jupyter
    • Deterministic Test Open in Loom Jupyter
    • Non-Deterministic Test Open in Loom Jupyter
    • Multiple Hypothesis Testing Open in Loom Jupyter
    • Multiple Hypothesis Testing Using MLflow Open in Loom Jupyter
    • Multiple Hypothesis Testing Using Parameters in pytest Open in Loom Jupyter
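
When several statistical data checks run at once, the chance of at least one false alarm grows with the number of tests. One common remedy is the Bonferroni correction, sketched below; whether the course uses this exact correction is an assumption, and the column names and p-values are made up for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return the names whose p-value passes the Bonferroni-corrected threshold.

    With m tests, each p-value is compared against alpha / m, which keeps the
    probability of at least one false rejection at or below alpha.
    """
    if not p_values:
        return []
    corrected_alpha = alpha / len(p_values)
    return [name for name, p in p_values.items() if p < corrected_alpha]


# Made-up p-values for three hypothetical columns (illustration only).
example_p_values = {"age": 0.20, "income": 0.004, "hours": 0.03}
print(bonferroni_reject(example_p_values))  # ['income']
```

Without the correction, `hours` (p = 0.03) would also be flagged at alpha = 0.05; with three tests the threshold drops to about 0.0167 and only `income` remains.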

Week 09 Building a Reproducible Model Workflow Cont. - Train, Validation and Experiment Tracking Open in PDF

  • Outline Open in Loom
  • A brief review Open in Loom
  • Decision Trees
    • Introduction Open in Loom
    • Mathematical Foundations Open in Loom
  • Evaluation Metrics
    • How to choose an evaluation metric? Open in Loom
    • Threshold metrics Open in Loom
    • Ranking metrics Open in Loom
  • Implementing Pipelines
    • MLOps Level 0 with Pipeline incorporating train
      • Part I Open in Loom
      • Part II Open in Loom
      • Part III Open in Loom
      • Source-Code Jupyter
    • MLOps Level 0 with Pipeline incorporating train and preprocessing
      • Part I Open in Loom
      • Part II Open in Loom
      • Part III Open in Loom
      • Source-Code Jupyter
    • MLOps Level 1 with Pipeline incorporating train and preprocessing
      • Part I Open in Loom
      • Part II Open in Loom
      • Source-Code Jupyter
    • MLOps Level 1 with Pipeline and Hyper-parameter Tuning
      • Part I Open in Loom
      • Part II Open in Loom
      • Part III Open in Loom
      • Part IV Open in Loom
      • Source-Code Jupyter
    • Test evaluation
      • Part I Open in Loom
      • Source-Code Jupyter
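
Threshold metrics such as precision, recall, and F1 are all derived from counts of true positives, false positives, and false negatives. The sketch below computes them by hand to make the definitions explicit; the function name and the toy labels are assumptions for illustration, and in practice a library implementation would normally be used.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Fall back to 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.667 0.667
```

Precision answers "of the predicted positives, how many were right", while recall answers "of the actual positives, how many were found"; which one matters more depends on the cost of each kind of error.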

Week 10 Building a Reproducible Model Workflow Cont. - Final Pipeline, Release and Deploy Open in PDF

  • Outline Open in Loom
  • Final Pipeline
    • Big picture of the final pipeline Open in Loom
    • All together
      • Part I Open in Loom
      • Part II Open in Loom
      • Source-Code Jupyter
  • Release for reproducibility
    • Create a GitHub repository for the final pipeline Open in Loom Jupyter
    • Semantic versioning and remote execution Open in Loom Jupyter
  • Deployment
    • Deploy with MLflow Open in Loom Jupyter
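
Semantic versioning labels each release as MAJOR.MINOR.PATCH, and the parts must be compared as numbers, not as text. The sketch below shows that comparison; `parse_version` and the tag list are hypothetical, and pre-release suffixes such as `-rc1` are not handled.

```python
def parse_version(tag):
    """Turn 'v1.2.3' or '1.2.3' into the tuple (1, 2, 3)."""
    core = tag[1:] if tag.startswith("v") else tag
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


tags = ["1.9.3", "1.10.0", "1.2.0"]  # hypothetical release tags
latest = max(tags, key=parse_version)
print(latest)  # 1.10.0
```

Comparing the raw strings would rank "1.9.3" above "1.10.0"; comparing integer tuples gives the intended order, which matters when a pipeline is executed remotely from a specific release tag.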