Section Recap

Introduction

This short lesson summarizes key takeaways from this section.

Objectives

You will be able to:

  • Understand and explain what was covered in this section
  • Understand and explain why this section will help you become a data scientist

Key Takeaways

The key takeaways from this section include:

  • Probably Approximately Correct (PAC) learning theory provides a mathematically rigorous definition of what machine learning is
  • PAC is a learning model characterized by learning from examples
  • Decision trees can be used for both classification and regression tasks
  • They are a powerful and interpretable technique for many machine learning problems (especially when combined with ensemble methods)
  • Decision trees are a form of Directed Acyclic Graph (DAG) - you traverse them in a specified direction, and there are no "loops" in the graphs to go backwards
  • Algorithms for generating decision trees are designed to maximize the information gain from each split
  • A popular algorithm for generating decision trees is ID3 - the Iterative Dichotomiser 3 algorithm
  • There is a range of pruning hyperparameters for decision trees to reduce overfitting - including maximum depth, minimum samples required to split a node, minimum samples per leaf, maximum leaf nodes and maximum features
  • CART (Classification and Regression Trees) can be used for regression tasks
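
To make the information gain takeaway concrete, here is a minimal sketch of how the gain from a single split could be computed using entropy as the impurity measure, which is the measure ID3 uses. The function names (`entropy`, `information_gain`) and the toy labels are hypothetical, chosen only for illustration; this is not a full tree-building algorithm.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical toy data: 4 "yes" labels and 4 "no" labels.
parent = ["yes"] * 4 + ["no"] * 4

# A split that separates the classes completely.
perfect_split = [["yes"] * 4, ["no"] * 4]

# A split that leaves each child as mixed as the parent.
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)

print(f"perfect split gain: {gain_perfect:.3f}")
print(f"useless split gain: {gain_useless:.3f}")
```

A tree-building algorithm in this style would evaluate each candidate split this way and keep the one with the highest gain, then repeat on each child node until a stopping rule (such as one of the pruning hyperparameters above) is reached.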