Introduction

This lesson summarizes the topics we'll be covering in section 08 and why they'll be important to you as a data scientist.

Objectives

You will be able to:

  • Understand and explain what is covered in this section
  • Understand and explain why the section will help you to become a data scientist

NumPy

We've already introduced the Pandas data science library, but in this section we're going to dig a little into NumPy - the library that Pandas uses under the hood to perform computationally efficient mathematical transformations of data sets. If you thought some of the Pandas code took a while to run in previous sections, you wouldn't want to see how slowly it would run without NumPy under the hood!
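As a small taste of what "computationally efficient" means here, the sketch below converts a few temperatures with a plain Python loop and then with a single NumPy expression. The values and variable names are made up for illustration, and a real speed difference only becomes noticeable on much larger arrays.

```python
import numpy as np

# Hypothetical sample data: temperatures in degrees Celsius
celsius_list = [0.0, 25.0, 100.0]

# Plain Python: convert one element at a time in a loop
fahrenheit_loop = [c * 9 / 5 + 32 for c in celsius_list]

# NumPy: the same arithmetic is applied to the whole array at once
celsius = np.array(celsius_list)
fahrenheit = celsius * 9 / 5 + 32

print(fahrenheit_loop)
print(fahrenheit)
```

Both approaches give the same numbers; the NumPy version simply expresses the transformation on the array as a whole instead of element by element.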

Combinatorics and Probability

We're then going to take a little time to "get our math on" with some basic probability. We're going to start with some basic set theory and look at how to operate on related sets using Python.
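To give a rough idea of what operating on related sets looks like, here is a minimal sketch using Python's built-in `set` type. The two sets are arbitrary examples chosen only for illustration.

```python
# Two small example sets (values chosen only for illustration)
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b          # elements in a, in b, or in both
intersection = a & b   # elements in both a and b
difference = a - b     # elements in a that are not in b

print(union, intersection, difference)
```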

From there, we're going to use what we learned about sets to start to learn and apply some of the basic rules of probability to calculate various likelihoods.
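As one illustration of how sets lead into probability, the sketch below treats the outcomes of a single roll of a fair six-sided die as a set and checks the addition rule, P(A or B) = P(A) + P(B) - P(A and B). The die example and the helper name `probability` are assumptions made for this sketch, and it only applies when every outcome is equally likely.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (illustrative example)
sample_space = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
greater_than_three = {4, 5, 6}


def probability(event, space):
    """Probability of an event when all outcomes in space are equally likely."""
    return Fraction(len(event & space), len(space))


p_even = probability(even, sample_space)
p_gt3 = probability(greater_than_three, sample_space)
p_both = probability(even & greater_than_three, sample_space)
p_either = probability(even | greater_than_three, sample_space)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
print(p_either, p_even + p_gt3 - p_both)
```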

Factorials and Permutations

Next we're going to dig into factorials, and how they can be used to calculate various permutations.
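For a quick preview, the sketch below uses `math.factorial` to count orderings and `itertools.permutations` to list them. The letters and counts are illustrative choices, not examples taken from the lessons themselves.

```python
import math
from itertools import permutations

# Number of ways to arrange 5 distinct items in a row: 5!
n_orderings = math.factorial(5)

# Number of ordered selections of 3 items out of 5: 5! / (5 - 3)!
n_ordered_picks = math.factorial(5) // math.factorial(5 - 3)

# Listing the ordered pairs of letters drawn from "ABC" confirms the formula 3! / 1!
two_letter_orders = list(permutations("ABC", 2))

print(n_orderings, n_ordered_picks, len(two_letter_orders))
```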

Combinations

We're then going to examine the difference between permutations and combinations. We'll get some practice calculating combinations for everything from drawing letters from a bag to creating soccer teams for a tournament!
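The key difference is that order matters for permutations but not for combinations. The sketch below, which assumes a hypothetical bag of five distinct letters, shows that dividing the number of ordered draws by the number of ways to reorder each draw gives the number of combinations.

```python
import math
from itertools import combinations

letters = "ABCDE"  # hypothetical bag of 5 distinct letters
k = 3              # number of letters drawn

# Ordered draws (permutations): 5! / (5 - 3)!
n_permutations = math.factorial(len(letters)) // math.factorial(len(letters) - k)

# Each unordered draw appears k! times among the ordered draws
n_combinations = n_permutations // math.factorial(k)

# Listing the unordered draws directly gives the same count
listed = list(combinations(letters, k))

print(n_permutations, n_combinations, len(listed))
```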

Bernoulli and Binomial Distributions

Finally, we're going to look at Bernoulli experiments and how the probabilities related to a series of independent Bernoulli events can be expressed using a Binomial distribution.
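As a rough sketch of where this leads, the probability of exactly k successes in n independent Bernoulli trials, each with success probability p, is C(n, k) * p^k * (1 - p)^(n - k). The function name `binomial_pmf` and the coin-flip numbers below are assumptions chosen for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    n_choose_k = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
    return n_choose_k * p**k * (1 - p) ** (n - k)


# Illustrative example: exactly 5 heads in 10 flips of a fair coin
p_five_heads = binomial_pmf(5, 10, 0.5)

# The probabilities over all possible numbers of heads should sum to 1
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

print(p_five_heads, total)
```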

Summary

NumPy, Probability and Combinatorics fit well together. None of them is as attention-grabbing as Convolutional Neural Networks (don't worry - we'll get to those later in the course), but each one of them helps to provide the foundation on which most Python machine learning algorithms are built!