2021 Python for Machine Learning & Data Science Masterclass

A project following along with the "2021 Python for Machine Learning & Data Science Masterclass" class by Jose Portilla on Udemy.

Note that the 'project' folder contains a working Flask server that loads a random forest regressor model and serves predictions over an API.

Progress

  • 11/235
  • 23/235
  • 32/235
  • 43/235
  • 50/235
  • 53/235
  • 63/235
  • 80/235
  • 94/235
  • 101/235
  • 107/235
  • 113/235
  • 122/235
  • 129/235
  • 136/235
  • 141/235
  • 146/235 (62.1)
  • 152/235 (64.6)
  • 159/235 (67.6)
  • 173/235 (73.6)
  • 187/235 (79.5)
  • 198/235 (84.2)
  • 203/235 (86.3)
  • 210/235 (89.3)
  • 215/235 (91.4)
  • 221/235 (94)
  • 235/235 (100)

Notes and learnings from the tutorial

ML Pathway Overview

  • Supervised learning = trying to predict an outcome from labeled examples
  • Unsupervised learning = discovering patterns in unlabeled data

NumPy

Many data science libraries are built on NumPy, a library for creating and working with N-dimensional arrays.

  • NumPy arrays look similar to Python lists, but they are much more efficient for numerical work: elements share a single data type and operations run on the whole array at once.
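
As a small illustration of the difference between a list and an array, here is a minimal sketch (the variable names are made up for this example and do not come from the course):

```python
import numpy as np

values = [1, 2, 3, 4]
arr = np.array(values)

# A plain list needs an explicit loop or comprehension for element-wise math.
doubled_list = [v * 2 for v in values]

# An ndarray applies the operation to every element in one vectorized step.
doubled_arr = arr * 2

# "N-dimensional": the same four values viewed as a 2x2 matrix.
matrix = arr.reshape(2, 2)

print(doubled_list)
print(doubled_arr)
print(matrix.shape)
```

Note that `values * 2` on the plain list would repeat the list instead of doubling each element, which is one reason the two types are not interchangeable.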

Pandas

Pandas is a library for data analysis. It uses a DataFrame system built on top of NumPy.

What can we do with it?

It comes with built-in tools for reading and writing data (or files).

  • Can read from and write to external data sources directly (databases, HTML tables)
  • Can intelligently retrieve data (handling missing data and adjustments)
  • The "Excel for Python", but so much more than that.
  • Limited only by available RAM: there is no built-in limit on file size, but the data has to fit in memory.
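
The reading, missing-data handling, and writing described above can be sketched roughly as follows. The CSV content and column names are invented for illustration, and an in-memory `io.StringIO` buffer stands in for a real file so the example is self-contained:

```python
import io

import pandas as pd

# Hypothetical CSV data; the empty field for "bob" is a missing value.
csv_text = "name,score\nann,90\nbob,\ncid,70\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# The empty field is read as NaN, so it can be counted and filled.
missing = int(df["score"].isna().sum())
filled = df.fillna({"score": df["score"].mean()})

# Write the cleaned table back out as CSV text.
out = io.StringIO()
filled.to_csv(out, index=False)

print(missing)
print(filled)
```

Filling with the column mean is just one possible choice here; dropping the row with `dropna()` would be another.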

Series = one-dimensional ndarray with axis labels

  • Allows arrays to have row labels.
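
A minimal sketch of what row labels add on top of a plain array (the data and labels are made up for this example):

```python
import pandas as pd

# A Series pairs each value with a label from its index.
sales = pd.Series([250, 300, 150], index=["mon", "tue", "wed"])

# Look a value up by its label...
by_label = sales["tue"]

# ...or by its integer position, as with a regular array.
by_position = sales.iloc[0]

# The underlying data is still available as a NumPy array.
values = sales.to_numpy()

print(by_label, by_position, values)
```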