2021 Python for Machine Learning & Data Science Masterclass

A project following along with the "2021 Python for Machine Learning & Data Science Masterclass" class by Jose Portilla on Udemy.

Note that the 'project' folder contains a working Flask server that loads a random forest regressor model and serves predictions over an API.

Progress

11/235
23/235
32/235
43/235
50/235
53/235
63/235
80/235
94/235
101/235
107/235
113/235
122/235
129/235
136/235
141/235
146/235 (62.1)
152/235 (64.6)
159/235 (67.6)
173/235 (73.6)
187/235 (79.5)
198/235 (84.2)
203/235 (86.3)
210/235 (89.3)
215/235 (91.4)
221/235 (94)
235/235 (100)

Notes and learnings from the tutorial

ML Pathway Overview

Supervised learning = trying to predict an outcome from labeled examples
Unsupervised learning = discover patterns in unlabeled data
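
To make the difference concrete, here is a minimal sketch using only NumPy. It is not code from the course: the data is made up, the supervised part is a least-squares line fit, and the unsupervised part is a hand-rolled two-group clustering loop (a tiny 1-D k-means) chosen just for illustration.

```python
import numpy as np

# Supervised: we have inputs x AND known outcomes y, and fit a model to predict y.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])          # made-up labels (y = 2x)
slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight line
prediction = slope * 5.0 + intercept        # predicted outcome for an unseen input

# Unsupervised: no labels at all, only data; look for structure such as groups.
points = np.array([1.0, 1.2, 0.8, 9.0, 9.3, 8.7])
centers = np.array([points.min(), points.max()])  # crude starting guesses
for _ in range(10):
    # Assign each point to its nearest center, then move each center
    # to the mean of the points assigned to it.
    labels = np.abs(points[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([points[labels == k].mean() for k in range(2)])

print(prediction)  # close to 10.0
print(labels)      # group index (0 or 1) for each point
```

In the supervised half the answer (y) is given for every training example; in the unsupervised half the groups are found from the data alone.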

NumPy

Many data science libraries are built on NumPy, a library for creating and working with N-dimensional arrays.

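NumPy arrays look similar to Python lists, but they are much more efficient: all elements share a single data type, and operations on whole arrays run in compiled code rather than in a Python loop.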
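
A small sketch of how arrays differ from lists in practice. The variable names and values are made up for illustration.

```python
import numpy as np

values = [1, 2, 3, 4]
arr = np.array(values)

# On a list, * repeats the list; on an array, it multiplies every element.
doubled_list = values * 2   # [1, 2, 3, 4, 1, 2, 3, 4]
doubled_arr = arr * 2       # array([2, 4, 6, 8])

# Arrays can have more than one dimension.
grid = np.arange(6).reshape(2, 3)   # 2 rows, 3 columns: [[0, 1, 2], [3, 4, 5]]
column_sums = grid.sum(axis=0)      # array([3, 5, 7])

print(doubled_arr, grid.shape, column_sums)
```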

Pandas

Pandas is a library for data analysis. It uses a DataFrame system built on top of NumPy.

Has fantastic docs: https://pandas.pydata.org/docs/

What can we do with it?

Comes with built-in tools for reading and writing data (or files)

Can read from and write to external data sources directly (databases + HTML tables)
Can intelligently retrieve data (handling missing data and making adjustments)
The "Excel for Python", but so much more than that.
Limited only by how much RAM you have; there is no fixed limit on the size of files you can open.
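
A minimal sketch of the read, clean, write cycle described above. The CSV content and column names are made up, and an in-memory string stands in for a real file so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical data: bob's score is missing.
csv_text = "name,score\nann,90\nbob,\ncid,70\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Missing values are read as NaN and can be counted or filled.
missing = df["score"].isna().sum()
filled = df.fillna({"score": df["score"].mean()})  # mean of 90 and 70 is 80

# With no path given, to_csv returns the CSV text instead of writing a file.
out = filled.to_csv(index=False)
print(missing)
print(out)
```

Filling with the column mean is only one option; dropping the row with `dropna()` is another common choice, and which is right depends on the data.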

Series = one-dimensional ndarray with axis labels

This allows arrays to have row labels, so a value can be looked up by its label as well as by its position.
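
A short sketch of a labeled Series. The names and numbers are made up for illustration.

```python
import pandas as pd

# The index supplies the row labels; the values are stored in a NumPy array.
scores = pd.Series([83, 67, 47], index=["ann", "bob", "cid"])

by_label = scores["bob"]        # look up by label
by_position = scores.iloc[0]    # look up by position
raw = scores.to_numpy()         # the underlying values as an ndarray

print(by_label, by_position, list(scores.index))
```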