2021 Python for Machine Learning & Data Science Masterclass
A project following along with the "2021 Python for Machine Learning & Data Science Masterclass" class by Jose Portilla on Udemy.
Note that the 'project' folder contains a working Flask server that loads a random forest regressor model and serves predictions over an API.
Progress
- 11/235
- 23/235
- 32/235
- 43/235
- 50/235
- 53/235
- 63/235
- 80/235
- 94/235
- 101/235
- 107/235
- 113/235
- 122/235
- 129/235
- 136/235
- 141/235
- 146/235 (62.1)
- 152/235 (64.6)
- 159/235 (67.6)
- 173/235 (73.6)
- 187/235 (79.5)
- 198/235 (84.2)
- 203/235 (86.3)
- 210/235 (89.3)
- 215/235 (91.4)
- 221/235 (94)
- 235/235 (100)
Notes and learnings from the tutorial
ML Pathway Overview
- Supervised learning = trying to predict an outcome
- Unsupervised learning = discover patterns in data
NumPy
Many data science libraries are built on top of NumPy. It is a library for creating N-dimensional arrays.
- NumPy arrays look similar to Python lists but they are much more efficient for numerical work.
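To make the list comparison concrete, here is a minimal sketch (the variable names are illustrative, not from the course) showing the same element-wise operation on a Python list and on a NumPy array, plus a two-dimensional array:

```python
import numpy as np

values = list(range(5))   # plain Python list: [0, 1, 2, 3, 4]
arr = np.arange(5)        # NumPy array holding the same values

# With a list, element-wise math needs an explicit loop or comprehension.
doubled_list = [v * 2 for v in values]

# With an array, the operation is applied to every element in one expression.
doubled_arr = arr * 2

# Arrays are not limited to one dimension: reshape the values 0..5 into 2 rows x 3 columns.
grid = np.arange(6).reshape(2, 3)

print(doubled_list)   # [0, 2, 4, 6, 8]
print(doubled_arr)    # [0 2 4 6 8]
print(grid.shape)     # (2, 3)
```

The array version avoids a Python-level loop, which is where most of the speed difference on large inputs tends to come from.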
Pandas
A library for data analysis that uses a DataFrame system built on top of NumPy.
- Has fantastic docs: https://pandas.pydata.org/docs/
What can we do with it?
Comes with built-in tools for reading and writing data (or files)
- Can read/write directly to external data sources (databases + HTML tables)
- Can intelligently retrieve data (to handle missing data and adjustment)
- The "Excel for Python", but so much more than that.
- Only limited by how much RAM you have; there is no fixed limit on the size of the files it can open.
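The read, clean, and write points above can be sketched roughly as follows. The CSV content and column names are made up for illustration, and an in-memory buffer stands in for a real file so the snippet runs on its own:

```python
import io

import pandas as pd

# Hypothetical CSV content; the age for "Ben" is missing.
csv_text = "name,age\nAna,34\nBen,\nCara,29\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Count the missing values, then fill them with the column mean.
missing = int(df["age"].isna().sum())
df["age"] = df["age"].fillna(df["age"].mean())

# Write the cleaned table back out as CSV (to a buffer here, normally a file path).
out = io.StringIO()
df.to_csv(out, index=False)

print(missing)               # 1
print(df["age"].tolist())    # [34.0, 31.5, 29.0]
```

Filling with the mean is only one possible strategy; dropping the rows with `dropna()` is another common choice, depending on the data.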
Series = one-dimensional ndarray with axis labels
- Allows arrays to have row labels.
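A small sketch of what the row labels give you, using made-up data: values can be looked up by label or by position, and the underlying values are still available as a NumPy array.

```python
import pandas as pd

# Hypothetical data: three values, each with a row label.
sales = pd.Series([120, 85, 40], index=["apples", "bananas", "cherries"])

by_label = sales["bananas"]     # look up by row label
by_position = sales.iloc[0]     # look up by integer position
underlying = sales.to_numpy()   # the values as a plain NumPy array

print(by_label)              # 85
print(by_position)           # 120
print(list(sales.index))     # ['apples', 'bananas', 'cherries']
```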