With the advent of machine learning and big data, data scientists who can build large-scale, data-driven solutions are in high demand. In this compressed four-week course, you will use the Python programming language to analyze both small and large data sets while building and evaluating your own machine learning models.
About the Instructor:
Sinan Ozdemir is a lecturer of Business, Mathematics and Computer Science at The Johns Hopkins University. Sinan is also a co-founder and CTO of Tier5 and Kovani, two data science companies based in Baltimore. Sinan is an experienced teacher and entrepreneur. Follow him!
Day | Part | Topic |
---|---|---|
4/15 | 1 | Data Exploration with Pandas |
4/22 | 2 | Intro to Machine Learning |
4/29 | 3 | Model Evaluation and Metrics |
5/6 | 4 | Building a Model using Titanic Survival Data |
Required Pre-Reqs for ANY of the Parts:
- Download the Anaconda distribution of Python
- Prepare to learn the glory that is Data Science!
- Anyone who feels rusty on Python, or on coding in general, should check out this resource to practice
In this session, our objective is to learn the basics of Python and how to use the pandas library to explore data sets.
At the end of the Class Students Will Be Able To:
- Use the Python library pandas to explore small and large data sets
- Prepare data and perform the necessary pre-processing steps
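As a rough illustration of the two objectives above, the sketch below explores a tiny in-memory table and applies one common pre-processing step (filling missing values). The column names and rows are invented for this example and are not the course data; only standard pandas calls are used.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration (not the course data set).
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "age": [22.0, None, 35.0, 58.0],
    "group": ["x", "y", "x", "y"],
})

# First look: size, column types, and summary statistics.
print(df.shape)
print(df.dtypes)
print(df.describe())

# Pre-processing: count missing ages, then fill them with the column median.
missing_before = df["age"].isnull().sum()
df["age"] = df["age"].fillna(df["age"].median())

# Aggregate: mean age per group after filling.
mean_age_by_group = df.groupby("group")["age"].mean()
print(mean_age_by_group)
```

Filling with the median is only one possible choice; dropping incomplete rows with `dropna()` is another, and which one is appropriate depends on the data.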
Recommended Prereqs:
- Ability to write and read code
- Background in Python/R is preferred but not required
Agenda:
Homework
- [Practice Problems for next time!](homework/01_pandas_homework.py)
- The data was taken from this 538 article
In this session we will introduce machine learning and build our first two models using the Python package scikit-learn.
At the end of the Class Students Will Be Able To:
- Understand at a fundamental level what machine learning is and how it is used in practice.
- Use the Python package scikit-learn to build and evaluate machine learning models
- Understand key differences between machine learning models
Recommended Prereqs:
- Basic understanding of the Python package Pandas
- Background in Python/R is preferred but not required
Agenda:
- Iris dataset
- What does an iris look like?
- Data hosted by the UCI Machine Learning Repository
- Machine learning and KNN (slides)
- Introduction to Linear Regression
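The two model types on this agenda can be sketched in a few lines of scikit-learn. This is a minimal example, not the class code: the choice of `n_neighbors=5`, the measurements of the "new" flower, and the toy regression data are all assumptions made for illustration. The iris data is loaded from the copy bundled with scikit-learn rather than from the UCI repository.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# --- Classification with K-nearest neighbors (KNN) ---
iris = load_iris()
X, y = iris.data, iris.target  # 150 flowers, 4 measurements each, 3 species

knn = KNeighborsClassifier(n_neighbors=5)  # k=5 is an arbitrary choice
knn.fit(X, y)

# Hypothetical new flower: sepal length, sepal width, petal length, petal width (cm).
new_flower = [[5.1, 3.5, 1.4, 0.2]]
predicted = knn.predict(new_flower)
predicted_name = iris.target_names[predicted[0]]
print(predicted_name)

# --- Regression with a straight line ---
# Toy data that follows y = 2x + 1 exactly, so the fit is easy to check.
X_line = [[1], [2], [3], [4]]
y_line = [3, 5, 7, 9]

lin = LinearRegression()
lin.fit(X_line, y_line)
print(lin.coef_[0], lin.intercept_)
```

KNN predicts a category by looking at the closest labeled examples, while linear regression predicts a number from a fitted line; both share the same `fit` / `predict` interface.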
Further Resources:
- To go much more in-depth on linear regression, read Chapter 3 of An Introduction to Statistical Learning, watch the related videos or read a quick reference guide to the key points in that chapter.
- This introduction to linear regression is much more detailed and mathematically thorough, and includes lots of good advice.
- This is a relatively quick post on the assumptions of linear regression.
- Documentation: user guide, module reference, class documentation
Homework
- [Practice Problems for next time!](homework/02_glass_knn.py)
- There is also a homework question in the linear regression code file to work on!
We will explore the process and the quantitative metrics that we use to evaluate our machine learning models.
At the end of the Class Students Will Be Able To:
- See how data scientists prepare and adjust their models in order to maximize accuracy.
Recommended Prereqs:
- Basic understanding of Regression and Classification
Agenda:
- Model Evaluation
Further Resources:
- Great video on ROC curves and AUC
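To make the evaluation workflow concrete, here is a minimal sketch that holds out part of the data, fits a classifier on the rest, and reports accuracy, a confusion matrix, and AUC. The breast cancer data set bundled with scikit-learn, the 25% test split, and the KNN model are stand-ins chosen for this example, not necessarily what is used in class.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Binary classification data bundled with scikit-learn (a stand-in data set).
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the rows so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard class labels
y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1

accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)    # rows: true class, columns: predicted
auc = roc_auc_score(y_test, y_score)

print("accuracy:", accuracy)
print("confusion matrix:")
print(matrix)
print("AUC:", auc)
```

Accuracy is computed from the hard predictions, while AUC is computed from the predicted probabilities, which is why the two calls receive different inputs.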
Congratulations! You have made it this far :) Today we will work with the Titanic data set on Kaggle.com to build a model that predicts whether or not a person died on the Titanic.
- The competition lives here
- Find the code here
- Our data lives in two files: the in-sample data and the out-of-sample data are kept separate.
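The split between in-sample and out-of-sample data can be sketched as follows: fit on the rows where the outcome is known, then predict for the rows where it is not. The two small tables below are invented stand-ins for the two files, the `make_features` helper is hypothetical, and logistic regression is just one reasonable model choice; none of this is the linked class code.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for the two files; every row here is invented for illustration.
in_sample = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 3],
    "Sex": ["female", "male", "female", "male", "male", "female"],
    "Survived": [1, 0, 1, 0, 0, 1],  # known outcome: 1 = survived, 0 = died
})
out_of_sample = pd.DataFrame({
    "Pclass": [3, 1],
    "Sex": ["male", "female"],       # no outcome column: this is what we predict
})


def make_features(df):
    """Hypothetical helper: turn raw columns into numeric model inputs."""
    features = df[["Pclass"]].copy()
    features["is_female"] = (df["Sex"] == "female").astype(int)
    return features


# Fit only on the in-sample rows, where the outcome is known.
model = LogisticRegression()
model.fit(make_features(in_sample), in_sample["Survived"])

# Predict for the out-of-sample rows using the same feature preparation.
predictions = model.predict(make_features(out_of_sample))
print(predictions)
```

Applying the same `make_features` step to both tables keeps the columns identical between fitting and prediction, which is the main thing to get right when the data arrives in two separate files.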