Betamore Presents Python for Data Science

With the advent of machine learning and big data, data scientists who can create large-scale, data-driven solutions are in high demand. In this compressed four-week course, you will learn to use the Python programming language to analyze both small and large data sets while building and evaluating your own machine learning models.

About the Instructor:

Sinan Ozdemir is a lecturer in Business, Mathematics and Computer Science at The Johns Hopkins University. Sinan is also a co-founder and CTO of Tier5 and Kovani, two data science companies based in Baltimore. Sinan is an experienced teacher and entrepreneur. Follow him!

Day    Part   Topic
4/15   1      Data Exploration with Pandas
4/22   2      Intro to Machine Learning
4/29   3      Model Evaluation and Metrics
5/6    4      Building a Model using Titanic Survival Data

Required Pre-Reqs for ANY of the Parts:

Class 1: Introduction and Pandas

In this session our objective is to learn the basics of Python and how to use the pandas library to explore data sets.

At the end of the Class Students Will Be Able To:

  • Use the Python module pandas to explore small and large data sets
  • Prepare data and perform the necessary pre-processing steps
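
As a rough illustration of these two objectives, here is a minimal sketch of exploring and cleaning a tiny table with pandas. The column names and values are made up for this example and are not taken from the class materials.

```python
import pandas as pd

# Hypothetical toy data; names and values are invented for illustration only.
df = pd.DataFrame({
    "age": [22.0, 38.0, float("nan"), 35.0],
    "fare": [7.25, 71.28, 7.92, 53.10],
    "sex": ["male", "female", "female", "female"],
})

# Exploration: peek at the rows, summarize numeric columns, count missing values.
print(df.head())
print(df.describe())
print(df.isnull().sum())

# Pre-processing: fill the missing age with the median of the known ages
# and encode the text column as a 0/1 indicator.
clean = df.copy()
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["is_female"] = (clean["sex"] == "female").astype(int)
print(clean)
```

Working on a copy keeps the raw data intact, so any cleaning step can be compared against the original values.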

Recommended Prereqs:

  • Ability to write and read code
  • Background in Python/R is preferred but not required

Agenda:

  • Introductions
  • Intro to Data Science
  • Introduction to Python with Pandas

Homework

  • [Practice Problems for next time!](homework/01_pandas_homework.py)

Class 2: Introduction to Machine Learning

In this session we will introduce machine learning and build our first two models using the Python package scikit-learn.

At the end of the Class Students Will Be Able To:

  • Understand at a fundamental level what machine learning is and how it is used in practice.
  • Use the Python module scikit-learn to build and evaluate machine learning models
  • Understand key differences between machine learning models
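
A minimal sketch of what building two models with scikit-learn can look like. The choice of k-nearest neighbors and linear regression is an assumption based on the homework file names, and the iris data and the straight-line data below are stand-ins, not the class data sets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Classification: predict a category (here, an iris species) from measurements.
X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)
pred = knn.predict(X[:3])
print(pred)

# Regression: predict a number. The target is generated as y = 2x + 1,
# so the fitted slope and intercept should recover those values.
x = np.arange(10, dtype=float).reshape(-1, 1)
target = 2.0 * x.ravel() + 1.0
reg = LinearRegression().fit(x, target)
print(reg.coef_[0], reg.intercept_)
```

Both models share the same `fit` and `predict` interface; the key difference is that the classifier outputs a label while the regressor outputs a continuous value.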

Recommended Prereqs:

  • Basic understanding of the Python package Pandas
  • Background in Python/R is preferred but not required

Agenda:

Further Resources:

Homework

  • [Practice Problems for next time!](homework/02_glass_knn.py)

  • There is also a homework question in the linear regression code file to work on!

Class 3: Model Evaluation Metrics and Procedures

We will cover the procedures and quantifiable metrics that we use to evaluate our machine learning models.

At the end of the Class Students Will Be Able To:

  • See how data scientists prepare and alter their models in order to maximize accuracy.

Recommended Prereqs:

  • Basic understanding of Regression and Classification

Agenda:

  • Model Evaluation
    • Go over basic [Procedure](slides/03_model_evaluation_procedures.pdf)
    • Look at different Metrics
    • Code
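
One common evaluation procedure, sketched here under assumptions: hold out part of the data for testing, score the model on it, and compare with cross-validation. The iris data set and the k-nearest neighbors model are stand-ins chosen for this example; the slides may use a different procedure or data.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Procedure 1: train/test split. The model never sees the test rows while fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Metrics: overall accuracy plus a per-class breakdown of hits and misses.
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(acc)
print(cm)

# Procedure 2: 5-fold cross-validation, which averages over several splits.
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(cv_scores.mean())
```

A single split is quick but depends on which rows land in the test set; cross-validation costs more fits in exchange for a steadier estimate.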

Further Resources:

  • Great video of ROC/AUC curves
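
For a concrete feel for AUC, here is a small sketch using four hand-picked scores (illustrative values, not results from any model). It computes the ROC curve with scikit-learn and also checks the pairwise reading of AUC: the fraction of positive/negative pairs in which the positive example gets the higher score.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Illustrative labels and predicted scores, chosen by hand.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# ROC curve: false positive rate vs. true positive rate across thresholds.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.75

# Pairwise interpretation: 3 of the 4 (positive, negative) pairs are ranked correctly.
pos = [s for s, t in zip(y_score, y_true) if t == 1]
neg = [s for s, t in zip(y_score, y_true) if t == 0]
pairs = [(p, n) for p in pos for n in neg]
pairwise_auc = sum(p > n for p, n in pairs) / len(pairs)
print(pairwise_auc)  # 0.75
```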

Class 4: Titanic Data Set

Congratulations! You have made it this far :) Today we will use the Titanic data set from Kaggle.com to build a model that predicts whether or not a person died on the Titanic.
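
A minimal sketch of one possible starting point, assuming the column names of Kaggle's Titanic training file (`Survived`, `Pclass`, `Sex`, `Age`). The helper names `prepare` and `fit_survival_model`, the feature choice, and the use of logistic regression are all assumptions for illustration, and the rows below are made up so the example runs without the real file.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["Pclass", "is_female", "Age"]


def prepare(df):
    """Turn raw passenger rows into numeric features (hypothetical helper)."""
    out = pd.DataFrame({
        "Pclass": df["Pclass"],
        "is_female": (df["Sex"] == "female").astype(int),
        "Age": df["Age"].fillna(df["Age"].median()),
    })
    return out[FEATURES]


def fit_survival_model(train):
    """Fit a classifier for the Survived column (1 = survived, 0 = died)."""
    model = LogisticRegression(max_iter=1000)
    model.fit(prepare(train), train["Survived"])
    return model


# Made-up rows standing in for the Kaggle data; with the real file you would
# load it instead, e.g. pd.read_csv("train.csv") (path is an assumption).
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 2, 1, 3],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, float("nan"), 35.0, 54.0, 2.0],
})

model = fit_survival_model(toy)
predictions = model.predict(prepare(toy))
print(predictions)
```

Predicting on the training rows only shows that the pipeline runs; a fair estimate of accuracy would come from held-out data, as covered in Class 3.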