
Data Science at UCSB, Fall 2016 Programming Tutorials


Fall Quarter, 2016

Introduction to Python for Data Science

Instructor: Jason Freeberg

This quarter's tutorials will be the best yet! As you can see, I've made a complete syllabus, prepared a Jupyter notebook for each tutorial, and laid the foundation for your own personal projects. As the title says, we will learn the basics of Python for data science and data analysis. That includes OOP; the NumPy, pandas, and scikit-learn modules; feature engineering; and an introduction to machine learning.

Every class will use a Jupyter notebook to show examples and illustrate new syntax. Then I will turn it over to you to finish the exercises at the end of the notebook. The labs and notebooks are designed to complement what you will learn online from DataCamp.com. To keep things moving each week, I will assume that you have finished last week's homework and are ready for the next set of material. Now you might have some questions...

Why should I learn Python? Well...

  • It's a popular language.
  • It's very popular in the data science and analytics industries.
  • You can use it to access Spark's API.
    • Spark is a distributed computing framework built with support for SQL, streaming, and machine learning.

Are you qualified to teach this? Yes.

  • While interning at Impact Radius, I wrote PySpark code to analyse massive data sets (PySpark = Python + Spark).
  • I like to write Python code in my free time. We'll use one of my Python projects as the foundation for your own.
  • Python is not a hard language to learn... or teach!

Class requisites:

  • Basic programming knowledge or concurrently taking a programming class
  • Accounts for the following websites:
    • DataCamp
      • $9.00/month for student version
    • Kaggle
      • Good source of fun datasets
    • GitHub
      • So you can get the tutorial notebooks

Week 1: “Welcome”

  • Club information night! Ask questions, sign up, and pay dues
  • Homework:
    • Download and install Python, PyCharm, and Anaconda
      • Download the right versions for your OS
      • If you get stuck just Google "How to install ____". Get good at phrasing your questions effectively.
      • Feel free to use a different editor.
    • Clone the tutorial repository to your computer
      • Here's a guide to basic GitHub commands. I recommend using the command line rather than the GUI.

Week 2: “Getting Started with Python”

  • First tutorial session
  • Get everyone set up with Python, PyCharm, Anaconda
  • Using Jupyter Notebooks in browser
  • Using pip and conda to install modules
  • Python Basics: syntax, objects, classes, modules, and data structures
  • Homework:
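
As a quick preview of this week's Python basics, here is a minimal sketch that touches a module, a class, and the core data structures. The `Circle` class and all variable names are made up for illustration and are not taken from the tutorial notebooks.

```python
import math  # a module from the standard library

# Third-party modules are installed from a terminal, not from Python itself,
# for example: `pip install numpy` or `conda install numpy`.


class Circle:
    """A tiny class: bundles data (radius) with behavior (area)."""

    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


# Core data structures
radii = [1, 2, 3]                                         # list: ordered, mutable
circles = [Circle(r) for r in radii]                      # list comprehension
areas = {c.radius: round(c.area(), 2) for c in circles}   # dict: key -> value
point = (0.0, 1.5)                                        # tuple: ordered, immutable
unique = set([1, 1, 2, 3, 3])                             # set: duplicates dropped

print(areas)
print(type(circles[0]).__name__, point, unique)
```

Everything in Python is an object, so `circles[0]` is an instance of `Circle` in the same way that `radii` is an instance of `list`.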

Week 3: “Tabular Data, Your New Best Friend”

  • Practice loading a .csv file
    • With the help of pd.read_csv()
  • Handle missing values
    • And wrong values
  • Conditional selection
  • Homework:
    • DataCamp: Intro to Python for DS: Class 4
    • DataCamp: Intermediate Python for DS: Class 1
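
The three Week 3 topics can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the CSV contents, the column names, and the choice to fill gaps with the median are invented here, and the data is read from a string so the example runs without a file on disk.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents; in class this would be a file path instead.
csv_text = """name,age,city
Ana,23,Goleta
Ben,,Santa Barbara
Cal,-5,Goleta
Dee,31,Ventura
"""

# Load the data. pd.read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Wrong values: a negative age is impossible, so treat it as missing.
df.loc[df["age"] < 0, "age"] = np.nan

# Missing values: count them, then fill with the median of the valid ages.
n_missing = int(df["age"].isna().sum())
df["age"] = df["age"].fillna(df["age"].median())

# Conditional selection: Goleta residents older than 20.
goleta_adults = df[(df["city"] == "Goleta") & (df["age"] > 20)]

print(n_missing)
print(goleta_adults)
```

Filling with the median is one option among several. Dropping incomplete rows with `df.dropna()` is another, and the better choice depends on the data set.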

Week 4: “Basics of Prediction”

  • Machine learning at a high level
  • Types of machine learning
  • Recent industry trends
    • Distributed computing overtakes single supercomputers
    • Common languages
  • Examples with scikit-learn
  • Homework:
    • DataCamp: Intermediate Python for DS: Classes 2, 3
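
To give a flavor of prediction with scikit-learn, here is a minimal supervised learning sketch. The iris data set and the k-nearest-neighbors model are choices made for this illustration and are not necessarily what the class notebook uses.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Features (flower measurements) and labels (species) as NumPy arrays.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit on the training rows, then measure accuracy on the held-out rows.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(f"Test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators follow the same `fit` then `predict` or `score` pattern, so swapping in a different model usually means changing only the line that creates it.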

Week 5: “Feature Engineering”

Weeks 6 to 9: Begin Mini Projects

  • Introduction to the data and possible project directions
    • Script streams Tweets into a MongoDB database, list, or pandas DataFrame
    • Word cloud, sentiment analysis, geo-tag visualizations, network visualizations
  • Or, find and plan your own project
    • Use meeting time to get help and input
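
As one possible starting point for the word cloud direction, the sketch below counts word frequencies in a list of tweets using only the standard library. The tweets, the stop-word set, and the `tokenize` helper are hypothetical stand-ins. The streaming script itself is not shown here.

```python
import re
from collections import Counter

# Hypothetical tweets standing in for streamed data.
tweets = [
    "Learning Python for data science at UCSB!",
    "Python and pandas make data analysis fun",
    "Data science club meets tonight #python",
]

# A tiny, made-up stop-word set; real projects use a much longer list.
STOP_WORDS = {"for", "at", "and", "the", "a"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


# Tally every word across all tweets.
word_counts = Counter()
for tweet in tweets:
    word_counts.update(tokenize(tweet))

print(word_counts.most_common(3))
```

The resulting counts are the kind of input a word cloud needs, where larger counts map to larger words.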