Teaching guide for a one-hour hands-on session at an IRE/NICAR conference on using pandas to analyze data.

IRE/NICAR conference teaching guide for "Python: Intro to data analysis using pandas"

Thank you for volunteering to teach this one-hour session on using the pandas library to analyze data. This teaching guide explains our setup and the material to cover.

The exercises live in the Jupyter notebook in this repo.

Take a spin through the notebook before you teach the session.

Session description

Imagine rolling Excel and MySQL into one tool that also allows you to track your code and share it. That's pandas in a nutshell. There's a lot more you can do with it, of course, but this will be a good start. We'll learn how to slice and dice our data and extract basic stats. Specifically, we'll cover loading the data, filtering, sorting and grouping data.

This class is good for: People who are comfortable with Excel and are familiar with the basics of SQL and Python. We recommend that you attend the Python 101 session or have equivalent experience before coming to this class.

Session goals

Attendees should leave with a basic understanding of:

  • How to write and run Python code in a Jupyter notebook
  • When it makes sense to script your analysis (as opposed to just using Excel, SQL, etc.)
  • Loading a CSV into a pandas dataframe
  • Inspecting the dataframe with head(), describe() and other methods
  • Sorting data with sort_values()
  • Filtering data
  • Grouping data (if time)
  • Where to find instructions for installing Python on their own machines
  • How to find help when they get stuck
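For a quick reference while prepping, here is a minimal sketch of the load, inspect and sort goals listed above. The column names and values are made up for illustration and are not taken from the class notebook; the CSV is held in memory so the snippet runs without a file.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk. In class you would pass a
# file path to pd.read_csv() instead of a StringIO object.
csv_text = io.StringIO(
    "name,department,salary\n"
    "Ana,Parks,52000\n"
    "Ben,Police,61000\n"
    "Cy,Parks,48000\n"
    "Dee,Fire,58000\n"
)

# Load the CSV into a dataframe
df = pd.read_csv(csv_text)

# Inspect: the first rows, then summary statistics for numeric columns
print(df.head())
print(df.describe())

# Sort from highest to lowest salary; sort_values() returns a new dataframe
by_salary = df.sort_values("salary", ascending=False)
print(by_salary)
```

Note that `sort_values()` leaves `df` unchanged unless you assign the result, which is a point attendees often trip over.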

Things you don't need to cover

  • Anything related to virtual environments
  • Applying custom/lambda functions to a dataframe

General approach

I Do, We Do, You Do. Demonstrate a concept, go through it together, then give them plenty of time to experiment on their own while you and your coach walk around and answer questions (see sections marked ✍️ Try it yourself). The pace will be slower than you think, and that's OK! It's not the end of the world if you don't get through everything.

Most people who come to this class will have zero experience with programming, so be empathetic and try to remember how frustrating it is to feel lost.

Class setup

We'll have the latest version of Python 3 installed, along with pipenv to manage the virtual environment and dependencies (jupyter and pandas). All of it will have been installed and tested prior to your session.

Class outline

Start up the notebook server

Begin the class by (slowly!) walking everyone through the process of activating their virtual environments and launching Jupyter:

  1. Open Terminal (or cmd or cygwin if you're on a PC)
  2. cd into your class directory
  3. pipenv shell
  4. jupyter notebook

It will take everyone a few minutes to get going. You'll also probably get some questions about what, exactly, you're doing at this step. Try to avoid a lengthy digression into virtual environments -- it's beyond the scope of this hourlong session, so maybe offer to talk to them after class, or send 'em our way: training@ire.org.

Once everyone is good to go, toggle back to the terminal and show them what's going on: A Jupyter server is running in the background, so don't close that terminal window!

Go over some notebook basics: Adding cells, writing code and running cells, etc. A common beginner gotcha: Writing code that other cells depend on but forgetting to first run it to make it available.
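One way to demonstrate the gotcha is to trigger it on purpose. The sketch below simulates, in a single script, what happens when a cell that uses a variable runs before the cell that defines it; the variable name `row_counts` is hypothetical.

```python
# Simulates running "cell 2" before "cell 1" in a notebook.

# Cell 2, run too early: row_counts has not been defined yet,
# so Python raises a NameError.
try:
    total = sum(row_counts)
except NameError as err:
    message = str(err)
    print("NameError:", message)

# Cell 1: define the variable (in a notebook, this means running the cell)
row_counts = [1, 2, 3]

# Cell 2 again: now it works
total = sum(row_counts)
print(total)
```

In a live notebook there is no `try`/`except`; attendees simply see the NameError under the cell. The fix is to go back and run the earlier cell.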

Main course content

Start marching down the notebook: Importing pandas, loading data from file, sorting, filtering, grouping. Pause frequently to ask if anyone has questions.
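If it helps to have the filtering and grouping patterns in front of you, here is a small sketch. The data is invented for illustration and the notebook's own dataset and column names will differ.

```python
import pandas as pd

# Hypothetical data, built inline so the example runs without a file
df = pd.DataFrame(
    {
        "department": ["Parks", "Police", "Parks", "Fire"],
        "salary": [52000, 61000, 48000, 58000],
    }
)

# Filter: the comparison produces True/False per row, and indexing
# with it keeps only the rows marked True
high_earners = df[df["salary"] > 50000]
print(high_earners)

# Group: one median salary per department
median_by_dept = df.groupby("department")["salary"].median()
print(median_by_dept)
```

Spreadsheet users may find it useful to think of the filter as a column filter and the `groupby()` as a pivot table.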

Any time you see ✍️ Try it yourself, hit the brakes and give everyone time to play around with whatever concept you're discussing.

Debugging

If you can, find an opportunity when someone has gotten an error and take five minutes to walk through basic debugging strategy: reading the traceback from bottom to top, strategic Googling, etc.
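If nobody hits an error on their own, you could produce one deliberately. This sketch, which uses a made-up dataframe and a misspelled column name, captures the traceback and prints only its last line to show why reading from the bottom is the fastest route to the actual problem.

```python
import traceback

import pandas as pd

# Hypothetical dataframe with a single column
df = pd.DataFrame({"salary": [52000, 61000]})

try:
    df["salry"]  # deliberate typo in the column name
except KeyError:
    # The full traceback is long because it passes through pandas internals
    tb_lines = traceback.format_exc().strip().splitlines()
    # The last line names the exception type and the offending key
    last_line = tb_lines[-1]
    print(len(tb_lines), "lines of traceback")
    print(last_line)
```

In a notebook, attendees will see the whole traceback under the cell. Point them to the final line first, then work upward until they reach a line from their own code.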

If you have extra time at the end

Unlikely! But if you have extra time, oversee some unstructured lab time -- they can practice syntax or look up additional methods, etc.

Ending the session

  1. Have everyone close out of their notebook tabs
  2. In the terminal, press Ctrl+C to kill the server process
  3. Close the terminal window

Run the notebook

You'll need the latest version of Python 3 and pipenv installed on your computer. Here's our install guide.

  1. Clone or download/unzip this repo onto your computer
  2. In your command-line interface, cd into the folder
  3. pipenv install
  4. pipenv shell
  5. jupyter notebook
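After the steps above, a quick check from a Python prompt inside the pipenv shell can confirm the environment is ready. This is only a sketch: the module names checked here (`pandas` and `notebook`) are an assumption about what the install provides, and your setup may expose Jupyter under a different module.

```python
import importlib.util
import sys

# Module names assumed for illustration; adjust them to match your install
required = ("pandas", "notebook")

# find_spec() returns None when a module cannot be found,
# without actually importing it
missing = [name for name in required if importlib.util.find_spec(name) is None]

print("Python version:", sys.version.split()[0])
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All required modules found")
```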