Thank you for volunteering to teach this one-hour session on using the pandas
library to analyze data. This teaching guide explains our setup and the material to cover.
The class is one hour long. The exercises live in this Jupyter notebook.
It would be a good idea to take a spin through the notebook prior to teaching the session.
Imagine rolling Excel and MySQL into one tool that also allows you to track your code and share it. That's pandas
in a nutshell. There's a lot more you can do with it, of course, but this will be a good start. We'll learn how to slice and dice our data and extract basic stats. Specifically, we'll cover loading the data, filtering, sorting and grouping data.
This class is good for: People who are comfortable with Excel and are familiar with the basics of SQL and Python. We recommend that you attend the Python 101 session or have equivalent experience before coming to this class.
Attendees should leave with a basic understanding of:
- How to write and run Python code in a Jupyter notebook
- When it makes sense to script your analysis (as opposed to just using Excel, SQL, etc.)
- Loading a CSV into a `pandas` dataframe
- Inspecting the dataframe with `head()`, `describe()` and other methods
- Sorting data with `sort_values()`
- Filtering data
- Grouping data (if time)
- Where to find instructions for installing Python on their own machines
- How to find help when they get stuck
This class won't cover:
- Anything related to virtual environments
- Applying custom/lambda functions to a dataframe
Follow the I Do, We Do, You Do approach: demonstrate a concept, go through it together, then give them plenty of time to experiment on their own while you and your coach walk around and answer questions (see sections marked ✍️ Try it yourself). The pace will be slower than you think, and that's OK! It's not the end of the world if you don't get through everything.
Most people who come to this class will have zero experience with programming, so be empathetic and try to remember how frustrating it is to feel lost.
We'll have the latest version of Python 3 and pipenv to manage the virtual environment and dependencies (`jupyter` and `pandas`), which will already have been installed and tested prior to your session.
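If you'd like to double-check a machine yourself before class, a notebook cell along these lines is one option. It's a sketch that assumes Python 3.8+ (for `importlib.metadata`). The `environment_report` helper is hypothetical and not part of the class materials. The virtual-environment test (`sys.prefix` differing from `sys.base_prefix`) is a common heuristic, not a guarantee.

```python
import sys
from importlib import metadata

def environment_report():
    """Hypothetical helper: summarize the Python environment this kernel runs in."""
    # Heuristic: inside a venv/virtualenv, sys.prefix points at the env,
    # while sys.base_prefix points at the underlying Python install.
    in_venv = sys.prefix != sys.base_prefix or hasattr(sys, "real_prefix")
    versions = {}
    for pkg in ("pandas", "jupyter"):
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "in_virtualenv": in_venv,
        "packages": versions,
    }

report = environment_report()
print(report)
```

A `None` next to a package name means that package isn't visible to the kernel, which usually means Jupyter was launched outside the pipenv shell.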
Begin the class by (slowly!) walking everyone through the process of activating their virtual environments and launching Jupyter:
- Open Terminal (or `cmd` or `cygwin` if you're on a PC)
- `cd` into your class directory
- `pipenv shell`
- `jupyter notebook`
It will take everyone a few minutes to get going. You'll also probably get some questions about what, exactly, you're doing at this step. Try to avoid a lengthy digression into virtual environments -- it's beyond the scope of this hourlong session, so maybe offer to talk to them after class, or send 'em our way: training@ire.org.
Once everyone is good to go, toggle back to the terminal and show them what's going on: A Jupyter server is running in the background, so don't close that terminal window!
Go over some notebook basics: Adding cells, writing code and running cells, etc. A common beginner gotcha: Writing code that other cells depend on but forgetting to first run it to make it available.
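To show this gotcha without relying on a live notebook, you can mimic two cells that share one namespace, the way a single Jupyter kernel does. In this sketch, `run_cell` and the cell strings are hypothetical stand-ins.

```python
# Two "cells" from a notebook, stored as strings so we can control run order.
cell_1 = "import pandas as pd"                   # defines the name `pd`
cell_2 = "df = pd.DataFrame({'count': [1, 2]})"  # depends on `pd`

# One shared namespace, like a single Jupyter kernel.
namespace = {}

def run_cell(source, ns):
    """Execute a cell's source in ns; return an error string, or None on success."""
    try:
        exec(source, ns)
        return None
    except NameError as err:
        return f"NameError: {err}"

# Out of order: cell 2 runs before cell 1, so `pd` doesn't exist yet.
first_try = run_cell(cell_2, namespace)
print(first_try)

# In order: run cell 1, and then cell 2 succeeds.
run_cell(cell_1, namespace)
second_try = run_cell(cell_2, namespace)
print(second_try)
```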
Start marching down the notebook: Importing pandas, loading data from file, sorting, filtering, grouping. Pause frequently to ask if anyone has questions.
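Here's a condensed sketch of that sequence to keep in your back pocket. It doesn't use the notebook's real dataset. The small CSV below, with hypothetical `name`, `state` and `salary` columns, is made up for illustration. In class you'd pass a file path to `pd.read_csv()` instead of wrapping a string in `io.StringIO`.

```python
import io
import pandas as pd

# Hypothetical stand-in for the class CSV file.
csv_text = """name,state,salary
Ada,MO,52000
Ben,KS,48000
Cy,MO,61000
Dee,IL,57000
"""
df = pd.read_csv(io.StringIO(csv_text))

# Inspect: the first few rows, then summary stats for numeric columns.
print(df.head())
print(df.describe())

# Sort: highest salaries first (like SQL ORDER BY ... DESC).
by_salary = df.sort_values("salary", ascending=False)
print(by_salary)

# Filter: keep rows where state is MO, using a boolean mask (like SQL WHERE).
missouri = df[df["state"] == "MO"]
print(missouri)

# Group: total salary per state (like SQL GROUP BY with SUM).
totals = df.groupby("state")["salary"].sum()
print(totals)
```

The SQL comparisons in the comments may help attendees who already know `WHERE`, `ORDER BY` and `GROUP BY`.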
Any time you see ✍️ Try it yourself, hit the brakes and give everyone time to play around with whatever concept you're discussing.
If you can, find an opportunity when someone has gotten an error and take 5 minutes to walk through basic debugging strategy: Reading the traceback error from bottom to top, strategic Googling, etc.
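A misspelled column name is an easy error to demonstrate if nobody volunteers one. This sketch uses a small hypothetical dataframe. It captures the traceback text and pulls out the last line, where Python reports the error type and message. From there you read upward to find the line of your own code that triggered it.

```python
import traceback
import pandas as pd

# Small hypothetical dataframe; note the column name is lowercase "salary".
df = pd.DataFrame({"state": ["MO", "KS"], "salary": [52000, 48000]})

tb_text = ""
try:
    df["Salary"]  # typo: capital S, so pandas can't find the column
except KeyError:
    tb_text = traceback.format_exc()  # the full traceback as a string

# The bottom line of the traceback names the error type and the bad key.
last_line = tb_text.strip().splitlines()[-1]
print(last_line)
```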
Finishing early is unlikely! But if you have extra time, oversee some unstructured lab time: attendees can practice syntax or look up additional methods.
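For lab time, it can help to show a couple of ways to discover methods from inside Python. In Jupyter you can also append `?` to a name (for example, `df.sort_values?`) to view its documentation. The sketch below sticks to the built-in `dir()`, `inspect.signature()` and `__doc__`. The tiny dataframe is hypothetical.

```python
import inspect
import pandas as pd

df = pd.DataFrame({"score": [3, 1, 2]})  # hypothetical example data

# Search the dataframe's public attributes for anything sort-related.
sort_methods = [name for name in dir(df)
                if "sort" in name and not name.startswith("_")]
print(sort_methods)

# Show how to call a method: its parameters and their defaults.
print(inspect.signature(df.sort_values))

# Print the first line of its docstring as a one-line summary.
doc = (df.sort_values.__doc__ or "").strip()
print(doc.splitlines()[0] if doc else "(no docstring)")
```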
To wrap up the class:
- Have everyone close out of their notebook tabs
- In the terminal, press `Ctrl+C` to kill the server process (Jupyter asks you to confirm; press `Ctrl+C` again or type `y`)
- Close the terminal window
To run this notebook on your own computer, you'll need the latest version of Python 3 and pipenv installed. Here's our install guide.
- Clone or download/unzip this repo onto your computer
- In your command-line interface, `cd` into the folder
- `pipenv install`
- `pipenv shell`
- `jupyter notebook`