New version of the Pandas + Jupyter workshop

Pandas & Jupyter Workshop V2

WIFI: WeWork / P@ssw0rd

A basic introduction to data analysis using Pandas & Jupyter.

Materials by Sam Bail @spbail, based on a workshop by Alda Pontes.

Prerequisites for the workshop

We expect a working knowledge of Python so that you can follow along with the workshop. If you are an absolute beginner and aren't familiar with Python syntax, this workshop might not be suited for you.

Binder link to run remote notebook

Binder is a web-based hub for Jupyter notebooks. If your local setup does not work, or if you prefer not to install anything locally, you can use the link here to work in a notebook on Binder. Please note that Binder will delete your notebook instance after 12 hours. You can download the notebook to your local machine at the end to keep your own copy!

Binder

Setup to run the notebook locally

Step 0: Download the materials

  • Clone this Git repo to your machine and move the notebook copy you downloaded from Binder into the repo directory
  • Or start over with the default version of the notebook in the repo

Step 1: Make sure you are running a recent version of Python

  • I'm using a miniconda installation with Python 3.7
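
A quick way to confirm which interpreter you are running is to check it from Python itself. The sketch below treats 3.7 (the version mentioned above) as a minimum; that threshold is an assumption for illustration, not a requirement stated by the workshop, and python_is_recent is a hypothetical helper name.

```python
import sys

# Assumption: use the version mentioned above as a floor; adjust as needed.
MIN_VERSION = (3, 7)


def python_is_recent(minimum=MIN_VERSION, current=None):
    """Return True if the (major, minor) version is at least `minimum`."""
    current = sys.version_info if current is None else current
    return tuple(current[:2]) >= tuple(minimum)


# Report the running interpreter's version and whether it passes the check.
status = "OK" if python_is_recent() else "older than expected"
print("Python", sys.version.split()[0], "-", status)
```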

Step 2: Install the necessary libraries for the workshop

  • Install the necessary libraries by running pip install -r requirements.txt in the repo directory
  • Do this in a new virtual environment (e.g. a new conda environment) if necessary
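
After installing, you may want to confirm that the libraries are importable from the environment you plan to use. The sketch below is only an illustration: the package names in WORKSHOP_PACKAGES are an assumption (pandas and the Jupyter notebook package), since the actual list lives in requirements.txt, and missing_packages is a hypothetical helper.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumption: these names are illustrative; check requirements.txt for the real list.
WORKSHOP_PACKAGES = ["pandas", "notebook"]

missing = missing_packages(WORKSHOP_PACKAGES)
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All checked packages are importable.")
```

If a package shows up as missing, the usual cause is that pip installed into a different environment than the one running the check.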

Step 3: Make sure Jupyter Notebook runs

  • Open a terminal window in the directory where you downloaded the notebook and run: jupyter notebook
  • This should open a browser window. If it doesn't, go to http://localhost:8888/notebooks/

Step 4: Download the data file

Download the mock_treatment_starts_2016.csv file from this repo. NOTE: The data is entirely made up and is in no way related to any real patient data.
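
To check that the file downloaded correctly, you can try loading it with Pandas. This is a minimal sketch that assumes the CSV sits in the current working directory and can be read with the default pd.read_csv settings; load_treatment_starts is a hypothetical helper, not part of the workshop notebook.

```python
from pathlib import Path

import pandas as pd

# Assumption: the file was saved into the current working directory.
DATA_FILE = Path("mock_treatment_starts_2016.csv")


def load_treatment_starts(path=DATA_FILE):
    """Read the CSV into a DataFrame using pandas' default parsing options."""
    return pd.read_csv(path)


if DATA_FILE.exists():
    df = load_treatment_starts()
    print(df.shape)   # (number of rows, number of columns)
    print(df.head())  # first five rows
else:
    print(f"{DATA_FILE} not found; download it into this directory first.")
```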

About Sam

Hi, I'm Sam! I am a Senior Data Insights Engineer/Data Scientist formerly at Flatiron Health, NYC, working with electronic medical record data from oncology clinics in the US. I draw from a large toolkit ranging from various SQL flavors to Python, Pandas, Jupyter Notebook and R to statistical methods, data science and data visualization (Tableau, Superset...), as well as clinical terminologies and software engineering and automation tools - whatever gets the job done.

I completed a PhD in theoretical semantic web foundations at the School of Computer Science, The University of Manchester, UK. My thesis focused on exploring and exploiting the "justificatory structure" of OWL ontologies. While in the UK, I co-founded and led "Manchester Girl Geeks", a volunteer-based community organization that has been running STEM workshops for girls and women in the area since 2009.

https://www.twitter.com/spbail

https://www.linkedin.com/in/spbail/