This repo holds the contents developed for the tutorial, Exploratory Data Analysis in Python, presented at PyCon 2017 on May 17, 2017.
We suggest setting up your environment and testing it (as detailed below) and then following along with the video of the tutorial found here.
As there was limited time for instruction, we also recommend pausing throughout and practicing some of the methods discussed as you go.
We welcome any PRs with other demonstrations of how you would perform EDA on the provided datasets.
If you don't want to set up your own environment, or you run into problems with the instructions below, you can work through the tutorial on Microsoft Azure Notebooks by creating an account and cloning the tutorial library found here (all of this is free, forever).
Clone this repository locally on your laptop.
- Go to the green Clone or download button at the top of the repository page and copy the https link.
- From the command line, run:
git clone https://github.com/cmawer/pycon-2017-eda-tutorial.git
We recommend using conda to manage your Python environments. Specifically, we like Miniconda, which is the most lightweight installation. You can install Miniconda here. However, the full Anaconda distribution is a good choice for beginners, as it comes with many packages already installed.
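Before going further, you may want to confirm that the command-line tools these instructions rely on are reachable from your shell. The function below is a minimal sketch; check_tools is a hypothetical name (it is not part of conda or git) and it only uses the shell builtin command -v.

```shell
# Hypothetical helper (not part of conda or git): report any of the given
# commands that cannot be found on PATH. Returns 0 if all are found, 1 otherwise.
check_tools() {
  local rc=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
# check_tools git conda
```

Running check_tools git conda prints one line per missing tool; no output means both were found.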
Once conda is installed, you can create the environment needed for this tutorial by running the following commands from the setup/ directory of this repository:
conda update conda
then:
conda env create -f environment.yml
This creates a new environment named eda3.
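To confirm the environment was created, you can look for eda3 in the output of conda env list, which prints one environment per line with the environment name in the first column. The function below is a hypothetical helper (has_conda_env is our name, not a conda command) that automates that check with awk, assuming that column layout.

```shell
# Hypothetical helper: succeed if an environment named $1 appears in
# `conda env list` output read from stdin (conda prints the name first).
has_conda_env() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage:
# conda env list | has_conda_env eda3 && echo "eda3 is ready"
```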
To activate the environment, run the appropriate command from any directory.
Mac/Linux:
source activate eda3
Windows:
activate eda3
Newer releases of conda (4.4 and later) recommend conda activate eda3 on all platforms instead.
If you are experienced with Python and do not use conda, a requirements.txt file is also available in the setup/ directory for installation with pip. It is our environment frozen as-is on a Mac, so on Windows or Linux you may need to remove some of the version requirements.
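If pip rejects some of the pinned versions on your platform, one blunt option is to drop every pin and let pip choose versions. The function below is a sketch under that assumption: unpin_requirements is a hypothetical helper, not part of pip, and it removes all pins rather than only the problematic ones, so the versions you end up with may differ from those used in the tutorial.

```shell
# Hypothetical helper (not part of pip): copy a requirements file, keeping only
# package names. Comment and blank lines are dropped, and everything from the
# first version operator (=, <, >, !, ~), marker (;) or direct URL (@) is cut.
unpin_requirements() {
  local in_file=$1
  local out_file=$2
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$in_file" \
    | sed -e 's/[[:space:]]*[=<>!~;@].*$//' > "$out_file"
}

# Usage, from the setup/ directory:
# unpin_requirements requirements.txt requirements-unpinned.txt
# pip install -r requirements-unpinned.txt
```

A gentler alternative is to edit requirements.txt by hand and remove only the pins pip complains about.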
We will be using widgets to create interactive visualizations. The widget packages are installed during environment setup, but you still need to enable the notebook extension by running the following from the command line:
jupyter nbextension enable --py --sys-prefix widgetsnbextension
Now that your environment is set up, let's check that it works.
- Go to the setup/ directory from the command line and start a Jupyter notebook instance:
jupyter notebook
A lot of text should appear; leave this terminal running, since your Jupyter instance needs it to keep working.
- Assuming this worked, open the notebook titled test-my-environment.ipynb.
- Once the notebook is open, go to the Cell menu and select Run All.
- Check that every cell in the notebook ran (i.e., none produced an error as output). test-my-environment.html shows what the notebook should look like after running.
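As an alternative to Run All in the browser, nbconvert (which is installed alongside Jupyter) can execute the notebook from the command line. This is a minimal sketch, assuming you run it from setup/ with the eda3 environment active; run_env_test is a hypothetical name, and the 600-second per-cell timeout is an arbitrary choice.

```shell
# Hypothetical wrapper: execute every cell of the test notebook headlessly.
# Run from setup/ with the eda3 environment active. nbconvert exits with a
# non-zero status if a cell raises an error, and writes the executed copy
# to test-my-environment.nbconvert.ipynb.
run_env_test() {
  jupyter nbconvert --to notebook --execute \
    --ExecutePreprocessor.timeout=600 \
    test-my-environment.ipynb
}

# Usage:
# run_env_test && echo "All cells ran without errors."
```

You can then open the executed copy in Jupyter and compare it with test-my-environment.html.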