Notes, example code and datasets for the online course Jupyter Notebook for Data Science.
- Course Prerequisites – a few tutorials we recommend to get ready to take the course (in case you haven't worked with Python before).
- Course Code Examples – the source code developed during the course. We recommend you set it up on your own computer in order to try it out and make changes.
- Course Notes – useful links and additional resources that we recommend you check out after you finish each section.
- Next Steps – some tips on how to continue learning after you've finished the course.
- Credits
- License
To get the most out of this course, you will need:
- a basic understanding of the Python programming language (tutorial), plus some basics of the web (HTML/CSS) for the scraping section (tutorial)
- basic familiarity with the command line (tutorial), including enough git to clone this repository locally (tutorial)
- some background in math and statistics, which will come in handy but is not a strict requirement
This course combines a wide range of skills – programming, data collection and data analysis – so some parts will likely be a bit unfamiliar to you, whatever your background. Don't be discouraged by this: part of the data science profession is learning new skills over time to stay up to date. If you get stuck in a particular area, take a short break, read up on the subject and resume the course afterwards. There is a wealth of resources online, from the documentation pages of the libraries we use in the course to websites like https://stackoverflow.com/, where many beginner questions have already been answered.
Create a new working directory and switch into it.
mkdir jupyter-course
cd jupyter-course
Clone this repository. Note – you will first need to install the Git Large File Storage extension to clone all the large datasets in this repository.
git clone https://github.com/PacktPublishing/Jupyter-Notebook-for-Data-Science.git
Start Jupyter Notebook using the Docker stack. Adapt the path to your working directory (I'm assuming ~/code/jupyter-course).
docker run -it --rm -p 8888:8888 -v ~/code/jupyter-course:/home/jovyan/work jupyter/datascience-notebook:de0cd8011b9e
You can always leave out the exact image tag (:de0cd8011b9e) to get the latest version of all the packages, but this is the version that was used in the course.
After everything is downloaded and started, you should get a link in your console to open Jupyter Notebook in your browser. The notebook should be connected to your local files including this git repository. You should now be ready to go through the example code or create your own notebooks to analyse the example datasets.
If you want to try out the new JupyterLab interface (as we do in the course in Section 5), you need to modify the command a bit.
docker run -it --rm -p 8888:8888 -v ~/code/jupyter-course:/home/jovyan/work jupyter/datascience-notebook:de0cd8011b9e start.sh jupyter lab
For Section 5, where we install additional packages (Matplotlib Basemap and Plotly), build and run the custom Docker image from the Dockerfile in this git repo.
docker build --rm -t jupyter/custom-notebook .
docker run -it --rm -p 8888:8888 -v ~/code/jupyter-course:/home/jovyan jupyter/custom-notebook start.sh jupyter lab
Note – some of the notebooks connect to REST APIs that require API keys (DarkSky, Plotly & Mapbox). If you want to follow along, you will need to create accounts on these services and substitute your own API keys in the code. This is all explained in the course videos.
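If you do substitute your own keys, one approach (not from the course itself, just a common pattern) is to keep them out of the notebook cells and read them from environment variables. A minimal sketch, with hypothetical variable names:

```python
import os

# Hypothetical environment variable names – adapt them to the services you use.
# Keeping keys in the environment avoids committing them to the git repository.
DARKSKY_API_KEY = os.environ.get("DARKSKY_API_KEY")
PLOTLY_API_KEY = os.environ.get("PLOTLY_API_KEY")
MAPBOX_ACCESS_TOKEN = os.environ.get("MAPBOX_ACCESS_TOKEN")

if DARKSKY_API_KEY is None:
    print("Set DARKSKY_API_KEY before running the weather notebooks.")
```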
A number of useful online resources are mentioned in the course – you can find links to all of them here.
1.1: Course Introduction
- Jupyter Notebook
- Quantum mechanical light-matter interaction notebook
- Whale migration notebook
- Forecasting financial data notebook
- A gallery of interesting Jupyter notebooks
1.2: Setting up Jupyter Notebook
1.3: Using Jupyter Notebook
1.4: Publishing Notebooks
2.1: Parsing the Crime Dataset
2.2: Pandas Data Structures
- Pandas documentation (a minimal data-structure sketch follows this list)
- Wes McKinney – Data analysis with pandas
- Brandon Rhodes – Pandas From The Ground Up
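For a quick, self-contained reminder of the two core pandas data structures covered in this section, here is a minimal sketch – the column names and values are made up purely for illustration, not taken from the course dataset:

```python
import pandas as pd

# A Series is a labelled one-dimensional array.
temperatures = pd.Series([21.5, 19.0, 23.1], index=["Mon", "Tue", "Wed"])

# A DataFrame is a table of columns (Series) sharing the same index.
# The values below are made up for illustration.
crimes = pd.DataFrame({
    "category": ["ASSAULT", "BURGLARY", "ASSAULT"],
    "district": ["CENTRAL", "MISSION", "CENTRAL"],
})

print(crimes.head())                      # preview the first rows
print(crimes.dtypes)                      # column data types
print(crimes["category"].value_counts())  # counts per category
print(temperatures.mean())                # Series support vectorised operations
```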
2.3: Explore and Visualise the Data
2.4: Create an Interactive Widget
3.1: Introduction to Data Scraping
3.2: Fetching Data from a REST API Using Requests
Update – since this course was created, DarkSky has shut down its API to the public. There are alternative weather APIs available, and it is a good exercise to try to fetch similar data from another source, as these are exactly the kinds of tasks one runs into during day-to-day data science work. A sketch using one such alternative follows below.
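As a starting point for that exercise, here is a minimal sketch using Open-Meteo, a free weather API that currently requires no key. The endpoint and parameter names are assumptions about that service (check its documentation), not part of the original course code:

```python
import requests

# Open-Meteo endpoint and parameters – assumptions about that service,
# not part of the original course material; verify against its docs.
URL = "https://api.open-meteo.com/v1/forecast"
params = {
    "latitude": 37.77,      # roughly San Francisco, to match the crime dataset
    "longitude": -122.42,
    "hourly": "temperature_2m",
}

response = requests.get(URL, params=params, timeout=10)
response.raise_for_status()
data = response.json()

# The hourly block pairs timestamps with temperature readings.
for timestamp, temp in zip(data["hourly"]["time"][:5],
                           data["hourly"]["temperature_2m"][:5]):
    print(timestamp, temp)
```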
3.3: Importing API data into Pandas
3.4: Scraping Websites using BeautifulSoup
The Weather Underground website has inevitably changed since this course was created. One of the downsides of scraping websites is that the underlying HTML markup changes often (usually even more often than API protocols). Using CSS selectors similar to the ones we used in the video, it should be possible to adapt the code to the updated version of the website; a rough outline of that pattern follows.
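The general requests + BeautifulSoup pattern looks roughly like this – the URL and CSS selector below are placeholders, since the live markup keeps changing, so inspect the page in your browser and adjust them:

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL and selector – inspect the live page and adjust both.
URL = "https://www.wunderground.com/history"   # hypothetical page
response = requests.get(URL, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# select() takes a CSS selector, like the ones used in the video.
for cell in soup.select("table td.temperature"):   # hypothetical selector
    print(cell.get_text(strip=True))
```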
4.1: Introduction to Information-Dense Visualisations
4.2: Visualising Data Correlation
4.3: Linear Regression
- StatsModels – a Python package with many statistical modelling methods (a minimal usage sketch follows this list)
- Wikipedia – linear regression
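If you want to try StatsModels outside the course notebooks, an ordinary least squares fit looks roughly like this – the data is synthetic, purely for illustration:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data: y depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.5 * x + 1.0 + rng.normal(scale=2.0, size=100)

# statsmodels does not add an intercept automatically – add_constant does that.
X = sm.add_constant(x)
results = sm.OLS(y, X).fit()

print(results.params)     # estimated intercept and slope
print(results.summary())  # full regression report
```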
4.4: Correlation Matrix
5.1: Maps in Data Science
- I Quant NY – a blog telling stories about New York through data
- How we found the worst place to park in New York City – a TEDx talk by Ben Wellington, the author of "I Quant NY"
5.2: Plotting Crime Locations
5.3: Interactive Maps Using Plotly
5.4: Final Remarks
After you're done with the course, consider finding a practical problem to work on – that's the best way to learn. Here are some ideas:
- Awesome Python for Social Good – a curated list of topics where you can use your data science & programming skills to help society.
- Code for All – a network of organizations advocating open data and helping developers and interested people get involved in analysing public data.
- Kaggle – an online community for working on real data science projects posted by companies and NGOs, with occasional competitions & prizes. People help each other out by commenting on uploaded solutions and discussing data science methods, and Jupyter notebooks are used to perform the work.
Some inspirational data science examples:
- Jake VanderPlas – the blog of an astronomer and data scientist who is very active in the Python community
- FiveThirtyEight – a website that publishes data-driven articles on (mostly US) sports, politics & economics. It's a great source of inspiration for what sorts of topics can be explored.
- Mike Bostock – a data science blog with code examples from the author of D3.js.
Course and materials author – Dražen Lučanin. Hear about more of Dražen's courses by subscribing here!
Published by Packt.
The code is published under the MIT license.