Notes, example code and datasets for the online course Jupyter Notebook for Data Science.
- Course Prerequisites – a few tutorials we recommend to get ready to take the course (in case you haven't worked with Python before)
- Course Code Examples – the source code developed during the course. We recommend you set it up on your own computer in order to try it out and make changes.
- Course Notes – useful links and additional resources that we recommend you check out after you finish each section.
- Next Steps – some tips on how to continue learning after you've finished the course
- Credits
- License
To get the most out of this course, you will need:
- a basic understanding of the Python programming language (tutorial)
- basic familiarity with running commands on the command line (tutorial)
- a basic understanding of math and statistics will come in handy, but is not a strict requirement
Create a new directory.
mkdir jupyter-course
cd jupyter-course
Clone this repository. Note – you will first need to install the Git Large File Storage extension to clone all the large datasets in this repository.
git clone https://github.com/PacktPublishing/Jupyter-Notebook-for-Data-Science.git
Start Jupyter Notebook using the Docker stack. Adapt the path to your working directory (I'm assuming ~/code/jupyter-course).
docker run -it --rm -p 8888:8888 -v ~/code/jupyter-course:/home/jovyan/work jupyter/datascience-notebook:de0cd8011b9e
You can always leave out the exact image tag (:de0cd8011b9e) to get the latest version of all the packages, but this is the version that was used in the course.
After everything is downloaded and started, you should get a link in your console to open Jupyter Notebook in your browser. The notebook should be connected to your local files including this git repository. You should now be ready to go through the example code or create your own notebooks to analyse the example datasets.
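If Git LFS was not installed when you cloned, the large datasets are left as small pointer stubs instead of real data. One quick way to check from a notebook cell is to scan the mounted directory for such stubs. This is only a sketch: it assumes the repository sits under the `/home/jovyan/work` mount used in the command above, and the helper names `is_lfs_pointer` and `find_lfs_pointers` are illustrative, not part of the course code.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like an LFS pointer stub rather than real data."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_HEADER)) == LFS_HEADER


def find_lfs_pointers(root):
    """List files under root that are still LFS pointers, skipping any .git directory."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(root).parts
        and is_lfs_pointer(p)
    )


# Assumed location: the host directory is mounted at /home/jovyan/work in the container.
repo_root = Path("/home/jovyan/work")
if repo_root.is_dir():
    stubs = find_lfs_pointers(repo_root)
    print(f"{len(stubs)} file(s) are still Git LFS pointers")
```

If the count is not zero, installing Git LFS on your computer and running `git lfs pull` inside the cloned repository should fetch the real files.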
If you want to try out the new JupyterLab interface (as we do in the course in Section 5), you need to modify the command a bit.
docker run -it --rm -p 8888:8888 -v ~/code/jupyter-course:/home/jovyan/work jupyter/datascience-notebook:de0cd8011b9e start.sh jupyter lab
For Section 5, where we install additional packages such as Matplotlib Basemap and Plotly, build and run the custom Docker image from the Dockerfile in this Git repository.
docker build --rm -t jupyter/custom-notebook .
docker run -it --rm -p 8888:8888 -v ~/code/jupyter-course:/home/jovyan jupyter/custom-notebook start.sh jupyter lab
Note – some of the notebooks connect to REST APIs that require API keys (DarkSky, Plotly & Mapbox). If you want to follow along, you will need to create accounts on these services and substitute your own API keys in the code. This is all explained in the course videos.
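One way to substitute your own keys without pasting them into a notebook is to read them from environment variables. This is a minimal sketch and not necessarily the approach shown in the course videos; the helper `get_api_key` and the variable name `MAPBOX_API_KEY` are hypothetical.

```python
import os


def get_api_key(name):
    """Read an API key from the environment, failing with a clear message if it is unset."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable before running this notebook"
        )
    return key


# Hypothetical variable name; use whichever name you set when starting the container.
# mapbox_key = get_api_key("MAPBOX_API_KEY")
```

The variable can be made visible inside the container by adding an `-e MAPBOX_API_KEY=<your key>` option to the `docker run` commands above. This keeps the key out of any notebook you later publish.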
In the course a number of useful online resources are mentioned – you can find the links to all of them here.
1.1: Course Introduction
- Jupyter Notebook
- Quantum mechanical light-matter interaction notebook
- Whale migration notebook
- Forecasting financial data notebook
- A gallery of interesting Jupyter notebooks
1.2: Setting up Jupyter Notebook
1.3: Using Jupyter Notebook
1.4: Publishing Notebooks
2.1: Parsing the Crime Dataset
2.2: Pandas Data Structures
- Pandas documentation
- Wes McKinney – Data analysis with pandas
- Brandon Rhodes – Pandas From The Ground Up
2.3: Explore and Visualise the Data
2.4: Create an Interactive Widget
3.1: Introduction to Data Scraping
3.2: Fetching Data from a REST API Using Requests
3.3: Importing API data into Pandas
3.4: Scraping Websites using BeautifulSoup
4.1: Introduction to Information-Dense Visualisations
4.2: Visualising Data Correlation
4.3: Linear Regression
- StatsModels – a Python package with many statistical modelling methods
- Wikipedia – linear regression
4.4: Correlation Matrix
5.1: Maps in Data Science
- I Quant NY – a blog telling stories about New York through data
- How we found the worst place to park in New York City – a TEDx talk by Ben Wellington, the author of "I Quant NY"
5.2: Plotting Crime Locations
5.3: Interactive Maps Using Plotly
5.4: Final Remarks
After you're done with the course, consider finding a practical problem to work on – that's the best way to learn. Here are some ideas on what to work on:
- Awesome Python for Social Good – a curated list of topics where you can use your data science & programming skills to help society.
- Code for All – a network of organizations advocating open data and helping developers and other interested people get involved with analysing public data.
- Kaggle – an online community for working on real data science projects posted by companies and NGOs, with occasional competitions & prizes. People also help each other out by commenting on uploaded solutions and starting discussions about data science methods. The work is done in Jupyter Notebooks.
Some inspirational data science examples:
- Jake VanderPlas – the blog of an astronomer and data scientist who is very active in the Python community.
- FiveThirtyEight – a website that publishes data-driven articles on (mostly US) sports, politics & economics. It's a great source of inspiration for the sorts of topics that can be explored.
- Mike Bostock – a data science blog with code examples by the author of D3.js.
Course and materials author – Dražen Lučanin. Hear about more of Dražen's courses by subscribing here!
Published by Packt.
The code is published under the MIT license.