This repo contains a collection of Jupyter notebooks to help you learn the basics of the Python programming language together with some of the packages that are used frequently in data science. Although we provide some concrete examples of machine learning techniques, such as decision trees and regression, these exercises are intended to illustrate syntax and mechanics rather than the specific methods.
Python basics.ipynb
If you're new to Python, we recommend that you start here. This notebook covers the fundamentals of the language: basic math, Boolean and logical operations, comments, strings, lists, dictionaries, sets, tuples, branches and conditionals, loops, functions, modules, obtaining information about objects and the Python help capabilities. In fact, even if you already know Python and could use a refresher or you're new to Jupyter notebooks, this is still a good place to start.
Numpy.ipynb
Suggested starting point after Python basics. Provides a brief introduction to NumPy, the widely used Python library for numerical computing. The emphasis is on working with the n-dimensional array (ndarray) object type, followed by two topics that are relevant to data science: linear algebra and sampling from random distributions.
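As a rough illustration of the three topics named above, here is a minimal sketch. It is not taken from the notebook; the array values and the seed are arbitrary choices made for this example.

```python
import numpy as np

# A 2-D ndarray: inspect its shape and element type
a = np.array([[2.0, 1.0], [1.0, 3.0]])
print(a.shape, a.dtype)  # (2, 2) float64

# Linear algebra: solve the system a @ x = b for x
b = np.array([3.0, 5.0])
x = np.linalg.solve(a, b)
print(np.allclose(a @ x, b))  # True

# Sampling: draw from a normal distribution with a seeded generator
rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
print(samples.shape)  # (1000,)
```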
Pandas.ipynb
A brief introduction to pandas, the Python data analysis toolkit. The emphasis is on working with the two primary pandas objects: Series and DataFrame.
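To give a feel for the two objects, the following sketch builds one of each. The names and values are made up for illustration and do not come from the notebook.

```python
import pandas as pd

# A Series is a one-dimensional labeled array
s = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"], name="score")
print(s["b"])  # 2.5

# A DataFrame is a table whose columns are Series (hypothetical data)
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 40]})

# Boolean indexing selects rows; column methods summarize them
over_30 = df[df["age"] > 30]
mean_age = df["age"].mean()
print(list(over_30["name"]), mean_age)  # ['Ann', 'Cy'] 32.0
```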
Matplotlib.ipynb
Popular package for plotting and charting. Starting with scatter plots as an example, this notebook progressively builds a publication-quality graph while exploring symbol properties, plotting multiple data sets, axis labels, and log and linear scales. After these basics, it covers other plot types such as bar charts, histograms, and pie and donut charts, along with brief digressions into LaTeX formatting and working with colors.
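A minimal sketch of the kind of scatter plot described above might look like this. The data are synthetic and the labels are placeholders. The `Agg` backend is selected only so the sketch runs without a display; inside a Jupyter notebook that line is usually unnecessary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np

# Two synthetic data sets spanning three decades in x
x = np.logspace(0, 3, 20)
y1 = x ** 1.5
y2 = 10 * x

fig, ax = plt.subplots()
# Symbol properties: marker shape and color differ per data set
ax.scatter(x, y1, marker="o", color="tab:blue", label="series 1")
ax.scatter(x, y2, marker="s", color="tab:orange", label="series 2")

# Switch both axes from the default linear scale to log
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("x (arbitrary units)")
ax.set_ylabel("y (arbitrary units)")
ax.legend()
# fig.savefig("scatter.png", dpi=150) would write the figure to disk
```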
Decision trees.ipynb
Introduces the scikit-learn machine learning package, using a classic decision tree example.
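The description does not say which data set the notebook uses. The sketch below assumes the iris data set bundled with scikit-learn, purely as a stand-in, and shows the usual fit-and-score pattern.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris is used here only as a convenient built-in example
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A shallow tree keeps the model easy to inspect
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Fraction of held-out samples classified correctly
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```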
Regression.ipy
A brief introduction to regression using scikit-learn. Covers basic linear regression, multiple linear regression, combining scikit-learn with pandas and working with categorical data.
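One common way to combine these pieces is to one-hot encode a categorical column with pandas before fitting. The sketch below uses a small invented table (the `size`, `city`, and `price` columns are hypothetical) and is not necessarily the approach taken in the notebook.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: price = 2 * size, plus 20 for city B, minus 10 for city C
df = pd.DataFrame({
    "size": [50, 80, 120, 65, 100, 90],
    "city": ["A", "B", "A", "C", "B", "C"],
    "price": [100, 180, 240, 120, 220, 170],
})

# One-hot encode the categorical column; drop_first avoids a redundant column
X = pd.get_dummies(df[["size", "city"]], columns=["city"], drop_first=True, dtype=float)
y = df["price"]

# Multiple linear regression on one numeric and two indicator features
model = LinearRegression().fit(X, y)
for name, coef in zip(X.columns, model.coef_):
    print(f"{name}: {coef:.2f}")
```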
strings.ipynb
Introduces a few more string handling features, including how to access useful string constants (so, for example, you don't have to keep typing 'abcdefghijklmnopqrstuvwxyz' every time you need a list of lowercase letters). Not essential, but makes for a lighter topic once your brain is full.
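For instance, the standard library's `string` module provides these constants. A short sketch (the consonant example is our own illustration):

```python
import string

# Ready-made constants from the standard library
print(string.ascii_lowercase)  # abcdefghijklmnopqrstuvwxyz
print(string.digits)           # 0123456789

# Use a constant instead of typing the alphabet by hand
vowels = set("aeiou")
consonants = [c for c in string.ascii_lowercase if c not in vowels]
print(len(consonants))  # 21
```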
Introduces the Dask module with a simple example and illustrates the Dask task graph.
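As a hedged sketch of what a simple Dask example can look like, the code below uses `dask.delayed` to build a small task graph lazily and then runs it. The `increment` and `add` functions are hypothetical helpers for this illustration, and the notebook may use a different part of the Dask API.

```python
import dask

@dask.delayed
def increment(x):
    return x + 1

@dask.delayed
def add(a, b):
    return a + b

# Calling delayed functions only records tasks in a graph; nothing runs yet
total = add(increment(1), increment(2))

# compute() executes the graph, running independent tasks in parallel if possible
result = total.compute()
print(result)  # 5

# total.visualize() would draw the graph; it needs the optional graphviz dependency
```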
Although the following resources are not specific to Python, they will be useful as you create your own notebooks. This material can be covered at any time and may serve as an intermission after working through some of the more challenging material.
Markdown.ipynb
Covers the basics of the Markdown language: headers, italics and emphasis, new lines and paragraphs, special characters, and formatting code examples. Markdown is deliberately simple and this notebook should only take a few minutes to read.
LaTeX math.ipynb
Introduces the basics of formatting math using LaTeX. Can be used both in the markdown cells of Jupyter notebooks and the figure elements (e.g. titles, axis labels, legends) of plotting packages such as Matplotlib.
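As a small illustration of the plotting use case, Matplotlib accepts LaTeX-style math between dollar signs in its text elements and renders it with its built-in math parser, so no separate LaTeX installation is assumed here. The formulas are arbitrary examples, and the `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Raw strings keep backslashes intact; math goes between dollar signs
ax.set_xlabel(r"$x$")
ax.set_ylabel(r"$y = x^2$")
ax.set_title(r"Example: $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$")

# Drawing the canvas forces the math text to be parsed and rendered
fig.canvas.draw()
```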