Teaching material for the Introduction to Python at the Data Science Retreat

Intro to Python For Data Science

Intro in English

This repo contains the teaching material for the Introduction to Python (and useful libraries) masterclass at the Data Science Retreat. It does not cover Pandas.

Intro in Finnish

This repo contains the Finnish version....

Table of contents

About me

Slides for this section can be found here.

The Python Programming Language

Complete slides here, inclusive of exercises.

Extra links:

Practice these examples by alternating between Python files, the IPython interpreter, and an IPython Notebook.

To practice:

Python 2 vs. Python 3

Note: as explained in the lesson, you should now just go with Python 3. These links are more than 2 years old but are still useful if you need to use old libraries.

A great notebook covering the main differences has been written by Sebastian Raschka.

To keep your code compatible with both Python 2 and Python 3, you might also want to use this Cheat Sheet.
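
As a small illustration of the kind of difference these links discuss, here is a minimal sketch of division code written to behave the same on Python 2 and Python 3. The function names `split_evenly` and `ratio` are made up for this example and do not come from the linked material.

```python
import sys


def split_evenly(total, parts):
    """Return (units per part, leftover units) using integer arithmetic.

    `//` and `%` behave the same on Python 2 and Python 3.
    """
    return total // parts, total % parts


def ratio(numerator, denominator):
    """Return a float ratio on both versions.

    On Python 2, `7 / 2` floors to 3 for integers, so one operand is
    converted to float explicitly.
    """
    return float(numerator) / denominator


per_part, leftover = split_evenly(7, 2)
print("Running on Python %d.%d" % sys.version_info[:2])
print("7 split in 2 parts: %d each, %d left over" % (per_part, leftover))
print("7 / 2 as a ratio: %.1f" % ratio(7, 2))
```

Spelling out which division you mean, and calling `print` as a function with a single argument, are two habits that keep a script running unchanged on both versions.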

Installing Python and all useful packages

Slides on this topic start here

Tools for writing Python Code (from Kristian)

Python shell

The most basic interactive Python command line, where each line starts with a >>>.

IDLE

Standard editor in Python distributions, easy to use but very basic.

IPython

A more sophisticated interactive Python command line. It incorporates tab-completion, interactive help and regular shell commands. Also look up the %-magic commands.

Spyder

Spyder is part of the Anaconda Python distribution. It is a small IDE mostly for data analysis, similar to RStudio. It automatically highlights syntax errors and includes a variable explorer, debugging functionality, and other useful features.

Jupyter Notebooks

Interactive environment for the web browser. A Jupyter notebook contains Python code, text, images and any output from your program (including plots!). It is a great tool for exploratory data analysis.
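
To make the "code, text and output in one document" idea concrete, here is a rough sketch of what a notebook file looks like on disk. An `.ipynb` file is JSON; the dictionary below follows the nbformat 4 layout as I understand it, and the cell contents are invented for illustration.

```python
import json

# A minimal notebook: a list of cells plus some metadata.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            # Text cells hold Markdown source.
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Exploratory analysis\n", "Some explanatory text."],
        },
        {
            # Code cells hold the source and whatever it produced when run.
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}


def summarize(nb):
    """Return (cell_type, number_of_outputs) for every cell in the notebook."""
    return [(cell["cell_type"], len(cell.get("outputs", []))) for cell in nb["cells"]]


# Round-trip through JSON, as would happen when saving and reopening the file.
serialized = json.dumps(notebook, indent=1)
print(summarize(json.loads(serialized)))
```

Because outputs are stored next to the code that produced them, a notebook can be shared and read without being re-run.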

Sublime2

A general-purpose text editor that works on all systems. Many plugins for Python are available. There is a free version and a commercial version.

Visual Studio Code

The open-source cousin of Sublime2, similar to Atom.

PyCharm

PyCharm is probably the most luxurious IDE for Python. Its features are a superset of everything listed above. PyCharm is a great choice for bigger Python projects. Free for non-commercial use.

Notepad++

If you must use a text editor on Windows to edit Python code, refuse to use anything worse than Notepad++.

Vim

I know people who are successfully using Vim to write Python code and are happy with it.

Emacs

I know people who are successfully using Emacs to write Python code, but haven't asked them how happy they are.

Running the IPython interpreter and a python file

Slides on this topic start here

Jupyter Notebook

A live demo will be given during the masterclass. Here just a warning note

Experiment further with the IPython Notebook environment with this Jupyter Notebook. Clone or download it, then open it and run and modify its cells.

Many more Jupyter features are covered in this blog post.

And of course, be aware that Jupyter is NOT an IDE and can bite you in various ways; see this presentation.

Git

Slides are here

What is machine learning

A brief introduction to (or recap of) ML and its terminology. Slides here

NumPy and Matplotlib

NumPy

Start with the official NumPy Tutorial. Note: if this link returns an error, move to the PDF version.

Move on to these exercises.
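
Before diving into the tutorial and exercises, the following short sketch may help as a warm-up. It is not taken from the linked material; the temperature readings are made-up data, and it only touches array creation, vectorized arithmetic, boolean masking and axis-wise reduction.

```python
import numpy as np

# Hypothetical data: five temperature readings in degrees Celsius.
celsius = np.array([12.5, 15.0, 9.5, 21.0, 18.0])

# Vectorized arithmetic applies to every element without an explicit loop.
fahrenheit = celsius * 9 / 5 + 32

# A boolean mask selects the elements that satisfy a condition.
warm = celsius[celsius > 14]

# reshape turns a flat range into a 2x3 grid; axis=0 sums down each column.
grid = np.arange(6).reshape(2, 3)
column_sums = grid.sum(axis=0)

print(fahrenheit)
print(warm)
print(column_sums)
```

Try rewriting each step with a plain Python list and a `for` loop to see what the array operations save you.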

Matplotlib

Learn the basics and some more advanced plotting tricks in Matplotlib with this hands-on tutorial.

It's also very useful to look at the gallery to find examples of every possible chart you may want.

Scikit-learn and your first ML case

Slides are here

Scikit-learn

SciPy

SciPy is a collection of mathematical algorithms and convenience functions built on the NumPy extension of Python. Here is a hands-on overview of this collection, together with practical exercises and more advanced problems.

For those willing to go further on the statistical aspects of SciPy, I recommend having a look at these IPython Notebooks on Effect Size, Random Sampling and Hypothesis Testing.
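
As a taste of those three topics, here is a minimal sketch that draws two random samples, runs a two-sample t-test with `scipy.stats.ttest_ind`, and computes Cohen's d as an effect size. It is not taken from the linked notebooks; the sample sizes, means and seed are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Random sampling: two hypothetical groups drawn from normal distributions
# whose means differ by one standard deviation.
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=0.0, scale=1.0, size=200)
group_b = rng.normal(loc=1.0, scale=1.0, size=200)

# Hypothesis testing: the null hypothesis is that both groups share one mean.
result = stats.ttest_ind(group_a, group_b)

# Effect size: Cohen's d, the mean difference divided by the pooled standard
# deviation (this simple pooling assumes equal group sizes).
pooled_std = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_b.mean() - group_a.mean()) / pooled_std

print("t statistic:", result.statistic)
print("p-value:", result.pvalue)
print("Cohen's d:", cohens_d)
```

A small p-value only says the difference is unlikely under the null hypothesis; the effect size says how large the difference is, which is why the two are usually reported together.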

License

This repository contains a variety of content: some developed by Amélie Anglade, some derived from or largely inspired by third parties' work, and some entirely from third parties.
The third-party content is distributed under the license provided by those parties. Any derivative work respects the original licenses and credits its initial authors.

Original content developed by Amélie Anglade is distributed under the MIT license.