/dme-2021


Data Mining and Exploration

This repository contains the computer labs for the University of Edinburgh School of Informatics course Data Mining and Exploration [INFR11007].

Labs general information

In this course we will be using Python 3 and the interactive notebook application Jupyter for all labs. Basic knowledge of Python, NumPy and working with notebooks in the Jupyter environment is assumed. If you haven't used Python before, you are strongly advised to familiarise yourself with basic Python syntax and with working in the Jupyter environment. There are many excellent tutorials available on the web, and you can choose the ones you like the most. If you are not sure which to choose, any of these are good starting points:

Introduction to Python for scientific computing

Introduction to Jupyter notebooks

Python/Numpy tutorial

Numpy quickstart guide

Scipy Lecture Notes: Section 1 is enough to get you started.

Python Data Science Handbook: note that some code in the later sections is not compatible with the latest scikit-learn version.

The following book is a good introduction to programming in Python: A Beginners Guide to Python 3 Programming

Packages

The main packages that we will use are the following:

  • numpy: scientific computing with array objects

  • pandas: data structures and data analysis tools

  • scikit-learn: machine learning library implementing many learning algorithms and useful tools

  • matplotlib: plotting library (similar to MATLAB's plot interface)

  • seaborn: data visualisation library which works on top of matplotlib
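As a rough illustration of how these packages tend to be combined, the sketch below builds a small table of random numbers with numpy and pandas and reduces it to two dimensions with scikit-learn's PCA. The data, the column names and the choice of PCA are assumptions made for this example only; they are not taken from the labs. Plotting with matplotlib or seaborn is left out to keep the sketch short.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic data for illustration only: 100 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# pandas: wrap the array in a DataFrame with (hypothetical) column names.
df = pd.DataFrame(X, columns=["a", "b", "c"])

# scikit-learn: project the data onto its first two principal components.
pca = PCA(n_components=2)
Z = pca.fit_transform(df.to_numpy())

print(df.describe())  # per-column summary statistics
print(Z.shape)        # shape of the projected data
```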

Getting set up

Python and the scientific libraries needed in the labs are installed on the Informatics DICE system. Note that virtual DICE does not have all packages installed. Please log in remotely to a DICE computer instead. In case of problems, please get in touch with Informatics computing support.

If you prefer to work on your own machine, we recommend using the Anaconda distribution. You should install Python 3 and at least the above packages.

On DICE, the following versions are installed (12 January 2021):

  • numpy: 1.17.4

  • pandas: 0.25.3

  • scikit-learn: 0.22.2.post1

  • matplotlib: 3.1.2

  • seaborn: 0.10.0

  • scipy: 1.3.3

We recommend using the same versions on your own machine.
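To see how a local installation compares with the DICE versions listed above, something like the following may help. It is a minimal sketch, not part of the labs: `DICE_VERSIONS` and `compare_versions` are illustrative names, and the script assumes Python 3.8 or later, where `importlib.metadata` is available in the standard library.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions installed on DICE on 12 January 2021, copied from the list above.
DICE_VERSIONS = {
    "numpy": "1.17.4",
    "pandas": "0.25.3",
    "scikit-learn": "0.22.2.post1",
    "matplotlib": "3.1.2",
    "seaborn": "0.10.0",
    "scipy": "1.3.3",
}


def compare_versions(expected):
    """Return {package: (expected_version, installed_version_or_None)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(DICE_VERSIONS).items():
    if installed is None:
        status = "not installed"
    elif installed == wanted:
        status = "matches DICE"
    else:
        status = "differs from DICE"
    print(f"{name}: DICE {wanted}, local {installed} ({status})")
```

A differing version is not necessarily a problem; the comparison only flags where your environment departs from the one the labs were run on.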

You may like to install some of the Jupyter extensions listed here, e.g. support for collapsible headings and a table of contents.

History

The original 2017 version is due to Agamemnon Krasoulis and was further refined by Maria Astefanoaei, Miruna Clinciu and Arno Onken. The latest edit is by Michael Gutmann (December 2020 to January 2021): the labs were streamlined, the code was simplified and made clearer, and everything was made compatible with Python 3.8.