This is a programming tutorial aimed at researchers and practitioners, both those with no prior programming experience and those who already have some programming skills.
We will walk through some of the principal programming concepts, such as conditionals, functions and iteration, as well as more specialised topics like classes, objects and what is sometimes called defensive programming.
If all these terms sound like gibberish to you, don't worry!
I'll try to show everything with simple code examples: no long and complicated explanations with fancy words. At the end of this tutorial, I am sure you will master all these concepts like a pro 🙌
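To give a first taste of the terms mentioned above, here is a minimal sketch that combines a function, a conditional and an iteration. The function name, the thresholds and the sample readings are made up for illustration and are not taken from the lecture materials.

```python
def classify_temperature(celsius):
    """Return a label for a temperature reading (thresholds are illustrative)."""
    # Conditional: pick one branch depending on the value.
    if celsius < 0:
        return "freezing"
    elif celsius < 25:
        return "mild"
    else:
        return "hot"


# Hypothetical sample data, invented for this example.
readings = [-3.5, 12.0, 31.2]

# Iteration: apply the function to each reading in turn.
labels = []
for value in readings:
    labels.append(classify_temperature(value))

print(labels)  # ['freezing', 'mild', 'hot']
```

Each of these building blocks is covered step by step in the lectures, so there is no need to understand every line yet.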
In this tutorial we will be using Python 3. Python is nowadays considered "the" language of choice for Data Science. There are many reasons for that, and many articles have been written on the subject. This article looks like a good and clear example on the topic.
- Q: Yes, ok.. but.. is this a tutorial on Data Science?
- A: No. This is a tutorial on programming with Python. The perspective, though, is that of a wanna-be data scientist.
- Q: Cool... but.. is this a tutorial on the Python language?
- A: Ehm, no again. Sorry. We will focus on programming concepts using Python as a language. Most of the concepts you will learn are shared by most other languages (just the syntax will be different, ed.). There is, however, a section in the lecture materials named Python Extras that focuses specifically on features of the Python language. You could read it, if interested :)
I do hope that this (very simple) mind-map look-alike clarifies a bit the perspective I chose when I thought about this course.
tl;dr
We will dive into programming focusing on two main aspects: the Algorithmic perspective, that is "what are the steps we need to implement to solve a specific problem", and the Data Structure perspective, that is "what is the data structure that would simplify as much as possible our algorithm implementation". These two perspectives led in the past decades to two completely different approaches to programming: Procedural vs Object-Oriented, respectively.
Python allows for a lot of flexibility, and this flexibility will be our Swiss Army knife. In fact, Python supports multiple programming paradigms at once (e.g. imperative, OOP, functional [1]), and we will be seamlessly shifting our focus among them as we go through the lecture materials.
[1]: functional programming only for the intrepid programmers among you :) See this video
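As a rough illustration of what "multiple paradigms" means in practice, the sketch below computes the same average in an imperative, an object-oriented and a functional style. The `Sample` class and the data are hypothetical, chosen only to keep the comparison small.

```python
from functools import reduce

# Hypothetical measurements, invented for this example.
measurements = [2.0, 4.0, 6.0]

# Imperative / procedural style: spell out the steps one by one.
total = 0.0
for m in measurements:
    total += m
mean_imperative = total / len(measurements)


# Object-oriented style: bundle the data with the operation on it.
class Sample:
    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


mean_oop = Sample(measurements).mean()

# Functional style: combine the values with a function, without mutation.
mean_functional = reduce(lambda acc, m: acc + m, measurements, 0.0) / len(measurements)

print(mean_imperative, mean_oop, mean_functional)  # 4.0 4.0 4.0
```

All three produce the same result; which one reads best depends on the problem at hand, which is why the course moves between them.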
The course is organised into six lectures, with the following learning path in mind:
- Python Programming (part 1): Introduction to Python, Main Data Structures, and Functions
- Python Programming (part 2): Advanced Data Structures and Object-Oriented Programming
- Scientific Python Programming and Data Processing: Numerical Processing with NumPy & Data Processing with Pandas
- Advanced Data Objects and Data Plotting: Introduction to dataclasses and matplotlib/bokeh for interactive plotting
- Introduction to Scikit-Learn (sklearn) and Machine Learning Modules
- Project-Team work on real-case Data Science scenarios
Note: The following section is currently incomplete, and will be updated throughout the rest of the course.
This part introduces the concept of computer programming and the very basics of the Python programming language:
- The Way of the Program
- Variables, Statements and Expressions
- Introduction to Functions
- Setting up an editor
- Conditional Statements
Regardless of whether you have programmed before, with Python or another language, I would suggest taking a look at this introductory section anyway. You can always skip ahead, based on your learning pace.
Alternatively, a good starting point would be this online course: Intro to Python by Microsoft
This section contains the materials for the main topics that will be covered in our first two lectures. These are (in no specific order):
- Pythonic Functions
- Collections and Sequences
- Dictionaries
- Iterators, Generators, Comprehensions
- Classes and OOP
- Errors and Exceptions
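The short sketch below touches several of the topics in this list at once: a dictionary built with a comprehension, a generator, and a small class that raises an exception on invalid input. The names (`squares`, `Inventory`) and the data are hypothetical and are not taken from the course notebooks.

```python
# Hypothetical word list, invented for this example.
words = ["data", "science", "python"]

# Dictionary comprehension: map each word to its length.
lengths = {word: len(word) for word in words}


def squares(limit):
    """Generator: yield square numbers lazily, one at a time."""
    for n in range(limit):
        yield n * n


first_squares = list(squares(4))  # [0, 1, 4, 9]


class Inventory:
    """A tiny class that validates its input (defensive programming)."""

    def __init__(self):
        self._items = {}

    def add(self, name, quantity):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._items[name] = self._items.get(name, 0) + quantity

    def count(self, name):
        return self._items.get(name, 0)


inventory = Inventory()
inventory.add("notebook", 2)

# Errors and exceptions: catch the failure instead of crashing.
try:
    inventory.add("pen", 0)
except ValueError as error:
    message = str(error)

print(lengths, first_squares, inventory.count("notebook"), message)
```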
This section contains some extra notebooks you could go through to read more about some specific aspects of the Python programming language.
Note: This is the only part of the course specifically focused on how Python does things.
Option A: Clone (or fork) the Repository using git (Recommended)
You need to have Git installed in order to proceed. If you don't have git installed on your system, you need to install it first.
Instructions to Install Git
💡 Please also consider looking at Git CheatSheet
To acquire the lecture material, it is highly recommended to use git to clone the current repository. Since the repository will be updated after each lesson, using git makes it much easier to keep the material in sync.
To clone the repository, type the following command in the terminal prompt:
git clone https://github.com/leriomaggio/python-data-science.git
After installing git, please make sure to run the Git Terminal (or Git Prompt).
Once completed, this will create a new folder named python-data-science (presumably in your Home folder).
Well done! Now please bear with me for a few more minutes and follow the instructions below 🙏
Please now proceed to 2. Setting up your Environment
Option B: Downloading the material in a ZIP archive from GitHub (Not Recommended)
It is indeed possible to download the whole material from GitHub as a ZIP archive. Link here
However, this method is not recommended, as you would need to download the archive again every time there is an update (which means at the end of each lesson)!
We will be using Jupyter lab as our interactive programming environment for this course.
This has the great advantage of lowering the barriers to setting up the environment and installing specialised tools. If you're not familiar with Jupyter notebooks, no worries: getting familiar with the environment is the first thing we will do!
Meanwhile, it is necessary to set up the Python virtual environment so that the code contained in this repository runs smoothly and with no headaches.
If you don't know what a Python virtual environment is, think of it as a sandbox Python installation you can have on your machine that is fully controllable and fully independent from any other Python environment you may have on your local machine.
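One way to see this sandbox idea in action is to ask Python itself which installation it is running from. The sketch below is only an illustration: `describe_environment` is a hypothetical helper, and it assumes that conda exposes the name of the active environment through the `CONDA_DEFAULT_ENV` environment variable (the value is `None` when no conda environment is active).

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the Python installation currently in use."""
    return {
        # Path of the interpreter that is executing this code.
        "executable": sys.executable,
        # Root folder of the installation (or environment) in use.
        "prefix": sys.prefix,
        # Name of the active conda environment, if any (assumption: set by conda).
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

Running this inside and outside an activated environment should show different paths, which is exactly the independence described above.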
To execute the notebooks in this repository, a few packages are required, but installing them in your Conda environment is super easy.
Step 1: Download Anaconda Python Distribution.
Note for Windows Users: More information here on the official documentation
Step 2: Set up the virtual environment:
Open a Terminal (or Anaconda Prompt on Windows) and move to the python-data-science folder, i.e. the main folder of this repository.
cd python-data-science
Now create the conda environment by typing the following command:
conda env create -f pyds.yml
This will install a new Conda environment named pyds.
Step 2.1: If you'd like to double check that the creation of the environment completed successfully, you can type:
conda info --envs
This will list all the virtual environments conda can find within your installation. pyds should appear in the list as well.
Step 3: Activate the environment:
Once the environment is set, we need to activate it in order to use it.
conda activate pyds
🎉 You should now be ready to go!
The last bit is to run your jupyter lab server, and open the notebooks:
jupyter lab
The repository also includes a requirements.txt file that can be used to install all the required packages using pip:
pip install -r requirements.txt
However, this is recommended only if (A) it is not possible to install Anaconda on your machine, or (B) the setup of the Anaconda environment is unsuccessful.
Python >=3.9
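If you want to double check the setup from Python itself, a small sketch like the following can help. It is not part of the repository: the helper names are hypothetical, and the package list is only a guess based on the libraries mentioned in the course outline (the authoritative list lives in pyds.yml and requirements.txt).

```python
import sys
from importlib.util import find_spec

# Minimum version stated above.
MINIMUM_PYTHON = (3, 9)

# Assumption: import names guessed from the course outline, not read from
# pyds.yml or requirements.txt.
CANDIDATE_PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def python_is_recent_enough(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def find_missing(packages):
    """Return the names that cannot be located in the current environment."""
    return [name for name in packages if find_spec(name) is None]


print("Python OK:", python_is_recent_enough())
print("Missing packages:", find_missing(CANDIDATE_PACKAGES))
```

An empty list of missing packages suggests the environment was created correctly; anything listed there is worth installing before opening the notebooks.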
Author: Valerio Maggio (@leriomaggio), Senior Research Associate, University of Bristol.
All the Code material is distributed under the terms of the GNU GPLv3 License. See LICENSE file for additional details.
All the instructional materials in this repository are free to use, and made available under the [Creative Commons Attribution license](https://creativecommons.org/licenses/by/4.0/). The following is a human-readable summary of (and not a substitute for) the full legal text of the CC BY 4.0 license.
You are free:
- to Share: copy and redistribute the material in any medium or format
- to Adapt: remix, transform, and build upon the material
for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution: You must give appropriate credit (mentioning that your work is derived from work that is Copyright © Software Carpentry and, where practical, linking to http://software-carpentry.org/), provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
For any questions or doubts, feel free to open an issue in the repository, or drop me an email @ valerio.maggio_at_bristol.ac.uk