BBCSLightningLab

A workshop I conducted on multithreading in Python and the GIL.

In this lightning lab I'll be going through multithreading in Python. Multithreading gives you the ability to run multiple tasks concurrently. By concurrently I mean that the tasks make progress independently: while task A is running, I can start task B without waiting until task A has finished. One way of running tasks concurrently is to run them all in parallel, meaning truly simultaneously. Because they run simultaneously, they need multiple CPU cores, one core for each task. However, this is a problem in Python because of something called the global interpreter lock, or GIL for short, which prevents threads from running Python code in parallel.

The reason lies in how Python manages memory. Every time you create an object, such as a list, it gets stored in your device's memory. You can't hold that information in memory forever, because it takes up precious space, so the memory has to be released once you no longer need it. Python achieves this through something called reference counting. Every object created in Python has a reference count that keeps track of the number of references to that object. When this count reaches 0, Python assumes you have no further use for the object and releases the memory it occupies.

Let's take a brief look at how reference counting works. We create an empty list and assign it to a, set b to a, and then get the reference count of the list (for example with sys.getrefcount). We can see that it's 3, because the list is referenced by a, by b, and by the argument passed to the function that reads the count. The problem with this way of managing memory is that when two threads run completely in parallel, they can increase or decrease an object's reference count at the same moment. If that happens, an update can be lost, and Python may incorrectly release the memory while references to the object still exist. So we want to keep this reference count safe.
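The reference-count walkthrough above can be sketched as follows. This is a minimal illustration, assuming CPython and the standard library function `sys.getrefcount`; the variable names `one_name`, `two_names` and `after_del` are only for this example. The exact numbers can vary between CPython versions, so the sketch focuses on how the count changes rather than on its absolute value.

```python
import sys

a = []                          # the list is referenced by the name a
one_name = sys.getrefcount(a)   # also counts the temporary argument reference

b = a                           # a second name for the same list object
two_names = sys.getrefcount(a)  # one higher than before, because of b

del b                           # dropping the name b removes one reference
after_del = sys.getrefcount(a)  # back to the earlier value

print(one_name, two_names, after_del)
```

Each new name bound to the list raises the count by one, and deleting a name lowers it by one. The list is only released once nothing refers to it any more.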

Python does this with the GIL, a single lock on the interpreter that keeps the reference counts of objects safe. Since there's only one lock, and the rule is that a thread must hold the lock in order to execute, it effectively makes any Python program single-threaded, meaning only one thread can run at a time. Python could use multiple locks instead, but that would mean a decrease in performance.
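One rough way to observe this effect is to time the same CPU-bound work run back to back and then split across two threads. The sketch below is not from the workshop; the function `count_down` and the constant `N` are hypothetical names chosen for illustration. On a standard CPython build with the GIL, the two timings are usually close to each other, but the actual numbers depend on your machine and Python version.

```python
import threading
import time

def count_down(n):
    """Pure-Python busy loop, so the thread holds the GIL while it runs."""
    while n > 0:
        n -= 1

N = 2_000_000  # arbitrary amount of work, kept small so the demo is quick

# Run the work twice, one call after the other.
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same two calls in two threads.
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both threads to finish before stopping the clock
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, two threads: {threaded:.2f}s")
```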

But what if we wanted to have multiple threads executing at the same time? One way of going about this is to remove the GIL, but we won't go into that today. What we can do, which is a much safer option, is to switch back and forth between threads very rapidly, giving the illusion that the threads are running at the same time. Python has a module for this called threading, and I'm going to show you briefly how to use it today. I'll be importing time as well, because I'll use its sleep function to demonstrate concurrency.

The first thing you need whenever you use a thread is a function, because a thread runs a function that you give it. So we define a function first, and then initialise our thread. We define our own function by typing the keyword def, followed by the function name, which I'll call sleeper, and then the function's parameters in parentheses. In this case the function takes the number of seconds to sleep as n. We can print something like "Hi, I'll be sleeping for n seconds" at the start, sleep for that number of seconds, and then print "I've woken up!" or something like that. This is the function that we're going to have the thread execute.
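The function described above might look like this. It's a minimal sketch based on the description; the exact wording of the messages is approximate, and the direct call at the end uses 1 second rather than a longer sleep just to keep the demo quick.

```python
import time

def sleeper(n):
    """Announce the nap, sleep for n seconds, then announce waking up."""
    print(f"Hi, I'll be sleeping for {n} seconds")
    time.sleep(n)
    print("I've woken up!")

# Called directly (no thread yet), this blocks the caller for the whole sleep.
sleeper(1)
```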

We can initialise a thread, which we'll name t, by using the Thread class from the threading module, so threading.Thread. When creating a thread, there are a few parameters we need to be aware of. The first is target, the function that we want the thread to execute. We want it to execute sleeper, so we put target=sleeper, without parentheses after the name, because we're passing the function itself rather than calling it. The next parameter is args, short for the arguments to pass to the function. Since sleeper takes a number n, we pass it inside a tuple or list. In this case I'll put 5 so it sleeps for 5 seconds, which gives args=(5,) with a trailing comma, because (5) on its own is just the number 5 and not a tuple. That's it.
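Putting the two parameters together, creating the thread could look like the sketch below. It repeats the sleeper function so that it runs on its own. The `is_alive()` check is an addition for illustration, to show that creating a thread does not start it.

```python
import threading
import time

def sleeper(n):
    print(f"Hi, I'll be sleeping for {n} seconds")
    time.sleep(n)
    print("I've woken up!")

# target is the function itself (no parentheses); args is a one-element tuple.
# Note the trailing comma: (5) without it is just the integer 5.
t = threading.Thread(target=sleeper, args=(5,))

# The thread exists but has not been started, so nothing has printed yet.
print(t.is_alive())  # False
```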

To execute a thread, all we need to do is take the thread object, in this case t, and call its start method, so t.start(). Any code we type underneath that runs at the same time (well, sort of) as the thread. To demonstrate, we can print a few "Hi"s after t.start() and run the program. You can see that the printing of the "Hi"s doesn't wait until the sleeper function has finished executing. Instead they run at the same time, sort of.
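The whole demonstration can be sketched as one short script. It sleeps for 1 second instead of 5 to keep the run short, and the final `t.join()` together with the `still_sleeping` variable are additions for illustration: `join()` makes the main program wait for the thread, and `still_sleeping` records that the thread was still running after the "Hi"s were printed.

```python
import threading
import time

def sleeper(n):
    print(f"Hi, I'll be sleeping for {n} seconds")
    time.sleep(n)
    print("I've woken up!")

t = threading.Thread(target=sleeper, args=(1,))
t.start()  # returns straight away; sleeper now runs in its own thread

# These lines run while sleeper is still asleep.
for _ in range(3):
    print("Hi")

still_sleeping = t.is_alive()  # the thread has not finished its sleep yet

t.join()  # wait here until sleeper has woken up and returned
```

When you run it, the "Hi" lines appear before "I've woken up!", which shows that the main program did not wait for the thread.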
