You have definitely heard of sets before. In this section, however, you will learn about the formal definition of sets, which will serve as a foundation for everything related to probability and combinatorics!
You will be able to:
- Define a set in the context of probability theory
- Define a universal set and subsets
- Describe the process of making unions, intersections, and complements
- Use Venn Diagrams to visually demonstrate set operations
- Describe the inclusion-exclusion principle
In probability theory, a set is defined as a well-defined collection of distinct objects. In other words, it contains 0 or more unique items, and is defined unambiguously, so that it is clear whether any given item is an element of the set.
Mathematically, you can denote a set by $S$
for the example from the previous sentence).
There are a couple primary ways we will define sets:
When discussing sets from a theoretical perspective, we will often define a set with a written definition, rather than using math notation. For example:
$S$ is defined as the set of even numbers
Another way to define a set is by listing its elements, surrounded by curly braces. For example:
$S = {1, 2, 3}$
Note that order does not matter in defining a set. "$S = {1, 2, 3}$" and "$S = {3, 2, 1}$" mean the same thing.
When working with sets in Python, we will frequently use this form of defining sets, since Python is not designed to work with infinitely-large collections. The below code snippet shows
S = {1, 2, 3}
type(S)
set
Unlike other types of collections (e.g. a sequence in math, or a list in Python), sets are chiefly concerned with membership, i.e. whether a given element belongs to the collection. Order doesn't matter, and elements can only appear once in a set.
If an element
Example: If
-
If
$x = 2$ ,$x\in S$ because$x$ is an even number. -
If
$x = 9$ ,$x\notin S$ because$x$ is not an even number.
In Python, we can use the familiar in
operator, optionally negating it with not
. Returning to the previously-defined set S
:
1 in S
True
1 not in S
False
The collection of all possible outcomes in a certain context or universe is called the universal set.
A universal set is often denoted by
Example of a universal set: All the possible outcomes when rolling a dice.
Remember that a universal set is not necessarily all the possible things that have ever existed. Typically, a universal set is just all the possible elements within certain bounds, e.g., the set of all countries in the world, the set of all the animal species in the Bronx Zoo, etc.
A universal set can have an infinite number of elements — for example, the set of all real numbers!
When there are no elements in a certain set, this set is empty, denoted by
Set
For example, say we have sets
Intuitively, you probably understand that
What might be less intuitive is that
Let's look at those examples in Python:
# Setting up example sets
A = {1, 2, 3}
B = {1, 2, 3}
C = {1, 2}
# Intuitively makes sense, C is a subset of A
C.issubset(A)
True
# Somewhat less intuitive, B is also a subset of A
# even though B and A contain the same elements
B.issubset(A)
True
We can also use <=
to check if something is a subset:
# We're asking "is C a subset of A?"
C <= A
True
# We're asking "is B a subset of A?"
B <= A
True
If you don't want to include sets that contain the exact same elements, you are looking for a proper subset.
Set
For example, if
All proper subsets are subsets, but not all subsets are proper subsets. Returning to
In Python, there is no method for finding a proper subset, but we can just use the <
operator.
# We're asking "is C a proper subset of A?"
C < A
# True, because every element of C is in A,
# and A contains at least 1 element that C does not
True
# We're asking "is A a proper subset of B?"
A < B
# False, because even though every element of A is in B,
# A and B have the exact same elements
False
The Python notation helps to make this distinction clearer. A subset (<=
) includes sets that are "equal" (have the exact same elements), whereas a proper subset (<
) only includes sets that have fewer elements. The <
and <=
operators are being used differently than you have seen previously (e.g. 4 < 5
) but there is still a conceptual relationship between them.
Another term you might see is a superset. A superset is just the inverse of a subset. For example,
Just like with a subset, a proper superset is one where the two sets are not equal. So,
In Python this can be represented using the >
and >=
operators as well as the .issuperset
method. .issuperset
corresponds with >=
so it's useful to be familiar with both forms of the notation.
# We're asking "is A a superset of C?"
A >= C
True
# We're asking "is A a proper superset of C?"
A > C
True
# We're asking "is A a superset of B?"
A >= B
True
# We're asking "is A a proper superset of B?"
A > B
False
# You can also use this method if you're only checking
# for a superset, not a proper superset
A.issuperset(C)
True
Next, let's talk about set operations. Imagine you have two sets of numbers, say the first 4 multiples of 3 in set
$ S = {3,6,9,12}$
and the first 4 multiples of 2 in set
$ T = {2,4,6,8} $.
Below, we define these sets in Python.
S = {3, 6, 9, 12}
T = {2, 4, 6, 8}
We will also define the universal set (
$ \Omega = {2,3,4,6,8,9,10,12,14,15,16,18,20}$
omega = {2, 3, 4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20}
The union of two sets
Applied to our example, the union of
A popular way to represent sets and their relationships is through Venn Diagrams, (https://en.wikipedia.org/wiki/Venn_diagram), see picture below!
(Note that elements of
In mathematical terms, the union of
In Python, the union of two sets is calculated using the |
operator:
S | T
{2, 3, 4, 6, 8, 9, 12}
Alternatively, the .union
method can be used:
S.union(T)
{2, 3, 4, 6, 8, 9, 12}
The intersection of two sets
Applied to our example, the intersection of
In mathematical terms, the intersection of
In Python, the intersection of two sets is calculated using the &
operator:
S & T
{6}
Alternatively, the .intersection
method can be used:
S.intersection(T)
{6}
In general, the complement of a set means the elements that are not in that set. One way to scope this is to find all of the elements of one set that are not in another set — known as the relative complement.
For example, the relative complement of
In mathematical terms, difference is denoted by $ T\backslash S $ or
In Python, the difference between two sets is calculated using the -
operator:
T - S
{2, 4, 8}
S - T
{3, 9, 12}
Alternatively, the .difference
method can be used:
T.difference(S)
{2, 4, 8}
Another way to define the complement is to find all elements that are not in the universal set — known as the absolute complement.
The absolute complement of
Returning to the previous
Mathematically, the absolute complement of
$ S' = {2,4,8,10,14,15,16,18,20} $
In Python, we can use the same -
operator as we did to find the difference above, this time using omega
:
omega - S
{2, 4, 8, 10, 14, 15, 16, 18, 20}
Note how the definition of
Just like a set can contain infinite values (e.g. the set of all even numbers), the complement of a set can contain infinite values. This will have implications for whether or not a given set can be represented in Python.
The cardinality of a set is simply the number of elements in the set.
In math notation, we represent this as
In Python, we can use the built-in len
function:
len(S)
4
Note: the pipe character (|
, vertical line) is used differently in math notation vs. Python. In math notation, two vertical lines surround the name of the set to denote cardinality. In Python, a single vertical line is an operator used to find the union between two sets (as well as for other purposes such as an OR operation between two boolean masks in pandas). Often as a data scientist you will need to translate between different kinds of notation like this — make sure you are communicating with your stakeholders to ensure you understand what is meant by a given symbol!
If you want to find the cardinality of the union of multiple sets, you can't simply add together the cardinality of each set.
Returning to the
print("(Cardinality of S) + (Cardinality of T)")
print(len(S) + len(T))
(Cardinality of S) + (Cardinality of T)
8
print("Cardinality of (S | T)")
print(len(S | T))
Cardinality of (S | T)
7
In combinational mathematics, the inclusion-exclusion principle is a counting technique that solves this problem.
For two finite sets, the method for counting the number of elements in the union is given by:
In other words, the cardinality of the union of the two sets (
-
$\mid S \mid$ (the cardinality of$S$ ) - plus
$\mid T \mid$ (the cardinality of$T$ ) - minus
$\mid S \cap T \mid$ (the cardinality of the intersection between$S$ and$T$ )
To demonstrate this in Python, recall these values:
S
{3, 6, 9, 12}
T
{2, 4, 6, 8}
S & T
{6}
Now let's compute the cardinality of
So, the cardinality of
len(S)
4
Plus the cardinality of
len(S) + len(T)
8
Minus the cardinality of
len(S) + len(T) - len(S & T)
7
Double-checking this answer:
len(S | T) == len(S) + len(T) - len(S & T)
True
Great!
This formula can also be extended to three sets, four sets, etc. For example, imagine you have a third set
This technique is an important foundation for the probability and combinatorics concepts in the upcoming lessons.
Some things to bear in mind when working with sets in Python:
- Sets are unordered collections of unique elements.
- Sets are iterable.
- Sets are collections of lower level Python objects (just like lists or dictionaries).
- Some sets that can be represented with mathematical notation cannot be represented in Python.
Documentation for sets in Python can be found here: Sets
To put this all together, let's consider an example with restaurants.
Think about a set
Next, there is a set
You could say that the universal set here, set
The union of these sets, set
The intersection of
The cardinality of
The relative complement of
The absolute complement of
In this section, you learned about sets, subsets, and universal sets. Next, you were introduced to some core set operations such as unions, intersections, and complements. After that, all this information was tied together through the inclusion-exclusion principle and a summative example. You also saw how sets translate into Python. You'll start exploring this in further detail in the next lab!