Algorithms
Notes and homework for Coursera's "Algorithms" online class 📚
Divide and Conquer, Sorting and Searching, and Randomized Algorithms
Introduction
Why study algorithms?
Integer multiplication
input: 2 n-digit numbers x and y
output: the product x.y
Primitive operation (for now): add or multiply 2 single-digit numbers.
An algorithm is "correct" if it eventually terminates and gives the correct answer.
What about the amount of time needed?
upshot: # of operations <= constant * n² (for the grade-school algorithm)
upshot: can we do better?
Karatsuba multiplication
Example: x = 5678 (a = 56, b = 78), y = 1234 (c = 12, d = 34)
- step 1: a.c = 672
- step 2: b.d = 2652
- step 3: (a + b)(c + d) = 6164
- step 4: (3) - (2) - (1) = 2840
- step 5: (1) * 10^4 + (2) + (4) * 10^2 = 7006652, the output
Recursive algorithm: write x = 10^(n/2) * a + b, y = 10^(n/2) * c + d where a, b, c, d are n/2-digit numbers.
x.y = ... = 10^n * ac + 10^(n/2) * (ad + bc) + bd (¤)
idea: recursively compute ac, ad, bc, bd then compute ¤
- step 1: recursively compute a.c
- step 2: recursively compute b.d
- step 3: recursively compute (a + b)(c + d) = ac + ad + bc + bd
- Gauss's trick: (3) - (1) - (2) = ad + bc
upshot: only need 3 recursive multiplications and some additions
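A runnable sketch of the recursion in Python (the digit-splitting helper and names are mine, not the course's):

```python
# Minimal Karatsuba sketch, assuming nonnegative Python ints.
def karatsuba(x: int, y: int) -> int:
    if x < 10 or y < 10:                # base case: a single-digit factor
        return x * y
    half = max(len(str(x)), len(str(y))) // 2
    a, b = divmod(x, 10 ** half)        # x = a * 10^half + b
    c, d = divmod(y, 10 ** half)        # y = c * 10^half + d
    ac = karatsuba(a, c)                # step 1
    bd = karatsuba(b, d)                # step 2
    gauss = karatsuba(a + b, c + d) - ac - bd   # steps 3-4: ad + bc
    return ac * 10 ** (2 * half) + gauss * 10 ** half + bd

assert karatsuba(5678, 1234) == 7006652   # the example above
```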
About the course
spoilers
Merge Sort: Motivation and Example
- Improves over Selection, Insertion, and Bubble sort
- Good example for worst-case behavior and asymptotic analysis
Recursion tree methods --> "Master method"
input: array of n numbers
output: array of the same n numbers, sorted from min to max
Assumption: numbers are distinct (dropping this changes little)
Merge Sort: Pseudocode / Merge Sort: Analysis
ignore base cases
C = output[length = n]
A = first sorted array [n/2]
B = second sorted array [n/2]
i = 1
j = 1
for k = 1 to n
if A(i) < B(j)
C(k) = A(i)
i++
else
C(k) = B(j)
j++
end
Running time of Merge, on an output of n numbers: <= 4n + 2 <= 6n operations (since n >= 1)
Claim: merge sort requires <= 6n * (log2(n) + 1) operations to sort n numbers
At each level j=0,1,...,log(n) there are 2^j subproblems each of size n/(2^j)
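A runnable Python version of the pseudocode (names mine); unlike the sketch above, it handles the base case and the leftover tail once one half is exhausted:

```python
def merge_sort(a):
    if len(a) <= 1:                      # base case ignored in the pseudocode
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])                 # copy whichever half has leftovers
    out.extend(right[j:])
    return out

assert merge_sort([5, 4, 1, 8, 7, 2, 6, 3]) == [1, 2, 3, 4, 5, 6, 7, 8]
```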
Guiding Principles for Analysis of Algorithms
- worst case analysis
- don't pay attention to constant factors and lower order terms
- asymptotic analysis (focus on large input sizes)
Fast algorithm ~= worst-case running time grows slowly with input size
Holy grail: linear running time
Asymptotic analysis
The gist
Importance: vocabulary, "sweet spot", sharp enough to differentiate algo
High-level idea: suppress constant factors and lower-order terms.
- constant factors: too system-dependent
- lower-order terms: irrelevant for large inputs
--> 6n(log(n) + 1) ==> nlog(n)
examples:
- One loop: does an array contain a given integer? --> O(n)
- Two loops: does A or B contain t? --> O(n)
- Two nested loops: do A and B have a number in common --> O(n²)
- Two nested loops (II): does A have duplicate entries --> O(n²)
Big-Oh Notation
T(n) = function on N
T(n) = O(f(n)) <--> there exist constants c, n0 > 0 such that T(n) <= c * f(n) for all n >= n0
Basic Examples
Example 1
Claim: if T(n) = a_k * n^k + ... + a_0 * n^0 then T(n) = O(n^k)
Proof: choose n0 = 1 and c = |a_k|+...+|a_0|
Example 2
Claim: for all k >= 1, n^k != O(n^(k-1))
Proof: by contradiction. Suppose n^k = O(n^(k-1)). Then
there exist c, n0 > 0 such that n^k <= c * n^(k-1) for all n >= n0
i.e. n <= c for all n >= n0, which is false for n > c: contradiction
Big Omega and Theta
Omega notation:
T(n) = Omega(f(n)) <--> there exist c, n0 > 0 such that T(n) >= c * f(n) for all n >= n0
Theta notation: O(f(n)) AND Omega(f(n))
T(n) = Theta(f(n)) <--> there exist c1, c2, n0 > 0 such that c1 * f(n) <= T(n) <= c2 * f(n) for all n >= n0
Little-o notation:
T(n) = o(f(n)) <--> for all c > 0, there exists n0 such that T(n) <= c * f(n) for all n >= n0
Divide and conquer algorithms
O(n log n) Algorithm for Counting Inversions I
The divide and conquer paradigm:
- Divide the problem into smaller subproblems
- Conquer the subproblems via recursive calls
- Combine solutions of subproblems into the original solution
Input: array A containing the numbers 1, 2, 3, ... in some arbitrary order
Output: number of inversions = number of pairs (i, j) of indices with i < j and A[i] > A[j]
Motivation: numerical similarity measure between two ranked lists
Largest possible number of inversions: (n C 2) = n(n - 1) / 2
Note: (n choose k) = n! / (k!(n - k)!)
Brute force algorithm: O(n²) time
3 types of inversions (i, j): left (both i, j <= n/2), right (both i, j > n/2), and split (i <= n/2 < j)
High-level algorithm:
CountInversions(array A of length n)
if (n == 1) return 0
else
x = CountInversions(first half of A)
y = CountInversions(second half of A)
z = CountSplitInversions(A, n) // TODO
return x + y + z
O(n log n) Algorithm for Counting Inversions II
Idea: piggyback on merge sort
SortAndCountInversions(array A of length n)
if (n == 1) return (A, 0)
else
(B, x) = SortAndCountInversions(first half of A)
(C, y) = SortAndCountInversions(second half of A)
(D, z) = MergeAndCountSplitInversions(B, C, n) // TODO
return (D, x + y + z)
MergeAndCountSplitInversions: when an element of the 2nd array C gets copied into the output D, increment the total by the number of elements remaining in the 1st array B
Run time: O(nlog(n))
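A runnable Python sketch of SortAndCountInversions (names mine); the only change from merge sort is the `z += len(b) - i` line implementing the key idea above:

```python
def sort_and_count(a):
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    b, x = sort_and_count(a[:mid])       # 1st half, its inversions
    c, y = sort_and_count(a[mid:])       # 2nd half, its inversions
    d, z, i, j = [], 0, 0, 0
    while i < len(b) and j < len(c):
        if b[i] <= c[j]:
            d.append(b[i]); i += 1
        else:
            d.append(c[j]); j += 1
            z += len(b) - i              # one split inversion per element left in b
    d.extend(b[i:]); d.extend(c[j:])
    return d, x + y + z

assert sort_and_count([1, 3, 5, 2, 4, 6])[1] == 3   # (3,2), (5,2), (5,4)
```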
Strassen's Subcubic Matrix Multiplication Algorithm
n by n matrices
Classic multiplication: O(n^3)
Recursive algorithm #1:
Idea: write X = (A, B; C, D) and Y = (E, F; G, H) where A...H are n/2 by n/2 matrices
Then: XY = (AE + BG, AF + BH; CE + DG, CF + DH)
--> O(n^3)
Strassen's algorithm (1969)
- Step 1: recursively compute only 7 (cleverly chosen) products
- Step 2: do the necessary (clever) additions and subtractions (still O(n²) time)
- Fact: better than cubic time! (see next lecture)
7 products:
- P1 = A(F - H)
- P2 = (A + B)H
- P3 = (C + D)E
- P4 = D(G - E)
- P5 = (A + D)(E + H)
- P6 = (B - D)(G + H)
- P7 = (A - C)(E + F)
Claim: XY = (P5 + P4 - P2 + P6, P1 + P2; P3 + P4, P1 + P5 - P3 - P7)
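A sketch of the recursion using NumPy for the block arithmetic, assuming n is a power of 2 (the general case needs padding; names mine):

```python
import numpy as np

def strassen(x, y):
    n = x.shape[0]
    if n <= 2:                            # small base case: multiply directly
        return x @ y
    h = n // 2
    a, b, c, d = x[:h, :h], x[:h, h:], x[h:, :h], x[h:, h:]
    e, f, g, hh = y[:h, :h], y[:h, h:], y[h:, :h], y[h:, h:]
    p1 = strassen(a, f - hh)              # the 7 clever products
    p2 = strassen(a + b, hh)
    p3 = strassen(c + d, e)
    p4 = strassen(d, g - e)
    p5 = strassen(a + d, e + hh)
    p6 = strassen(b - d, g + hh)
    p7 = strassen(a - c, e + f)
    return np.vstack([np.hstack([p5 + p4 - p2 + p6, p1 + p2]),
                      np.hstack([p3 + p4, p1 + p5 - p3 - p7])])

x, y = np.arange(16).reshape(4, 4), np.arange(16, 32).reshape(4, 4)
assert np.array_equal(strassen(x, y), x @ y)
```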
O(n log n) Algorithm for Closest Pair
input: n points in the plane (∈ R²)
output: the closest pair of points (Euclidean distance)
assumption: all points have distinct x- and y-coordinates
Brute force: O(n²)
1-D version of closest pair:
- sort points O(nlogn)
- return closest pair of adjacent points O(n)
Goal: O(nlogn) for 2-D version
High level approach:
- make copies of points sorted by x-coordinate (Px) and y-coordinates (Py) O(nlogn)
- use divide-conquer
ClosestPair(Px, Py)
# base case omitted
let Q = left half of P
R = right half of P
form Qx, Qy, Rx, Ry (takes O(n) time)
(p1, q1) = ClosestPair(Qx, Qy)
(p2, q2) = ClosestPair(Rx, Ry)
Either the closest pair is entirely in Q, or entirely in R...
unlucky case: the pair is split between Q and R
(p3, q3) = ClosestSplitPair(Px, Py)
return best of 3 pairs
If ClosestSplitPair is O(n) then ClosestPair is O(nlogn)
ClosestSplitPair: key idea: it only needs to be correct in the unlucky case, i.e. when the closest pair really is split between Q and R
ClosestPair(Px, Py)
let Q = left half of P
R = right half of P
form Qx, Qy, Rx, Ry (takes O(n) time)
(p1, q1) = ClosestPair(Qx, Qy)
(p2, q2) = ClosestPair(Rx, Ry)
let D = min(d(p1, q1), d(p2, q2))
(p3, q3) = ClosestSplitPair(Px, Py, D)
return best of 3 pairs
ClosestSplitPair(Px, Py, D)
let x = biggest x-coordinate in left of P (O(n))
let Sy = points of P with x-coordinates in [x - D, x + D], sorted by y-coordinates (O(n))
let best = D
let best_pair = null
for i = 1 to len(Sy) - 1
for j = 1 to min(7, len(Sy) - i)
let p, q = i-th, (i + j)-th points of Sy
if d(p, q) < best
best = d(p, q)
best_pair = (p, q)
# inner loop body: O(1) time; whole double loop: O(n) time
return best_pair
Satisfies the correctness requirements.
Claim: let p ∈ Q, q ∈ R be a split pair with d(p, q) < D. Then:
- (A) p and q are members of Sy
- (B) p and q are at most 7 positions apart in Sy
The master method (aka master theorem)
Motivation
Re: Karatsuba
T(n) = maximum number of operations this algorithm needs to multiply two n-digit numbers
Recurrence: express T(n) in term of running time of recursive calls
- Base case: T(1) <= a constant
- for all n > 1: T(n) <= 3T(n / 2) + O(n)
Formal Statement
assumption: all subproblems have equal size (previously: n / 2)
- for all n > the base case: T(n) <= a * T(n / b) + O(n^d)
- a: number of recursive calls (>= 1)
- b: input size shrinkage factor (> 1)
- d: exponent in running time of the "combine step" (>= 0)
- (a, b, d) independent of n
if T(n) <= a * T(n / b) + O(n^d)
then T(n) = {
if a = b^d: O(n^d * log(n))
if a < b^d: O(n^d)
if a > b^d: O(n^log_b(a))
Only gives an upper bound; with exact recurrences (= instead of <=) the same statement holds with Theta
log with no base? Inside the O(.), the base does not matter (changing base is a constant factor change), UNLESS the log sits in an exponent: there the base matters, a lot.
Examples
Merge sort: a = 2 b = 2 d = 1 --> a = b^d --> O(nlogn)
Binary search: a = 1 b = 2 d = 0 --> a = b^d --> O(logn)
Karatsuba without Gauss's trick: a = 4 b = 2 d = 1 --> a > b^d --> O(n^(log2(4))) = O(n²)
Karatsuba: a = 3 b = 2 d = 1 --> a > b^d --> O(n^(log2(3))) = O(n^1.59)
Strassen's matrix multiplication: a = 7 b = 2 d = 2 --> a > b^d --> O(n^(log2(7))) = O(n^2.81)
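The three cases are mechanical enough to compute; a tiny helper (mine, not the course's) reproduces the examples above:

```python
import math

def master_bound(a, b, d):
    """Asymptotic bound given by the master method for T(n) <= a*T(n/b) + O(n^d)."""
    if a == b ** d:
        return f"O(n^{d} * log n)"
    if a < b ** d:
        return f"O(n^{d})"
    return f"O(n^{math.log(a, b):.2f})"   # a > b^d: leaves dominate

print(master_bound(2, 2, 1))   # merge sort: O(n^1 * log n)
print(master_bound(1, 2, 0))   # binary search: O(n^0 * log n) = O(log n)
print(master_bound(3, 2, 1))   # Karatsuba: O(n^1.58)
print(master_bound(7, 2, 2))   # Strassen: O(n^2.81)
```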
Proof
Assume: the recurrence is (i) T(1) <= c, (ii) T(n) <= a * T(n / b) + c * n^d, and that n is a power of b
Idea: generalize merge sort analysis ie use a recursion tree
at each level j=0,1,2,...,log_b(n) there are a^j subproblems each of size n / b^j
The recursion tree
Level 0: n; level 1: a subproblems of size n / b; ...; level log_b(n): base cases
Work at a single level (ignoring work in recursive calls): <= a^j * c * (n / b^j)^d where:
- a^j is the # of level-j subproblems
- n / b^j is the size of each level-j subproblem
- c * (n / b^j)^d is the work per level-j subproblem
Total work: summing over all levels j: total work <= c * n^d * sum(j=0...log_b(n), (a / b^d)^j) (¤)
How to think about (¤)
- a = rate of subproblem proliferation (RSP)
- b^d = rate of work shrinkage (RWS) (per subproblem)
- If RSP < RWS, the amount of work is decreasing with the recursion level j: the root dominates (expect O(n^d))
- If RSP > RWS, the amount of work is increasing with the recursion level j: the leaves dominate (expect O(# leaves) = O(n^(log_b(a))))
- If RSP = RWS, the amount of work is the same at every recursion level j (expect O(n^d * log(n)))
If a = b^d then (¤) = c * n^d * (log_b(n) + 1) = O(n^d * log(n)) (the base of the log is a suppressible constant factor)
For all r != 1: 1 + r + r² + ... + r^k = (r^(k + 1) - 1) / (r - 1). Upshot:
- If r < 1, the sum is <= 1 / (1 - r), a constant (independent of k)
- If r > 1, the sum is <= r^k * (1 + 1 / (r - 1)), i.e. dominated by the last term
Quicksort algorithm
Quicksort: Overview
- Greatest hit algorithm
- Prevalent in practice
- O(n log n) on average, works in place
input: array of n unsorted numbers
output: array of the same n numbers, sorted
assume: all array entries distinct
key idea: partition the array around a pivot element
- pick an element of the array
- rearrange the array so that everything left of the pivot is less than it, everything right is greater
Note: this puts the pivot in its "rightful position"
Partition:
- linear O(n) time, no extra memory
- reduces problem size
High-level description:
QuickSort(A of length n)
if n == 1 return
p = ChoosePivot(A)
partition A around p
recursively sort first and second part (not including p)
Partitioning Around a Pivot
The easy way: using O(n) extra memory (boo!)
In-place implementation:
Assume: the pivot is the first element of the array
High-level idea:
- single scan through array
- invariant: everything looked at so far is partitioned
Partition(A, l, r) // input = A[l...r]
p = A[l]
i = l + 1
for j = l + 1 to r
if A[j] < p // if not, do nothing
swap A[j] and A[i]
i++
swap A[l] and A[i - 1]
Running time: O(n) where n = r - l + 1 (O(1) work per array entry)
Correctness: the for loop maintains the invariants (A[l + 1]...A[i - 1] < p and A[i]...A[j - 1] > p)
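A runnable translation of Partition plus the high-level QuickSort, already using the random pivots discussed below (names mine):

```python
import random

def partition(a, l, r):
    """In-place partition of a[l..r] around the pivot a[l]; returns its final index."""
    p = a[l]
    i = l + 1
    for j in range(l + 1, r + 1):
        if a[j] < p:                     # if not, do nothing
            a[i], a[j] = a[j], a[i]
            i += 1
    a[l], a[i - 1] = a[i - 1], a[l]      # put the pivot in its rightful position
    return i - 1

def quicksort(a, l=0, r=None):
    if r is None:
        r = len(a) - 1
    if l >= r:
        return
    k = random.randrange(l, r + 1)       # ChoosePivot: uniformly at random
    a[l], a[k] = a[k], a[l]              # move the pivot to the front, as assumed
    m = partition(a, l, r)
    quicksort(a, l, m - 1)               # recurse, not including the pivot
    quicksort(a, m + 1, r)

xs = [3, 8, 2, 5, 1, 4, 7, 6]
quicksort(xs)
assert xs == [1, 2, 3, 4, 5, 6, 7, 8]
```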
Correctness of Quicksort
By induction. Think about the partitioned array: the two "parts" have size smaller than n, so the inductive hypothesis applies to the recursive calls.
Choosing a Good Pivot
The running time of QuickSort depends on the quality of the pivot
If the pivot is always the first element, then on an already-sorted array QuickSort performs Theta(n²) operations, because the partition calls do n + (n - 1) + ... + 1 = Theta(n²) work
If the pivot is always the median element of the array, QuickSort performs Theta(n log n) operations on any array
How to choose pivots? Key idea: random pivots!
QuickSort theorem: for every input array of length n, the average running time of QuickSort with random pivots is O(nlogn)
QuickSort analysis
Analysis: A Decomposition Principle
Fixed input array A of length n
sample space Omega = all possible outcomes of random choices in QuickSort (i.e. pivot sequences)
random variable: for s ∈ Omega, C(s) = # of comparisons between two input elements
lemma: the running time of QuickSort is dominated by comparisons
notation: z_i = i-th smallest item in A
X_ij(s) = # of times z_i and z_j get compared in pivot sequence s (an indicator random variable: can only take the values 0 or 1)
for all s: C(s) = sum(i=1...n-1, sum(j=i+1...n, X_ij(s)))
By linearity of expectation: E[C] = sum(i, sum(j, E[X_ij])) = sum(i, sum(j, P(z_i, z_j get compared)))
A general decomposition principle:
- Identify random variable Y that you really care about
- Express Y as a sum of indicator random variable: Y = sum(X)
- Apply linearity of expectation: E[Y] = sum(E[x]) = sum(P(x = 1))
Linear-time Selection
Randomized Selection - Algorithm
input: array A of n distinct numbers and a number i
output: the i-th order statistic, i.e. the i-th smallest element of A
example: the median
The 1st order statistic is the min --> trivial with a linear scan
The n-th order statistic is the max --> same
Easy to do with sorting, but that costs O(n log n)
RandomizedSelection(A of length n, order statistic i)
if n == 1 return A[0]
choose pivot p uniformly at random
partition A around p
let j = new index of p
if i == j return p // lucky
if j > i return RandomizedSelection(first part of A of length j - 1, i)
if j < i return RandomizedSelection(second part of A of length n - j, i - j)
Claim: RandomizedSelection is correct (proof: by induction)
Running time? Depends on the quality of the pivot
- worst pivot: Theta(n²)
- key: find a pivot giving a balanced split
- best pivot: the median (circular, since the median is what we are computing!) --> would give the recurrence T(n) <= T(n / 2) + O(n) --master method--> T(n) = O(n)
Hope: a random pivot is "pretty good" "often enough"
RandomizedSelection theorem: for every input array of length n, the average running time is O(n)
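A sketch in Python; for clarity it partitions into new lists rather than in place, which costs extra memory compared to the in-place version (names mine):

```python
import random

def rselect(a, i):
    """i-th smallest element of a (1-indexed); average running time O(n)."""
    if len(a) == 1:
        return a[0]
    p = random.choice(a)                       # pivot chosen uniformly at random
    less = [x for x in a if x < p]
    greater = [x for x in a if x > p]
    j = len(less) + 1                          # p's order statistic
    if i == j:
        return p                               # lucky
    if i < j:
        return rselect(less, i)
    return rselect(greater, i - j)

assert rselect([10, 8, 2, 4], 3) == 8
```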
Randomized Selection - Analysis
Notation: RSelect is in phase j if the current array size is between (3/4)^(j+1) * n and (3/4)^j * n
X_j = number of recursive calls during phase j
...
coin flip analogy: E[# of flips until heads] = 1 + 0.5 * E[# of flips until heads], so E = 2
- the 1 is the current flip
- 0.5 is the probability of tails, in which case we try again
Deterministic Selection - Algorithm
key idea: use the "median of medians" as the pivot
ChoosePivot(A of length n)
break A into n / 5 groups of size 5 each
sort each group
copy the n / 5 medians (ie middle elements of each sorted group) into new array C
recursively compute median of C
return it as the pivot
partition around it
i.e.:
DSelect(A of length n, order statistic i)
Break A into n / 5 groups of size 5, sort the groups
C = the n / 5 "middle elements"
p = DSelect(C of length n / 5, n / 10) // the median of the medians
Partition A around p; let j = p's new index
if j == i return p
if j > i return DSelect(first part of A of length j - 1, i)
if j < i return DSelect(second part of A of length n - j, i - j)
Not as good as RSelect in practice
- worse constants
- not in-place
Deterministic Selection - Analysis
Omega(n log n) Lower Bound for Comparison-Based Sorting
Graphs and the contraction algorithm
Graphs and Minimum Cuts
"Only" 20 yo algorithm
Cuts of a graph (V, E) a partition of V into two non-empty sets A and B
A graph with n vertices has 2^n possible cuts
Input: an undirected graph G = (V, E) Goal: compute a cut with fewest number of crossing edges (i.e. a min-cut)
Graph Representations
Consider a connected undirected graph with n vertices: min # of edges = n - 1, max # of edges = n(n - 1) / 2
n = # of vertices, m = # of edges. In most applications, m is Omega(n) and O(n²)
- Sparse graph: m is O(n) or close to it
- Dense graph: m is closer to O(n²)
Adjacency matrix: represent G using a n by n matrix
A_ij = 1 <=> G has an i-j edge (requires n² space)
Adjacency list:
- array or list of vertices (theta(n))
- array or list of edges (theta(m))
- each edge points to its endpoints (theta(m))
- each vertex points to edges incident to it (theta(m)) (same pointers as the 3rd item) --> requires theta(n + m) space (or theta(max(n, m)))
Which is better: it depends on graph density and operations needed.
Random Contraction Algorithm
while there are more than 2 vertices left:
- pick a remaining edge (u, v) at random
- merge (or "contract") u and v into a single vertex
- remove self loops
return cut represented by final 2 vertices
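A sketch on an edge-list multigraph (that representation and the trial count are my choices; the course leaves them open):

```python
import random

def karger_min_cut(edges, n_trials=100):
    """Runs the contraction algorithm n_trials times; returns the fewest
    crossing edges found. edges: list of (u, v) pairs of a connected graph."""
    best = float("inf")
    for _ in range(n_trials):
        e = list(edges)                              # remaining edges (multiset)
        vertices = {u for edge in e for u in edge}
        while len(vertices) > 2:
            u, v = random.choice(e)                  # pick a remaining edge at random
            vertices.discard(v)                      # contract v into u
            e = [(u if a == v else a, u if b == v else b) for a, b in e]
            e = [(a, b) for a, b in e if a != b]     # remove self-loops
        best = min(best, len(e))                     # edges crossing the final cut
    return best

# square with one diagonal: the min cut isolates a degree-2 corner (2 edges)
g = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(karger_min_cut(g))   # usually prints 2
```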
Analysis of Contraction Algorithm
What is the probability of success?
Fix G = (V, E) with n vertices and m edges; fix a minimum cut (A, B)
Let k = # of edges crossing (A, B); these edges form the set F
If an edge of F is contracted, the algorithm fails; if only non-F edges are contracted, it succeeds
First iteration, probability to fail: k/m
Key observation: The degree of each vertex is at least k
We have sum(v, degree(v)) = 2m and sum(v, degree(v)) >= kn --> m >= kn / 2
P(fail1) = k/m <= 2 / n
Second iteration: P(success1 and success2) = P(success2 | success1) * P(success1)
P(success1) >= 1 - 2/n; P(success2 | success1) = 1 - k / (# of remaining edges)
# of remaining edges: the contracted graph has n - 1 vertices, each of degree >= k, so # of remaining edges >= k(n - 1) / 2
Total success probability: >= (1 - 2/n)(1 - 2/(n - 1))(1 - 2/(n - 2)) ... (1 - 2/3), which telescopes to 2 / (n(n - 1)) >= 2 / n²
Problem: low success probability... but non-trivial! (the # of cuts is exponential in n, while 2/n² is only polynomially small)
Solution: run the algo a large number N of times and remember the smallest cut found
How many trials needed?
P(all trials fail) <= (1 - 1 / n²)^N
We have 1 + x <= e^x
If we take N = n², P(all trials fail) <= (e^(-1/n²))^n² = 1 / e
If we take N = n²ln(n), P(all trials fail) <= 1 / n
Running time? Polynomial in n and m, but slow: Omega(n²m) over the n² trials
But: better algo exist
Counting Minimum Cuts
A graph may have more than one min cut
E.g.: a tree has n - 1 min cuts
Question: what is the largest number of min cut a graph can have? Answer: n choose 2 = n(n - 1) / 2
Graph Search, Shortest Paths, and Data Structures
Graph search and connectivity
Graph Search - Overview
Generic graph search Goals:
- Find everything findable from a given vertex
- Don't explore anything twice
- in O(n + m) time
Generic algorithm: given a graph G and a vertex s
- initially s explored, all other vertices unexplored
- while possible:
- choose an edge (u, v) with u explored and v unexplored
- mark v explored
Claim: at the end of the algo, v explored <=> G has a path from s to v
Breath-First Search (BFS)
- explore nodes in layers
- can compute shortest paths
- can compute connected components of an undirected graph --> O(n + m) time using a queue (FIFO)
Depth-First Search (DFS)
- explore aggressively, like navigating a maze; backtrack only when necessary
- compute topological ordering of a DAG
- compute strongly connected components of directed graphs
Breadth-First Search (BFS): The Basics
BFS(graph G, start vertex s)
mark s as explored
let Q = queue FIFO initialized with s
while Q is not empty:
v = Q.shift() // The first
for each edge (v, w):
if w unexplored:
mark w as explored
Q.append(w) // At the end
Running time: O(ns + ms), where ns (resp. ms) is the # of nodes (resp. edges) reachable from s
BFS and Shortest Paths
Goal: compute dist(v), the fewest # of edges on a path from s to v
extra code:
- initialize dist(v) = { 0 if v == s, +infinity otherwise }
- when considering edge (v, w):
- if w unexplored: dist(w) = dist(v) + 1
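A runnable BFS with the dist bookkeeping, in Python (adjacency-list dict representation assumed; names mine). Note the deque: popping from the front of a plain list would be O(n):

```python
from collections import deque

def bfs_dist(graph, s):
    """graph: dict vertex -> list of neighbors. Returns dist for reachable vertices."""
    dist = {s: 0}                   # unseen vertices are implicitly +infinity
    q = deque([s])
    while q:
        v = q.popleft()             # the first element of the queue
        for w in graph[v]:
            if w not in dist:       # w unexplored
                dist[w] = dist[v] + 1
                q.append(w)         # at the end
    return dist

g = {"s": ["a", "b"], "a": ["s", "c"], "b": ["s", "c"], "c": ["a", "b"]}
assert bfs_dist(g, "s") == {"s": 0, "a": 1, "b": 1, "c": 2}
```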
BFS and Undirected Connectivity
Connected components: "the pieces" of G
Formal definition: equivalence classes of the relation u~v <=> there exists a u-v path in G
Goal: compute all connected components Why: check if network is disconnected
ComputeAllComponents(G)
mark all nodes as unexplored
for i=1 to n
if i not explored
BFS(G, i)
# note: maybe some code to describe the found components
Running time: O(n + m)
Depth-First Search (DFS): The Basics
DFS(G, starting vertex s)
mark s as explored
for every edge (s, v):
if v is unexplored:
DFS(G, v)
Could also mimic BFS with a stack instead of a queue.
Running time: O(ns + ms)
Topological Sort
Definition: a topological ordering of a directed graph G is a labelling f of G's nodes such that:
- the f(v)'s form the set {1, 2, ..., n}
- (u, v) ∈ G => f(u) < f(v)
Motivation: Sequence tasks while respecting precedence constraints
Note: cycle => no topo ordering
Note: every DAG has a sink vertex.
Straightforward solution:
- let v be a sink vertex of G
- set f(v) = n
- recurse on G - {v}
Topological sort via DFS (Slick)
TopologicalSort(G)
mark all vertices unexplored
current_label = n
for each vertex v:
if v not yet explored:
DFSForTopologicalSort(G, v)
DFSForTopologicalSort(G, s)
mark s as explored
for every edge (s, v):
if v not yet explored:
DFSForTopologicalSort(G, v)
set f(s) = current_label
current_label--
Running time: O(m + n); reason: O(1) time per node, O(1) time per edge
Correctness: need to show that if (u, v) is an edge, then f(u) < f(v)
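A runnable version of the DFS-based topological sort (names mine; `nonlocal` stands in for the pseudocode's shared current_label):

```python
def topological_sort(graph):
    """graph: dict vertex -> list of successors (assumed to be a DAG)."""
    explored, f = set(), {}
    label = len(graph)                  # current_label = n

    def dfs(s):
        nonlocal label
        explored.add(s)
        for v in graph[s]:
            if v not in explored:
                dfs(v)
        f[s] = label                    # set f(s) on the way back up
        label -= 1

    for v in graph:
        if v not in explored:
            dfs(v)
    return f

dag = {"s": ["v", "w"], "v": ["t"], "w": ["t"], "t": []}
f = topological_sort(dag)
assert f["s"] < f["v"] < f["t"] and f["s"] < f["w"] < f["t"]
```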
Computing Strong Components: The Algorithm
Definition: the Strongly Connected Components (SCCs) of a directed graph are the equivalence classes of the relation u~v <=> there exist a path u->v and a path v->u in G
Kosaraju's two-pass algorithm O(m+n)
KosarajuSCC(G)
let Grev = G with all arcs reversed (could also run DFS going backward)
run DFS-loop on Grev (goal: compute "magical ordering" of nodes)
let f(v) = "finising time" of each vertex v
run DFS-loop on G (goal: discover the SCC one by one)
processing nodes in decreasing order of finising time
save the leaders
global variable t = 0 (# of nodes processed so far)
global variable s = Null (current source vertex)
DFS-loop(G)
Assume nodes labeled 1 to n
for i=n to 1:
if i not yet explored:
s = i
DFS(G, i)
DFS(G, i)
mark i as explored (in a given DFS-loop)
set leader(i) = node s
for each edge (i,j):
if j not yet explored:
DFS(G, j)
t++
set f(i) = t
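A compact Python adaptation (names mine). It follows the two passes above but uses an iterative DFS, since the recursive version hits Python's recursion limit on big graphs:

```python
def kosaraju_scc(graph):
    """graph: dict vertex -> list of successors. Returns leader(v) for each v."""
    rev = {v: [] for v in graph}
    for u in graph:
        for v in graph[u]:
            rev[v].append(u)                         # Grev: all arcs reversed

    def dfs_loop(g, order):
        explored, finish, leaders = set(), [], {}
        for s in order:
            if s in explored:
                continue
            explored.add(s)
            stack = [s]
            while stack:
                v = stack[-1]
                leaders.setdefault(v, s)             # leader = current source s
                nxt = [w for w in g[v] if w not in explored]
                if nxt:
                    explored.add(nxt[0])
                    stack.append(nxt[0])
                else:
                    finish.append(stack.pop())       # record v's finishing time
        return finish, leaders

    order1, _ = dfs_loop(rev, list(graph))           # 1st pass: finishing times on Grev
    _, leaders = dfs_loop(graph, reversed(order1))   # 2nd pass: decreasing finishing time
    return leaders

g = {1: [2], 2: [3], 3: [1, 4], 4: []}               # SCCs: {1, 2, 3} and {4}
l = kosaraju_scc(g)
assert l[1] == l[2] == l[3] != l[4]
```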
Computing Strong Components: The Analysis
The 2nd pass of DFS-loop begins somewhere in a sink SCC C*; its first DFS call discovers C* and nothing else
Dijkstra's shortest-path algorithm
Dijkstra's Shortest-Path Algorithm
Input: a directed graph G = (V, E) (n vertices, m edges), a nonnegative length l_e for each edge e, and a source vertex s
Output: for each v ∈ V, compute L(v) = length of a shortest s-v path in G
Assumption:
- for convenience: for all v ∈ V, there exists an s->v path
- important: no negative edge lengths: l_e >= 0 for all e ∈ E
BFS computes shortest paths iff l_e = 1 for all e ∈ E
DijkstraShortestPath(G, s)
X = {s} # Vertices processed so far
A[s] = 0 # Computed shortest path distances
B[s] = empty path # Computed shortest paths (do not include in actual implementation)
while X != V:
among all edges (v, w) ∈ E with v ∈ X, w ∉ X, pick the one that minimizes A[v] + l_vw (Dijkstra's greedy criterion); call it (v*, w*)
add w* to X
set A[w*] = A[v*] + l_v*w*
set B[w*] = B[v*] + (v*, w*)
Dijkstra's Algorithm: Implementation and Running Time
As such, running time is theta(nm). But we can do better.
Heap: perform insert, extract min in O(logn) time
- perfectly balanced binary tree (height: log2(n))
- extract-min by swapping up the last leaf, then bubbling down
- insert via bubbling up
Also: will need the ability to delete from the middle of the heap (bubble up or down, as needed)
Two invariants:
- elements in heap = vertices of V - X
- for v ∈ V - X, key[v] = smallest Dijkstra greedy score of an edge (u, v) ∈ E with u ∈ X, or +infinity if no such edge (the heap key plays the role of the greedy score)
With those invariants, extract-min yields correct vertex w* to add to X next
Maintaining the 2nd invariant: When w is extracted from heap, (ie added to X)
for each edge (w, v) ∈ E:
if v ∈ V - X (i.e. in the heap):
remove v from heap
recompute key[v] = min(key[v], A[w] + l_wv)
reinsert v into heap
Running time analysis:
- dominated by heap operations, O(log n) each
- (n - 1) extract-mins
- each edge (v, w) triggers at most one delete/insert combo (if v is added to X first)
--> # of heap operations is O(n + m) = O(m) since the graph is weakly connected (m >= n - 1)
--> running time: O(m log n)
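A runnable sketch with Python's heapq. heapq has no delete-from-the-middle, so instead of recomputing keys this uses the standard "lazy deletion" variant: push duplicate entries and skip vertices already in X when popped (same O(m log n) bound, different bookkeeping than the course's):

```python
import heapq

def dijkstra(graph, s):
    """graph: dict vertex -> list of (neighbor, edge length); lengths >= 0."""
    A = {}                                  # computed shortest-path distances (X = keys)
    heap = [(0, s)]                         # (Dijkstra greedy score, vertex)
    while heap:
        score, w = heapq.heappop(heap)
        if w in A:
            continue                        # stale entry: w already in X
        A[w] = score                        # add w* to X with A[w*] = A[v*] + l
        for v, length in graph[w]:
            if v not in A:
                heapq.heappush(heap, (score + length, v))
    return A

g = {"s": [("v", 1), ("w", 4)], "v": [("w", 2)], "w": []}
assert dijkstra(g, "s") == {"s": 0, "v": 1, "w": 3}
```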
Heaps
Data Structures: Overview
Point: organize data so that it can be accessed quickly and usefully
Ex: lists, stacks, queues, heaps, search trees, hashtables, bloom filters, union-find, etc...
Heaps: Operations and Applications
- insert O(logn)
- extract min O(logn)
- heapify O(n)
- delete O(logn)
Application: sorting: HeapSort
Application: Median maintenance
Input: a sequence of n numbers, given one at a time
Output: at each step i, the median of the numbers seen so far
Constraint: O(log i) time at step i
Solution: one extract-min heap (Hhigh, holding the larger half) and one extract-max heap (Hlow, holding the smaller half)
Maintain invariant: the ~i/2 smallest (largest) elements are in Hlow (Hhigh); the median is then at the top of one of the heaps
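A runnable sketch (names mine). Python's heapq is a min-heap only, so the Hlow side stores negated values to act as a max-heap:

```python
import heapq

def running_medians(stream):
    h_low, h_high = [], []                   # max-heap (negated) / min-heap
    medians = []
    for x in stream:
        if h_low and x > -h_low[0]:
            heapq.heappush(h_high, x)
        else:
            heapq.heappush(h_low, -x)
        if len(h_low) > len(h_high) + 1:     # rebalance: sizes differ by <= 1
            heapq.heappush(h_high, -heapq.heappop(h_low))
        elif len(h_high) > len(h_low) + 1:
            heapq.heappush(h_low, -heapq.heappop(h_high))
        # median = top of the bigger heap (max of the smaller half on ties)
        medians.append(-h_low[0] if len(h_low) >= len(h_high) else h_high[0])
    return medians

assert running_medians([2, 1, 5, 3, 4]) == [2, 1, 2, 2, 3]
```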
Balanced binary search trees
Balanced Search Trees: Operations and Applications
Things to do with a sorted array:
- Search (binary search) O(logn)
- Select (a given order statistic, ex: min/max) O(1)
- Predecessor / successor O(1)
- Rank (know # of keys less than a given value) O(logn)
- Output in sorted order O(n)
--> Apply on a static dataset
BST: like sorted array + fast log insertion and deletion
- Search O(logn)
- Select O(logn)
- Min/Max O(logn)
- Predecessor / successor O(logn)
- Rank O(logn)
- Output O(n)
- Insert / Delete O(logn)
Binary Search Tree Basics
Exactly one node per key; in the most basic version, each node has a left-child, right-child, and parent pointer
Search tree property: left children < current key, right children > current key
Height of a BST: # of "levels" (arcs). Note: there are many possible search trees for a given set of keys --> height (aka depth) ranges from ~log2(n) (balanced) to ~n (a chain)
Search:
- Start at the root
- Traverse left/right child pointers as needed
Insert:
- Search for k (unsuccessfully)
- Rewire final Null pointer to point to new node with key k
--> theta(height)
Min/Max:
- Start at the root
- Follow left/right child pointers, return last key found
Predecessor of k:
- Easy case: if k's left subtree nonempty, return max key in left
- Otherwise: follow parent pointers until you get to a key less than k
In-order traversal
- Let r = root of search tree, with subtrees Tl and Tr
- Recurse on Tl
- print out r's key
- Recurse on Tr
Deletion (of a key k)
- Search for key
- easy case (k has no children): just delete k
- medium case (k has only one child): just "splice out" k, replace with child
- difficult case (k has 2 children):
- compute k's predecessor l (the max of k's left subtree)
- swap k and l (note: in its new position, k has no right child, so the easy/medium case applies)
--> theta(height)
Select and rank: idea: store a little extra info at each tree node, about the tree itself
Example augmentation: size(x) = # of tree nodes in the subtree rooted at x = size(y) + size(z) + 1 (children's sizes + 1)
How to select ith order statistic (with such a tree)
- start at root x, with children y and z
- let a = size(y) (0 if no left child)
- if a = i - 1 return x's key
- if a >= i recursively compute ith order statistic of search tree rooted at y
- if a < i - 1 recursively compute the (i - a - 1)th order statistic of the tree rooted at z
--> theta(height)
Red-Black Trees
Balanced: height O(log n) guaranteed
Idea: ensure that the height stays logarithmic --> ops run in O(log n)
Example: red-black trees (see also AVL trees, splay trees, B-trees, B+ trees)
Red-black invariants:
- Same as BST, plus more
- Each node red or black
- Root is black
- No 2 reds in a row (red node ==> children must be black)
- Every root-Null path (as in an unsuccessful search) has the same number of black nodes
Example: a chain of length 3 cannot be a red-black tree. (4th invariant violated)
Example: perfectly balanced search tree
Height guarantee: every red-black tree with n nodes has height O(log n) [h <= 2 * log2(n + 1)]
Rotations
Left rotations: locally rebalance subtrees at a node in O(1) time
Of a parent x and right child y (x < y).
Before: x is P's child, with left subtree A and right child y; y has left subtree B and right subtree C
After: y is P's child, with left child x and right subtree C; x has left subtree A and right subtree B
Hashing; bloom filters
Hash Tables: Operations and Applications
purpose: maintain a (possibly evolving) set of stuff
Supported operation: insert, delete, lookup (using a "key") --> O(1)
Hash Tables: Implementation Details
Setup: know the universe U of possible keys (all IP addresses, all chessboard positions, ...)
Goal: maintain an evolving subset S ⊆ U
Naive solutions:
- array-based: indexed by u, O(1) time but Theta(|U|) space
- list-based: Theta(|S|) space but Theta(|S|) lookup time
Solution:
- pick n = # of "buckets" with n ~= |S| (assume |S| does not vary too much)
- choose a hash function h: U --> {0, 1, 2, ..., n - 1}
- Use array A of length n, store x in A[h(x)]
Birthday "paradox" => collisions will happen
Resolving collisions:
solution #1: separate chaining
- keep linked list in each bucket
- given a key/object x, perform insert/delete/lookup in the list in A[h(x)]
solution #2: open addressing (only one object per bucket)
- hash function now specifies probe sequence h1(x), h2(x), ...
- keep trying until find an open slot
What makes a good hash function ?
Note: in a hash table with chaining, insert is O(1); lookup and delete are O(length of the list in the relevant bucket)
Point: performance depends on a good hash function
Properties of a good hash function:
- should "spread the data" as much as possible --> performance
- does not work too hard, fast to evaluate
Bad hash functions; keys = phone numbers (10 digits), |U| = 10^10, n = 10^3:
- first 3 digits (terrible, some buckets will be filled, other empty)
- last 3 digits (not uniformly distributed)
next example: keys = memory locations (which are multiples of a power of 2)
- bad hash function: x mod 1000 (all odd-numbered buckets will be empty)
Quick and dirty hash functions:
- turn the object into a (preferably big) number (e.g. a subroutine to convert strings into integers)
- compress that number (eg: the mod function)
How to choose the number of buckets:
- choose n to be a prime number (shares no factors with patterns in the data)
- not too close to a power of 2 or 10
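A toy instance of the two-step recipe with a prime number of buckets, as suggested above (the rolling-hash constant 31 is an arbitrary common choice, not from the course):

```python
def quick_and_dirty_hash(s: str, n_buckets: int) -> int:
    h = 0
    for ch in s:                    # step 1: turn the object into a big number
        h = h * 31 + ord(ch)
    return h % n_buckets            # step 2: compress it with mod

print(quick_and_dirty_hash("algorithms", 997))   # 997 buckets: a prime
```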
Pathological Data Sets and Universal Hashing Motivation
The load of a hash table: alpha = # of things that have been inserted / # of buckets of hash table
For good hash table performance:
- alpha = O(1) is necessary condition for operations to run in constant time
- with open addressing, need alpha << 1
--> Need to control the load: make sure the # of buckets increases with the # of items in the hash table
Pathological data sets
--> Need a good hash function (one that spreads the data evenly)... but a hash function that is good for every data set does not exist (every fixed hash function has a pathological data set)
Hacking prevention:
- use a cryptographic hash function (e.g. SHA-2): infeasible to reverse-engineer a pathological data set
- use randomization: design a family of hash functions such that for any data set, the hash function performs well on average
Universal Hashing: Definition and Example
What is a good random hash function ?
Def: let H be a set of hash functions U --> {0, 1, ..., n - 1}
H is universal <=> for all x, y ∈ U with x != y: P(h(x) = h(y)) <= 1 / n, where h is chosen uniformly at random from H
Example: IP addresses (form: (x1, x2, x3, x4) with each x_i ∈ {0, ..., 255})
Let n = a prime
Construction: choose a1, a2, a3, a4 uniformly at random in {0, ..., n - 1}; define h_a(x1, x2, x3, x4) = (a1*x1 + a2*x2 + a3*x3 + a4*x4) mod n. This family is universal.
Bloom filters
Bloom Filters: the basics
Raison d'être: fast, space-efficient lookups
Comparison with hash tables:
- Pro: more space-efficient
- Con: can't store the objects themselves, no deletions (in the basic implementation), small false positive probability
Applications:
- Early spellcheckers (is this a legitimate word?)
- Canonical: list of forbidden passwords
- Modern: network routers
Under the hood:
- Array of n bits (0 or 1)
- k hash functions h1, ..., hk (k small constant)
Insert: for i = 1...k, set A[h_i(x)] = 1
Lookup: return true if A[h_i(x)] = 1 for all i = 1...k
Note: no false negatives, but possibly false positives
Heuristic Analysis
Intuition: there should be a trade-off between space and error probability
Probability that a given bit of the filter has been set to 1: 1 - (1 - 1/n)^(k * |S|)
i.e. P <~= 1 - e^(-k|S|/n) = 1 - e^(-k/b), where b = n / |S| is the # of bits per object
So, under these heuristic assumptions, for x not in S the false positive probability is <= (1 - e^(-k/b))^k
How to set k? Minimizing over k gives k ~= ln(2) * b
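A minimal bloom filter sketch in Python. Deriving the k hash functions from salted SHA-256 digests is my choice here; any k good hash functions work:

```python
import hashlib

class BloomFilter:
    def __init__(self, n_bits, k):
        self.bits = [0] * n_bits
        self.k = k

    def _positions(self, x):
        for i in range(self.k):              # k salted hash functions h_1..h_k
            digest = hashlib.sha256(f"{i}:{x}".encode()).hexdigest()
            yield int(digest, 16) % len(self.bits)

    def insert(self, x):
        for pos in self._positions(x):
            self.bits[pos] = 1

    def lookup(self, x):                     # no false negatives, some false positives
        return all(self.bits[pos] for pos in self._positions(x))

bf = BloomFilter(n_bits=1000, k=5)
bf.insert("hunter2")
assert bf.lookup("hunter2")                  # inserted => always found
print(bf.lookup("correct-horse"))            # almost surely False at this load
```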
Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming
Greedy Algorithms
Introduction to Greedy Algorithms
Definition: iteratively make "myopic" decisions, hope everything works at the end
Example: Dijkstra's shortest path
Application: Optimal Caching
The caching problem:
- small fast memory (the cache)
- big slow memory
- process a sequence of page requests
- on a fault (i.e. a cache miss), need to evict something from the cache
Theorem: the "furthest-in-future" algorithm is optimal (minimizes the number of cache misses)
Scheduling problem: Definition
Setup:
- one shared resource
- many "jobs" to do
Question: In what order to do them ?
Assume:
- Each job has a weight wj (priority)
- and a length lj
Completion time cj of job j = sum of the lengths of the jobs up to and including j
Goal (objective function): minimize the weighted sum of completion times, i.e. min sum(j=1..n, wj * cj)
A Greedy Algorithm
Good idea: look at special cases, then move to general case
Edge case ideas: larger weights first, smaller length first
what if wi > wj and li > lj?
Idea: assign each job a score, e.g. w - l or w / l, and schedule in decreasing order of score
Breaking a greedy algorithm:
- find examples where the two candidate algorithms produce different schedules
- e.g. algo 1 (score w - l) is not always correct, while algo 2 (score w / l) always is; see the sketch below
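A small check (the jobs' numbers are mine): the ratio rule matches the brute-force optimum while the difference rule does not:

```python
from itertools import permutations

def weighted_completion(jobs):
    """Objective value sum(w_j * c_j) for jobs scheduled in the given order."""
    t, total = 0, 0
    for w, l in jobs:
        t += l                               # completion time of this job
        total += w * t
    return total

jobs = [(3, 5), (1, 2)]                      # (weight, length) pairs
by_diff = sorted(jobs, key=lambda j: j[0] - j[1], reverse=True)
by_ratio = sorted(jobs, key=lambda j: j[0] / j[1], reverse=True)
print(weighted_completion(by_diff))          # 23
print(weighted_completion(by_ratio))         # 22, the optimum:
assert weighted_completion(by_ratio) == min(
    weighted_completion(list(p)) for p in permutations(jobs))
```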
Minimum spanning trees
Informal definition: connect some points as cheaply as possible
Applications: clustering, networking
Fast greedy algorithm:
- Prim's algorithm, 1957
- Kruskal's algorithm, 1956 => O(m log n) time using suitable data structures
Input: undirected graph G = (V, E), n,m, and a cost c for each edge
Assume adjacency list representation
Output: minimum-cost tree T ⊆ E that spans all vertices:
- T has no cycles
- the subgraph (V, T) is connected (i.e. there is a path between any 2 vertices)
Assumptions:
- G is connected
- Edge costs are distinct (algos still correct without this assumption)
Prim's MST Algorithm
X = {s} (s chosen arbitrarily)
T = empty [invariant: X = vertices spanned by tree-so-far T]
while X != V
let e = (u, v) be the cheapest edge of G with u ∈ X, v ∉ X
add e to T
add v to X
O(n) iterations, O(m) per iteration => O(nm) running time for naive implementation
Empty cut lemma: a graph is not connected <=> there exists a cut (A, B) with no crossing edges
Double crossing lemma: suppose the cycle C ⊆ E has an edge crossing the cut (A, B); then so does some other edge of C
Lonely cut corollary: If e is the only edge crossing some cut (A, B), then it is not in any cycle
The cut property: Consider an edge e of G. Suppose there is a cut (A, B) such that e is the cheapest edge of G that crosses it. Then e belongs to the MST of G.
Fast Implementation
Using a Heap
Using heap to store edges with keys = cost => O(mlogn)
But there is better.
Maintain:
- Invariant 1: elements in heap = vertices of V - X
- Invariant 2: key[v] = cost of the cheapest edge (u, v) with u ∈ X
Heap population:
- Scan edges, find the cheapest edge for each vertex of V - X: O(m); populate the heap: O(n log n) --> O(m + n log n) = O(m log n) since m >= n - 1 (G is connected)
How to maintain invariant 2: When adding a new vertex, we need to recompute keys of the neighbours
when v is added to X:
for each edge (v, w) ∈ E:
if w ∈ V - X:
delete w from heap
recompute key[w] = min(key[w], c_vw)
reinsert w into heap
Running time with heaps:
- n - 1 inserts during preprocessing
- one extract-min per iteration of the while loop (n - 1 total)
- each edge (v, w) triggers at most one delete/insert op
--> O(m) heap operations (since m >= n - 1) --> O(m log n) total
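A runnable sketch with heapq (names mine). As in Dijkstra, heapq cannot delete from the middle, so this uses lazy deletion rather than the key recomputation above:

```python
import heapq

def prim(graph, s):
    """graph: dict vertex -> list of (neighbor, cost). Returns the MST edges."""
    X = {s}
    T = []
    heap = [(c, s, v) for v, c in graph[s]]
    heapq.heapify(heap)
    while heap:
        cost, u, v = heapq.heappop(heap)     # cheapest edge seen with u in X
        if v in X:
            continue                          # stale: both endpoints already in X
        X.add(v)
        T.append((u, v, cost))
        for w, c in graph[v]:
            if w not in X:
                heapq.heappush(heap, (c, v, w))
    return T

g = {1: [(2, 1), (3, 4)], 2: [(1, 1), (3, 2)], 3: [(1, 4), (2, 2)]}
assert sum(c for _, _, c in prim(g, 1)) == 3  # MST: edges (1,2) and (2,3)
```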
Kruskal's minimum spanning tree algorithm
Kruskal's MST Algorithm
Assumption: G is connected, edge costs are distinct (to ease proofs)
Cut property: If e is the cheapest edge crossing some cut (A, B) then e belongs to the MST
Kruskal's MST algo:
sort edges in order of increasing cost
T = empty
for i = 1 to m:
if T ∪ {edge i} has no cycle:
add edge i to T
Implementing Kruskal's Algorithm via union-find
How to check for cycles: use BFS or DFS in the graph (V, T), which contains at most n - 1 edges
algo --> O(nm)
Union-find data structure: maintain a partition of a set of objects
Find: return name of the group x belongs to Union: fuse two groups into a single one
In Kruskal's: objects = vertices, groups = connected components; when adding a new edge, look up its endpoints' groups and fuse them if they differ
Motivation: O(1) cycle check
Idea:
- Maintain one linked structure per connected component
- each component has an arbitrary leader
Invariant:
- Each vertex points to the leader of its component (the "name" of a component is inherited from its leader)
key point:
- Cycle <=> edge (u,v), u and v have the same leader
Maintaining the invariant: when a new edge (u, v) is added to T, the connected components of u and v merge. When two components merge, have the smaller one inherit the leader of the larger one (maintain a size count for each component)
Running time:
- O(mlogn) for sorting
- O(m) for cycle checks (O(1) per iteration)
- O(n log n) total for leader pointer updates
- --> O(mlogn) total
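A runnable sketch of Kruskal with the "eager" union-find described above: each vertex points straight at its leader, and the smaller component is relabeled on a merge (names mine):

```python
def kruskal(n, edges):
    """edges: list of (cost, u, v) with vertices 0..n-1. Returns the MST edges."""
    leader = list(range(n))
    members = {v: [v] for v in range(n)}         # leader -> component members
    T = []
    for cost, u, v in sorted(edges):             # O(m log n): sorting dominates
        lu, lv = leader[u], leader[v]
        if lu == lv:                             # same leader <=> adding a cycle
            continue
        T.append((u, v, cost))
        if len(members[lu]) < len(members[lv]):
            lu, lv = lv, lu                      # lu is now the larger component
        for w in members[lv]:                    # relabel the smaller component
            leader[w] = lu
        members[lu].extend(members.pop(lv))
    return T

edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
print(kruskal(4, edges))   # [(0, 1, 1), (1, 2, 2), (2, 3, 4)]
```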
Application to Clustering
Goal: given n points, classify them into "coherent groups"
Assumptions:
- As input, given a (dis)similarity measure - a distance d(p, q) between each pair of points
- d is symmetric
- we know k = # of clusters desired
Call points p, q separated if they're assigned to different clusters
assign each point to a separate cluster
repeat until only k clusters:
let p, q be the closest pair of separated points
merge the 2 clusters of p and q into one
--> Called single-link clustering
Huffman codes
Introduction: prefix-free variable encoding
Binary code: map every char of an alphabet (S: say 32 chars) to a binary string
Obvious encoding: 32 5-bit binary strings (eg: ASCII)
Can we do better ? Yes if some chars are more frequent than others, using a variable length code
Suppose { A, B, C, D } = { 00, 01, 10, 11 }. Assume we use { 0, 01, 10, 1 } instead
Then 001 could be AB or AAD: ambiguous!
Prefix-free codes: make sure that for every pair i, j ∈ S, neither of the encodings f(i), f(j) is a prefix of the other.
Example: { 0, 10, 110, 111 } prefix-free
Example: A: 60% -> 0, B: 25% -> 10, C: 10% -> 110, D: 5% -> 111 --> 1.55 bits needed on average per char
Problem definition
Codes as trees:
Goal: find the best prefix-free encoding for a given set of character frequencies
Fact: binary codes <--> binary trees
Left child: 0, right child: 1
Prefix-free <--> labelled nodes are leaves
To decode: repeatedly follow path from root until you hit a leaf
Input: probability pi for each character i € S
Notation: if T = tree with leaves <--> symbols of S, then L(T) (average encoding length) = sum(on S, pi * depth of i in T)
A greedy algorithm
Natural but sub-optimal idea: top-down, divide and conquer
- Partition S into S1 and S2 each with 50% of total frequency
- Recursively compute T1 for S1 and T2 for S2, return
Huffman's (optimal) idea:
- build tree bottom up
HuffmanCodes(S) {
if |S|==2 return A --0-- x --1-- B
Let a, b ∈ S have the smallest frequencies
Let S' = S with a, b replaced with a new symbol ab
define pab = pa + pb
T' = HuffmanCodes(S')
Extend T' (with leaves = S') to a tree T (with leaves = S) by splitting leaf ab into a --0-- x --1-- b
return T
}
Running time: O(n²) where n = |S|
Speed-up: use a heap --> O(n log n)
Even better: sorting + 2 queues
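A sketch of the heap speed-up (names mine). Instead of explicit trees it merges code tables, prefixing '0'/'1' on each merge; a unique counter breaks frequency ties so the heap tuples stay comparable:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """freqs: dict symbol -> frequency. Returns dict symbol -> binary code."""
    tiebreak = count()
    heap = [(p, next(tiebreak), {s: ""}) for s, p in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        pa, _, a = heapq.heappop(heap)           # the two smallest frequencies
        pb, _, b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in a.items()}       # split the leaf "ab"
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (pa + pb, next(tiebreak), merged))
    return heap[0][2]

codes = huffman_codes({"A": 0.60, "B": 0.25, "C": 0.10, "D": 0.05})
# average length = 0.6*1 + 0.25*2 + 0.1*3 + 0.05*3 = 1.55 bits, as above
assert sorted(len(c) for c in codes.values()) == [1, 2, 3, 3]
```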
Introduction to dynamic programming
Weighted Independent Sets in Path Graphs
Input: a path graph G = (V, E) with nonnegative weights on vertices
Output: a subset of non-adjacent vertices (an independent set) of maximum total weight
WIS in Path Graphs: Optimal Substructure
Optimal substructure: critical step: reason about structure of an optimal solution
Upshot: a max-weight IS must be either
- a max-weight IS of G' (= G with the last vertex vn removed), or
- vn + a max-weight IS of G'' (= G with the last two vertices removed)
If we knew whether or not vn is in the max-weight IS, we could recursively compute the max-weight IS of G' or G''.
Let Gi = first i vertices of G
Plan: populate array A left to right with A[i] = Value of max-weight IS of Gi
A[0] = 0
A[1] = w1
For i=2:n {
A[i] = max(A[i - 1], A[i - 2] + wi)
}
WIS in Path Graphs: A Reconstruction Algorithm
Reconstruction algorithm
Let A = the filled-in array
Let S = empty, i = n
While i >= 1 {
If A[i - 1] >= A[i - 2] + wi {
i -= 1
}
else {
add vi to S
i -= 2
}
}
return S
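The fill-in loop and the reconstruction, runnable (names mine; w is 1-indexed to match the notes, with w[0] unused):

```python
def max_weight_is(w):
    """Returns (max IS weight, set of chosen vertex indices) for a path graph."""
    n = len(w) - 1
    A = [0] * (n + 1)
    A[1] = w[1]
    for i in range(2, n + 1):
        A[i] = max(A[i - 1], A[i - 2] + w[i])    # exclude v_i vs. include v_i
    S, i = set(), n
    while i >= 1:
        if A[i - 1] >= (A[i - 2] if i >= 2 else 0) + w[i]:
            i -= 1                               # v_i is not in the optimal IS
        else:
            S.add(i)                             # v_i is; skip its neighbor
            i -= 2
    return A[n], S

assert max_weight_is([0, 1, 4, 5, 4]) == (8, {2, 4})
```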
Principles of Dynamic Programming
- Identify a small number of subproblems
- Can quickly + correctly solve "larger" subproblems given the solutions to the smaller subproblems
- After solving all subproblems, can quickly compute the final solution
The Knapsack problem
Input: n items with
- a value vi (nonnegative)
- a size wi (nonnegative and integer) And a capacity W (nonnegative and integer)
Output: a subset S ⊆ { 1, ..., n } that maximizes sum(i ∈ S, vi) subject to sum(i ∈ S, wi) <= W
A Dynamic Programming Algorithm
Notation: Vix = value of the best solution that:
- Uses only the first i items
- Has total size <= x
Vix = max(V(i-1)x, vi + V(i-1)(x-wi)) <-- our recurrence
Edge case: if wi > x, must have Vix = V(i-1)x
Let A = 2D array
initialize A[0, x] = 0 for x = 0, 1, ..., W
For i = 1:n {
For x = 0:W {
A[i, x] = max(A[i - 1, x], A[i - 1, x - wi] + vi or 0 if wi > x)
}
}
Return A[n, W]
O(nW)
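A direct translation of the recurrence (names mine; items are 0-indexed internally):

```python
def knapsack(values, sizes, W):
    """A[i][x] = best value using the first i items with capacity x; O(nW)."""
    n = len(values)
    A = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        vi, wi = values[i - 1], sizes[i - 1]
        for x in range(W + 1):
            A[i][x] = A[i - 1][x]                        # case 1: item i excluded
            if wi <= x:                                  # case 2 only if item i fits
                A[i][x] = max(A[i][x], A[i - 1][x - wi] + vi)
    return A[n][W]

assert knapsack(values=[3, 2, 4, 4], sizes=[4, 3, 2, 3], W=6) == 8  # items 3 and 4
```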
Sequence alignment
Optimal Substructure
Example: aligning AGGGCT with AGGCA as AGGGCT / AGG-CA. Total penalty: e_gap + e_AT (one gap, one A/T mismatch)
Inputs:
- two strings X = x1,...,xm and Y = y1,...,yn over some alphabet S (like { A, G, C, T })
- penalty e_gap
- mismatch penalty e_ab for aligning a with b (e_ab = 0 if a = b)
Goal: align strings and minimize penalty
A Dynamic Programming Algorithm
A = 2D array
A[i, 0] = A[0, i] = i * e_gap for all i >= 0
For i = 1:m {
For j = 1:n {
A[i, j] = min(
A[i - 1, j - 1] + e_xiyj
A[i - 1, j] + e_gap
A[i, j - 1] + e_gap
)
}
}
Return A[m, n]
O(mn)
Reconstructing:
- Trace back through filled-in table A, starting at A[m, n]
- When you reach subproblem A[i, j]:
- If A[i, j] filled using case 1, match xi and yj, goto A[i - 1, j - 1]
- If A[i, j] filled using case 2, match xi with a gap, goto A[i - 1, j]
- If A[i, j] filled using case 3, match yj with a gap, goto A[i, j - 1]
- If i = 0 or j = 0, match remaining substring with gaps
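A runnable version of the table fill (names mine; the example's gap and mismatch penalties are arbitrary choices that make the penalty above equal 3):

```python
def alignment_penalty(x, y, gap, mismatch):
    """Minimum total alignment penalty between strings x and y (match costs 0)."""
    m, n = len(x), len(y)
    A = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        A[i][0] = i * gap                        # a prefix aligned against all gaps
    for j in range(n + 1):
        A[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            mis = 0 if x[i - 1] == y[j - 1] else mismatch
            A[i][j] = min(A[i - 1][j - 1] + mis,     # case 1: x_i with y_j
                          A[i - 1][j] + gap,         # case 2: x_i with a gap
                          A[i][j - 1] + gap)         # case 3: y_j with a gap
    return A[m][n]

assert alignment_penalty("AGGGCT", "AGGCA", gap=1, mismatch=2) == 3
```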
Optimal binary search trees
Problem definition
What is the "best" search tree for a given set of keys? --> A balanced search tree, like a red-black tree --> worst-case search time: O(height) = O(logn)
Input: frequencies p1, p2, ..., pn for items 1, 2, ..., n (assume items in sorted order ie 1 < ... < n)
Goal: compute a valid search tree that minimises the weighted (average) search time
C(T) = sum(items i, pi * search time for i in T)
If T is a red-black tree, then C(T) = O(logn)
Optimal Substructure
Focus on a top-down approach
Choosing the root can have hard-to-predict repercussions