Algorithms
Notes and homework for Coursera's "Algorithms" online class 📚
Divide and Conquer, Sorting and Searching, and Randomized Algorithms
Introduction
Why study algorithms?
Integer multiplication
input: 2 n-digit numbers x and y
output: the product x.y
Primitive operation (for now): add or multiply 2 single-digit numbers.
An algorithm is "correct" if it eventually terminates and gives the correct answer.
What about the amount of time needed?
upshot: # of operations <= constant * n² (for the grade-school algorithm)
upshot: can we do better?
Karatsuba multiplication
Example: x = 5678 (a = 56, b = 78), y = 1234 (c = 12, d = 34)
- step 1: a.c = 672
- step 2: b.d = 2652
- step 3: (a + b)(c + d) = 6164
- step 4: (3) - (2) - (1) = 2840
- step 5: (1) * 10^4 + (2) + (4) * 10^2 = 7006652, the output
Recursive algorithm: write x = 10^(n/2) * a + b, y = 10^(n/2) * c + d where a, b, c, d are n/2-digit numbers.
x.y = ... = 10^n * ac + 10^(n/2) * (ad + bc) + bd (¤)
idea: recursively compute ac, ad, bc, bd then compute ¤
- step 1: recursively compute a.c
- step 2: recursively compute b.d
- step 3: recursively compute (a + b)(c + d) = ac + ad + bc + bd
- Gauss's trick: (3) - (1) - (2) = ad + bc
upshot: only need 3 recursive multiplications and some additions
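A runnable sketch of the recursion in Python (the digit-splitting helper and names are mine, not the course's):

```python
# Minimal Karatsuba sketch, assuming nonnegative Python ints.
def karatsuba(x: int, y: int) -> int:
    if x < 10 or y < 10:                # base case: a single-digit factor
        return x * y
    half = max(len(str(x)), len(str(y))) // 2
    a, b = divmod(x, 10 ** half)        # x = a * 10^half + b
    c, d = divmod(y, 10 ** half)        # y = c * 10^half + d
    ac = karatsuba(a, c)                # step 1
    bd = karatsuba(b, d)                # step 2
    gauss = karatsuba(a + b, c + d) - ac - bd   # steps 3-4: ad + bc
    return ac * 10 ** (2 * half) + gauss * 10 ** half + bd

assert karatsuba(5678, 1234) == 7006652   # the example above
```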
About the course
spoilers
Merge Sort: Motivation and Example
- Improves over Selection, Insertion, and Bubble sort
- Good example for worst-case behavior and asymptotic analysis
Recursion tree methods --> "Master method"
input: array of n numbers
output: array of the same n numbers, sorted from min to max
Assumption: numbers are distinct (dropping this changes little)
Merge Sort: Pseudocode / Merge Sort: Analysis
ignore base cases
C = output[length = n]
A = first sorted array [n/2]
B = second sorted array [n/2]
i = 1
j = 1
for k = 1 to n
if A(i) < B(j)
C(k) = A(i)
i++
else
C(k) = B(j)
j++
end
Running time of Merge, on an output of n numbers: <= 4n + 2 <= 6n operations (since n >= 1)
Claim: merge sort requires <= 6n * (log2(n) + 1) operations to sort n numbers
At each level j=0,1,...,log(n) there are 2^j subproblems each of size n/(2^j)
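A runnable Python version of the pseudocode (names mine); unlike the sketch above, it handles the base case and the leftover tail once one half is exhausted:

```python
def merge_sort(a):
    if len(a) <= 1:                      # base case ignored in the pseudocode
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])                 # copy whichever half has leftovers
    out.extend(right[j:])
    return out

assert merge_sort([5, 4, 1, 8, 7, 2, 6, 3]) == [1, 2, 3, 4, 5, 6, 7, 8]
```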
Guiding Principles for Analysis of Algorithms
- worst case analysis
- don't pay attention to constant factors and lower order terms
- asymptotic analysis (focus on large input sizes)
Fast algorithm ~= worst-case running time grows slowly with input size
Holy grail: linear running time
Asymptotic analysis
The gist
Importance: vocabulary, "sweet spot", sharp enough to differentiate algo
High-level idea: suppress constant factors and lower-order terms.
- constant factors: too system-dependent
- lower-order terms: irrelevant for large inputs
--> 6n(log(n) + 1) ==> nlog(n)
examples:
- One loop: does an array contain a given integer? --> O(n)
- Two loops: does A or B contain t? --> O(n)
- Two nested loops: do A and B have a number in common --> O(n²)
- Two nested loops (II): does A have duplicate entries --> O(n²)
Big-Oh Notation
T(n) = function on N
T(n) = O(f(n)) <--> there exist constants c, n0 > 0 such that T(n) <= c * f(n) for all n >= n0
Basic Examples
Example 1
Claim: if T(n) = a_k * n^k + ... + a_0 * n^0 then T(n) = O(n^k)
Proof: choose n0 = 1 and c = |a_k|+...+|a_0|
Example 2
Claim: for all k >= 1, n^k != O(n^(k-1))
Proof: by contradiction. Suppose n^k = O(n^(k-1)). Then
there exist c, n0 > 0 such that n^k <= c * n^(k-1) for all n >= n0
i.e. n <= c for all n >= n0, which is false for n > c: contradiction
Big Omega and Theta
Omega notation:
T(n) = Omega(f(n)) <--> there exist c, n0 > 0 such that T(n) >= c * f(n) for all n >= n0
Theta notation: O(f(n)) AND Omega(f(n))
T(n) = Theta(f(n)) <--> there exist c1, c2, n0 > 0 such that c1 * f(n) <= T(n) <= c2 * f(n) for all n >= n0
Little-o notation:
T(n) = o(f(n)) <--> for all c > 0, there exists n0 such that T(n) <= c * f(n) for all n >= n0
Divide and conquer algorithms
O(n log n) Algorithm for Counting Inversions I
The divide and conquer paradigm:
- Divide the problem into smaller subproblems
- Conquer the subproblems via recursive calls
- Combine solutions of subproblems into the original solution
Input: array A containing the numbers 1, 2, 3, ... in some arbitrary order
Output: number of inversions = number of pairs (i, j) of indices with i < j and A[i] > A[j]
Motivation: numerical similarity measure between two ranked lists
Largest possible number of inversions: (n C 2) = n(n - 1) / 2
Note: (n choose k) = n! / (k!(n - k)!)
Brute force algorithm: O(n²) time
3 types of inversions (i, j): left (both i, j <= n/2), right (both i, j > n/2), and split (i <= n/2 < j)
High-level algorithm:
CountInversions(array A of length n)
if (n == 1) return 0
else
x = CountInversions(first half of A)
y = CountInversions(second half of A)
z = CountSplitInversions(A, n) // TODO
return x + y + z
O(n log n) Algorithm for Counting Inversions II
Idea: piggyback on merge sort
SortAndCountInversions(array A of length n)
if (n == 1) return (A, 0)
else
(B, x) = SortAndCountInversions(first half of A)
(C, y) = SortAndCountInversions(second half of A)
(D, z) = MergeAndCountSplitInversions(B, C, n) // TODO
return (D, x + y + z)
MergeAndCountSplitInversions: when an element of the 2nd array C gets copied into the output D, increment the total by the number of elements remaining in the 1st array B
Run time: O(nlog(n))
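A runnable Python sketch of SortAndCountInversions (names mine); the only change from merge sort is the `z += len(b) - i` line implementing the key idea above:

```python
def sort_and_count(a):
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    b, x = sort_and_count(a[:mid])       # 1st half, its inversions
    c, y = sort_and_count(a[mid:])       # 2nd half, its inversions
    d, z, i, j = [], 0, 0, 0
    while i < len(b) and j < len(c):
        if b[i] <= c[j]:
            d.append(b[i]); i += 1
        else:
            d.append(c[j]); j += 1
            z += len(b) - i              # one split inversion per element left in b
    d.extend(b[i:]); d.extend(c[j:])
    return d, x + y + z

assert sort_and_count([1, 3, 5, 2, 4, 6])[1] == 3   # (3,2), (5,2), (5,4)
```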
Strassen's Subcubic Matrix Multiplication Algorithm
n by n matrices
Classic multiplication: O(n^3)
Recursive algorithm #1:
Idea: write X = (A, B; C, D) and Y = (E, F; G, H) where A...H are n/2 by n/2 matrices
Then: XY = (AE + BG, AF + BH; CE + DG, CF + DH)
--> O(n^3)
Strassen's algorithm (1969)
- Step 1: recursively compute only 7 (cleverly chosen) products
- Step 2: do the necessary (clever) additions and subtractions (still O(n²) time)
- Fact: better than cubic time! (see next lecture)
7 products:
- P1 = A(F - H)
- P2 = (A + B)H
- P3 = (C + D)E
- P4 = D(G - E)
- P5 = (A + D)(E + H)
- P6 = (B - D)(G + H)
- P7 = (A - C)(E + F)
Claim: XY = (P5 + P4 - P2 + P6, P1 + P2; P3 + P4, P1 + P5 - P3 - P7)
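A sketch of the recursion using NumPy for the block arithmetic, assuming n is a power of 2 (the general case needs padding; names mine):

```python
import numpy as np

def strassen(x, y):
    n = x.shape[0]
    if n <= 2:                            # small base case: multiply directly
        return x @ y
    h = n // 2
    a, b, c, d = x[:h, :h], x[:h, h:], x[h:, :h], x[h:, h:]
    e, f, g, hh = y[:h, :h], y[:h, h:], y[h:, :h], y[h:, h:]
    p1 = strassen(a, f - hh)              # the 7 clever products
    p2 = strassen(a + b, hh)
    p3 = strassen(c + d, e)
    p4 = strassen(d, g - e)
    p5 = strassen(a + d, e + hh)
    p6 = strassen(b - d, g + hh)
    p7 = strassen(a - c, e + f)
    return np.vstack([np.hstack([p5 + p4 - p2 + p6, p1 + p2]),
                      np.hstack([p3 + p4, p1 + p5 - p3 - p7])])

x, y = np.arange(16).reshape(4, 4), np.arange(16, 32).reshape(4, 4)
assert np.array_equal(strassen(x, y), x @ y)
```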
O(n log n) Algorithm for Closest Pair
input: n points in the plane (∈ R²)
output: the closest pair of points (Euclidean distance)
assumption: all points have distinct x- and y-coordinates
Brute force: O(n²)
1-D version of closest pair:
- sort points O(nlogn)
- return closest pair of adjacent points O(n)
Goal: O(nlogn) for 2-D version
High level approach:
- make copies of points sorted by x-coordinate (Px) and y-coordinates (Py) O(nlogn)
- use divide-conquer
ClosestPair(Px, Py)
# base case omitted
let Q = left half of P
R = right half of P
form Qx, Qy, Rx, Ry (takes O(n) time)
(p1, q1) = ClosestPair(Qx, Qy)
(p2, q2) = ClosestPair(Rx, Ry)
Either the closest pair is entirely in Q, or entirely in R...
unlucky case: the pair is split between Q and R
(p3, q3) = ClosestSplitPair(Px, Py)
return best of 3 pairs
If ClosestSplitPair is O(n) then ClosestPair is O(nlogn)
ClosestSplitPair: key idea: it only needs to be correct in the unlucky case, i.e. when the closest pair really is split between Q and R
ClosestPair(Px, Py)
let Q = left half of P
R = right half of P
form Qx, Qy, Rx, Ry (takes O(n) time)
(p1, q1) = ClosestPair(Qx, Qy)
(p2, q2) = ClosestPair(Rx, Ry)
let D = min(d(p1, q1), d(p2, q2))
(p3, q3) = ClosestSplitPair(Px, Py, D)
return best of 3 pairs
ClosestSplitPair(Px, Py, D)
let x = biggest x-coordinate in left of P (O(n))
let Sy = points of P with x-coordinates in [x - D, x + D], sorted by y-coordinates (O(n))
let best = D
let best_pair = null
for i = 1 to len(Sy) - 1
for j = 1 to min(7, len(Sy) - i)
let p, q = i-th, (i + j)-th points of Sy
if d(p, q) < best
best = d(p, q)
best_pair = (p, q)
# inner loop body: O(1) time; whole double loop: O(n) time
return best_pair
Satisfies the correctness requirements.
Claim: let p ∈ Q, q ∈ R be a split pair with d(p, q) < D. Then:
- (A) p and q are members of Sy
- (B) p and q are at most 7 positions apart in Sy
The master method (aka master theorem)
Motivation
Re: Karatsuba
T(n) = maximum number of operations this algorithm needs to multiply two n-digit numbers
Recurrence: express T(n) in term of running time of recursive calls
- Base case: T(1) <= a constant
- for all n > 1: T(n) <= 3T(n / 2) + O(n)
Formal Statement
assumption: all subproblems have equal size (previously: n / 2)
- for all n > the base case: T(n) <= a * T(n / b) + O(n^d)
- a: number of recursive calls (>= 1)
- b: input size shrinkage factor (> 1)
- d: exponent in running time of the "combine step" (>= 0)
- (a, b, d) independent of n
if T(n) <= a * T(n / b) + O(n^d)
then T(n) = {
if a = b^d: O(n^d * log(n))
if a < b^d: O(n^d)
if a > b^d: O(n^log_b(a))
Only gives an upper bound; with exact recurrences (= instead of <=) the same statement holds with Theta
log with no base? Inside the O(.), the base does not matter (changing base is a constant factor change), UNLESS the log sits in an exponent: there the base matters, a lot.
Examples
Merge sort: a = 2 b = 2 d = 1 --> a = b^d --> O(nlogn)
Binary search: a = 1 b = 2 d = 0 --> a = b^d --> O(logn)
Karatsuba without Gauss's trick: a = 4 b = 2 d = 1 --> a > b^d --> O(n^(log2(4))) = O(n²)
Karatsuba: a = 3 b = 2 d = 1 --> a > b^d --> O(n^(log2(3))) = O(n^1.59)
Strassen's matrix multiplication: a = 7 b = 2 d = 2 --> a > b^d --> O(n^(log2(7))) = O(n^2.81)
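The three cases are mechanical enough to compute; a tiny helper (mine, not the course's) reproduces the examples above:

```python
import math

def master_bound(a, b, d):
    """Asymptotic bound given by the master method for T(n) <= a*T(n/b) + O(n^d)."""
    if a == b ** d:
        return f"O(n^{d} * log n)"
    if a < b ** d:
        return f"O(n^{d})"
    return f"O(n^{math.log(a, b):.2f})"   # a > b^d: leaves dominate

print(master_bound(2, 2, 1))   # merge sort: O(n^1 * log n)
print(master_bound(1, 2, 0))   # binary search: O(n^0 * log n) = O(log n)
print(master_bound(3, 2, 1))   # Karatsuba: O(n^1.58)
print(master_bound(7, 2, 2))   # Strassen: O(n^2.81)
```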
Proof
Assume: the recurrence is (i) T(1) <= c, (ii) T(n) <= a * T(n / b) + c * n^d, and that n is a power of b
Idea: generalize merge sort analysis ie use a recursion tree
at each level j=0,1,2,...,log_b(n) there are a^j subproblems each of size n / b^j
The recursion tree
Level 0: n; level 1: a subproblems of size n / b; ...; level log_b(n): base cases
Work at a single level (ignoring work in recursive calls): <= a^j * c * (n / b^j)^d where:
- a^j is the # of level-j subproblems
- n / b^j is the size of each level-j subproblem
- c * (n / b^j)^d is the work per level-j subproblem
Total work: summing over all levels j: total work <= c * n^d * sum(j=0...log_b(n), (a / b^d)^j) (¤)
How to think about (¤)
- a = rate of subproblem proliferation (RSP)
- b^d = rate of work shrinkage (RWS) (per subproblem)
- If RSP < RWS, the amount of work is decreasing with the recursion level j: the root dominates (expect O(n^d))
- If RSP > RWS, the amount of work is increasing with the recursion level j: the leaves dominate (expect O(# leaves) = O(n^(log_b(a))))
- If RSP = RWS, the amount of work is the same at every recursion level j (expect O(n^d * log(n)))
If a = b^d then (¤) = c * n^d * (log_b(n) + 1) = O(n^d * log(n)) (the base of the log is a suppressible constant factor)
For all r != 1: 1 + r + r² + ... + r^k = (r^(k + 1) - 1) / (r - 1). Upshot:
- If r < 1, the sum is <= 1 / (1 - r), a constant (independent of k)
- If r > 1, the sum is <= r^k * (1 + 1 / (r - 1)), i.e. dominated by the last term
Quicksort algorithm
Quicksort: Overview
- Greatest hit algorithm
- Prevalent in practice
- O(n log n) on average, works in place
input: array of n unsorted numbers
output: array of the same n numbers, sorted
assume: all array entries distinct
key idea: partition the array around a pivot element
- pick an element of the array
- rearrange the array so that everything left of the pivot is less than it, everything right is greater
Note: this puts the pivot in its "rightful position"
Partition:
- linear O(n) time, no extra memory
- reduces problem size
High-level description:
QuickSort(A of length n)
if n == 1 return
p = ChoosePivot(A)
partition A around p
recursively sort first and second part (not including p)
Partitioning Around a Pivot
The easy way: using O(n) extra memory (boo!)
In-place implementation:
Assume: the pivot is the first element of the array
High-level idea:
- single scan through array
- invariant: everything looked at so far is partitioned
Partition(A, l, r) // input = A[l...r]
p = A[l]
i = l + 1
for j = l + 1 to r
if A[j] < p // if not, do nothing
swap A[j] and A[i]
i++
swap A[l] and A[i - 1]
Running time: O(n) where n = r - l + 1 (O(1) work per array entry)
Correctness: the for loop maintains the invariants (A[l + 1]...A[i - 1] < p and A[i]...A[j - 1] > p)
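A runnable translation of Partition plus the high-level QuickSort, already using the random pivots discussed below (names mine):

```python
import random

def partition(a, l, r):
    """In-place partition of a[l..r] around the pivot a[l]; returns its final index."""
    p = a[l]
    i = l + 1
    for j in range(l + 1, r + 1):
        if a[j] < p:                     # if not, do nothing
            a[i], a[j] = a[j], a[i]
            i += 1
    a[l], a[i - 1] = a[i - 1], a[l]      # put the pivot in its rightful position
    return i - 1

def quicksort(a, l=0, r=None):
    if r is None:
        r = len(a) - 1
    if l >= r:
        return
    k = random.randrange(l, r + 1)       # ChoosePivot: uniformly at random
    a[l], a[k] = a[k], a[l]              # move the pivot to the front, as assumed
    m = partition(a, l, r)
    quicksort(a, l, m - 1)               # recurse, not including the pivot
    quicksort(a, m + 1, r)

xs = [3, 8, 2, 5, 1, 4, 7, 6]
quicksort(xs)
assert xs == [1, 2, 3, 4, 5, 6, 7, 8]
```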
Correctness of Quicksort
By induction. Think about the partitioned array: the two "parts" have size smaller than n, so the inductive hypothesis applies to the recursive calls.
Choosing a Good Pivot
The running time of QuickSort depends on the quality of the pivot
If the pivot is always the first element, then on an already-sorted array QuickSort performs Theta(n²) operations, because the partition calls do n + (n - 1) + ... + 1 = Theta(n²) work
If the pivot is always the median element of the array, QuickSort performs Theta(n log n) operations on any array
How to choose pivots? Key idea: random pivots!
QuickSort theorem: for every input array of length n, the average running time of QuickSort with random pivots is O(nlogn)
QuickSort analysis
Analysis: A Decomposition Principle
Fixed input array A of length n
sample space Omega = all possible outcomes of random choices in QuickSort (i.e. pivot sequences)
random variable: for s ∈ Omega, C(s) = # of comparisons between two input elements
lemma: the running time of QuickSort is dominated by comparisons
notation: z_i = i-th smallest item in A
X_ij(s) = # of times z_i and z_j get compared in pivot sequence s (an indicator random variable: can only take the values 0 or 1)
for all s: C(s) = sum(i=1...n-1, sum(j=i+1...n, X_ij(s)))
By linearity of expectation: E[C] = sum(i, sum(j, E[X_ij])) = sum(i, sum(j, P(z_i, z_j get compared)))
A general decomposition principle:
- Identify random variable Y that you really care about
- Express Y as a sum of indicator random variable: Y = sum(X)
- Apply linearity of expectation: E[Y] = sum(E[x]) = sum(P(x = 1))
Linear-time Selection
Randomized Selection - Algorithm
input: array A of n distinct numbers and a number i
output: the i-th order statistic, i.e. the i-th smallest element of A
example: the median
The 1st order statistic is the min --> trivial with a linear scan
The n-th order statistic is the max --> same
Easy to do with sorting, but that costs O(n log n)
RandomizedSelection(A of length n, order statistic i)
if n == 1 return A[0]
choose pivot p uniformly at random
partition A around p
let j = new index of p
if i == j return p // lucky
if j > i return RandomizedSelection(first part of A of length j - 1, i)
if j < i return RandomizedSelection(second part of A of length n - j, i - j)
Claim: RandomizedSelection is correct (proof: by induction)
Running time? Depends on the quality of the pivot
- worst pivot: Theta(n²)
- key: find a pivot giving a balanced split
- best pivot: the median (circular, since the median is what we are computing!) --> would give the recurrence T(n) <= T(n / 2) + O(n) --master method--> T(n) = O(n)
Hope: a random pivot is "pretty good" "often enough"
RandomizedSelection theorem: for every input array of length n, the average running time is O(n)
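A sketch in Python; for clarity it partitions into new lists rather than in place, which costs extra memory compared to the in-place version (names mine):

```python
import random

def rselect(a, i):
    """i-th smallest element of a (1-indexed); average running time O(n)."""
    if len(a) == 1:
        return a[0]
    p = random.choice(a)                       # pivot chosen uniformly at random
    less = [x for x in a if x < p]
    greater = [x for x in a if x > p]
    j = len(less) + 1                          # p's order statistic
    if i == j:
        return p                               # lucky
    if i < j:
        return rselect(less, i)
    return rselect(greater, i - j)

assert rselect([10, 8, 2, 4], 3) == 8
```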
Randomized Selection - Analysis
Notation: RSelect is in phase j if the current array size is between (3/4)^(j+1) * n and (3/4)^j * n
X_j = number of recursive calls during phase j
...
coin flip analogy: E[# of flips until heads] = 1 + 0.5 * E[# of flips until heads], so E = 2
- the 1 is the current flip
- 0.5 is the probability of tails, in which case we try again
Deterministic Selection - Algorithm
key idea: use the "median of medians" as the pivot
ChoosePivot(A of length n)
break A into n / 5 groups of size 5 each
sort each group
copy the n / 5 medians (ie middle elements of each sorted group) into new array C
recursively compute median of C
return it as the pivot
partition around it
i.e.:
DSelect(A of length n, order statistic i)
Break A into n / 5 groups of size 5, sort the groups
C = the n / 5 "middle elements"
p = DSelect(C of length n / 5, n / 10) // the median of the medians
Partition A around p; let j = p's new index
if j == i return p
if j > i return DSelect(first part of A of length j - 1, i)
if j < i return DSelect(second part of A of length n - j, i - j)
Not as good as RSelect in practice
- worse constants
- not in-place
Deterministic Selection - Analysis
Omega(n log n) Lower Bound for Comparison-Based Sorting
Graphs and the contraction algorithm
Graphs and Minimum Cuts
"Only" 20 yo algorithm
Cuts of a graph (V, E) a partition of V into two non-empty sets A and B
A graph with n vertices has 2^n possible cuts
Input: an undirected graph G = (V, E) Goal: compute a cut with fewest number of crossing edges (i.e. a min-cut)
Graph Representations
Consider a connected undirected graph with n vertices: min # of edges = n - 1, max # of edges = n(n - 1) / 2
n = # of vertices, m = # of edges. In most applications, m is Omega(n) and O(n²)
- Sparse graph: m is O(n) or close to it
- Dense graph: m is closer to O(n²)
Adjacency matrix: represent G using a n by n matrix
A_ij = 1 <=> G has an i-j edge (requires n² space)
Adjacency list:
- array or list of vertices (theta(n))
- array or list of edges (theta(m))
- each edge points to its endpoints (theta(m))
- each vertex points to edges incident to it (theta(m)) (same pointers as the 3rd item) --> requires theta(n + m) space (or theta(max(n, m)))
Which is better: it depends on graph density and operations needed.
Random Contraction Algorithm
while there are more than 2 vertices left:
- pick a remaining edge (u, v) at random
- merge (or "contract") u and v into a single vertex
- remove self loops
return cut represented by final 2 vertices
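A sketch on an edge-list multigraph (that representation and the trial count are my choices; the course leaves them open):

```python
import random

def karger_min_cut(edges, n_trials=100):
    """Runs the contraction algorithm n_trials times; returns the fewest
    crossing edges found. edges: list of (u, v) pairs of a connected graph."""
    best = float("inf")
    for _ in range(n_trials):
        e = list(edges)                              # remaining edges (multiset)
        vertices = {u for edge in e for u in edge}
        while len(vertices) > 2:
            u, v = random.choice(e)                  # pick a remaining edge at random
            vertices.discard(v)                      # contract v into u
            e = [(u if a == v else a, u if b == v else b) for a, b in e]
            e = [(a, b) for a, b in e if a != b]     # remove self-loops
        best = min(best, len(e))                     # edges crossing the final cut
    return best

# square with one diagonal: the min cut isolates a degree-2 corner (2 edges)
g = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(karger_min_cut(g))   # usually prints 2
```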
Analysis of Contraction Algorithm
What is the probability of success?
Fix G = (V, E) with n vertices and m edges; fix a minimum cut (A, B)
Let k = # of edges crossing (A, B); these edges form the set F
If an edge of F is contracted, the algorithm fails; if only non-F edges are contracted, it succeeds
First iteration, probability to fail: k/m
Key observation: The degree of each vertex is at least k
We have sum(v, degree(v)) = 2m and sum(v, degree(v)) >= kn --> m >= kn / 2
P(fail1) = k/m <= 2 / n
Second iteration: P(success1 and success2) = P(success2 | success1) * P(success1)
P(success1) >= 1 - 2/n; P(success2 | success1) = 1 - k / (# of remaining edges)
# of remaining edges: the contracted graph has n - 1 vertices, each of degree >= k, so # of remaining edges >= k(n - 1) / 2
Total success probability: >= (1 - 2/n)(1 - 2/(n - 1))(1 - 2/(n - 2)) ... (1 - 2/3), which telescopes to 2 / (n(n - 1)) >= 2 / n²
Problem: low success probability... but non-trivial! (the # of cuts is exponential in n, while 2/n² is only polynomially small)
Solution: run the algo a large number N of times and remember the smallest cut found
How many trials needed?
P(all trials fail) <= (1 - 1 / n²)^N
We have 1 + x <= e^x
If we take N = n², P(all trials fail) <= (e^(-1/n²))^n² = 1 / e
If we take N = n²ln(n), P(all trials fail) <= 1 / n
Running time? Polynomial in n and m, but slow: Omega(n²m) over the n² trials
But: better algo exist
Counting Minimum Cuts
A graph may have more than one min cut
E.g.: a tree has n - 1 min cuts
Question: what is the largest number of min cut a graph can have? Answer: n choose 2 = n(n - 1) / 2
Graph Search, Shortest Paths, and Data Structures
Graph search and connectivity
Graph Search - Overview
Generic graph search Goals:
- Find everything findable from a given vertex
- Don't explore anything twice
- in O(n + m) time
Generic algorithm: given a graph G and a vertex s
- initially s explored, all other vertices unexplored
- while possible:
- choose an edge (u, v) with u explored and v unexplored
- mark v explored
Claim: at the end of the algo, v explored <=> G has a path from s to v
Breath-First Search (BFS)
- explore nodes in layers
- can compute shortest paths
- can compute connected components of an undirected graph --> O(n + m) time using a queue (FIFO)
Depth-First Search (DFS)
- explore aggressively, like navigating a maze; backtrack only when necessary
- compute topological ordering of a DAG
- compute strongly connected components of directed graphs
Breadth-First Search (BFS): The Basics
BFS(graph G, start vertex s)
mark s as explored
let Q = queue FIFO initialized with s
while Q is not empty:
v = Q.shift() // The first
for each edge (v, w):
if w unexplored:
mark w as explored
Q.append(w) // At the end
Running time: O(ns + ms), where ns (resp. ms) is the # of nodes (resp. edges) reachable from s
BFS and Shortest Paths
Goal: compute dist(v), the fewest # of edges on a path from s to v
extra code:
- initialize dist(v) = { 0 if v == s, +infinity otherwise }
- when considering edge (v, w):
- if w unexplored: dist(w) = dist(v) + 1
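A runnable BFS with the dist bookkeeping, in Python (adjacency-list dict representation assumed; names mine). Note the deque: popping from the front of a plain list would be O(n):

```python
from collections import deque

def bfs_dist(graph, s):
    """graph: dict vertex -> list of neighbors. Returns dist for reachable vertices."""
    dist = {s: 0}                   # unseen vertices are implicitly +infinity
    q = deque([s])
    while q:
        v = q.popleft()             # the first element of the queue
        for w in graph[v]:
            if w not in dist:       # w unexplored
                dist[w] = dist[v] + 1
                q.append(w)         # at the end
    return dist

g = {"s": ["a", "b"], "a": ["s", "c"], "b": ["s", "c"], "c": ["a", "b"]}
assert bfs_dist(g, "s") == {"s": 0, "a": 1, "b": 1, "c": 2}
```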
BFS and Undirected Connectivity
Connected components: "the pieces" of G
Formal definition: equivalence classes of the relation u~v <=> there exists a u-v path in G
Goal: compute all connected components Why: check if network is disconnected
ComputeAllComponents(G)
mark all nodes as unexplored
for i=1 to n
if i not explored
BFS(G, i)
# note: maybe some code to describe the found components
Running time: O(n + m)
Depth-First Search (DFS): The Basics
DFS(G, starting vertex s)
mark s as explored
for every edge (s, v):
if v is unexplored:
DFS(G, v)
Could also mimic BFS with a stack instead of a queue.
Running time: O(ns + ms)
Topological Sort
Definition: a topological ordering of a directed graph G is a labelling f of G's nodes such that:
- the f(v)'s form the set {1, 2, ..., n}
- (u, v) ∈ G => f(u) < f(v)
Motivation: Sequence tasks while respecting precedence constraints
Note: cycle => no topo ordering
Note: every DAG has a sink vertex.
Straightforward solution:
- let v be a sink vertex of G
- set f(v) = n
- recurse on G - {v}
Topological sort via DFS (Slick)
TopologicalSort(G)
mark all vertices unexplored
current_label = n
for each vertex v:
if v not yet explored:
DFSForTopologicalSort(G, v)
DFSForTopologicalSort(G, s)
mark s as explored
for every edge (s, v):
if v not yet explored:
DFSForTopologicalSort(G, v)
set f(s) = current_label
current_label--
Running time: O(m + n); reason: O(1) time per node, O(1) time per edge
Correctness: need to show that if (u, v) is an edge, then f(u) < f(v)
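A runnable version of the DFS-based topological sort (names mine; `nonlocal` stands in for the pseudocode's shared current_label):

```python
def topological_sort(graph):
    """graph: dict vertex -> list of successors (assumed to be a DAG)."""
    explored, f = set(), {}
    label = len(graph)                  # current_label = n

    def dfs(s):
        nonlocal label
        explored.add(s)
        for v in graph[s]:
            if v not in explored:
                dfs(v)
        f[s] = label                    # set f(s) on the way back up
        label -= 1

    for v in graph:
        if v not in explored:
            dfs(v)
    return f

dag = {"s": ["v", "w"], "v": ["t"], "w": ["t"], "t": []}
f = topological_sort(dag)
assert f["s"] < f["v"] < f["t"] and f["s"] < f["w"] < f["t"]
```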
Computing Strong Components: The Algorithm
Definition: the Strongly Connected Components (SCCs) of a directed graph are the equivalence classes of the relation u~v <=> there exist a path u->v and a path v->u in G
Kosaraju's two-pass algorithm O(m+n)
KosarajuSCC(G)
let Grev = G with all arcs reversed (could also run DFS going backward)
run DFS-loop on Grev (goal: compute "magical ordering" of nodes)
let f(v) = "finising time" of each vertex v
run DFS-loop on G (goal: discover the SCC one by one)
processing nodes in decreasing order of finising time
save the leaders
global variable t = 0 (# of nodes processed so far)
global variable s = Null (current source vertex)
DFS-loop(G)
Assume nodes labeled 1 to n
for i=n to 1:
if i not yet explored:
s = i
DFS(G, i)
DFS(G, i)
mark i as explored (in a given DFS-loop)
set leader(i) = node s
for each edge (i,j):
if j not yet explored:
DFS(G, j)
t++
set f(i) = t
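A compact Python adaptation (names mine). It follows the two passes above but uses an iterative DFS, since the recursive version hits Python's recursion limit on big graphs:

```python
def kosaraju_scc(graph):
    """graph: dict vertex -> list of successors. Returns leader(v) for each v."""
    rev = {v: [] for v in graph}
    for u in graph:
        for v in graph[u]:
            rev[v].append(u)                         # Grev: all arcs reversed

    def dfs_loop(g, order):
        explored, finish, leaders = set(), [], {}
        for s in order:
            if s in explored:
                continue
            explored.add(s)
            stack = [s]
            while stack:
                v = stack[-1]
                leaders.setdefault(v, s)             # leader = current source s
                nxt = [w for w in g[v] if w not in explored]
                if nxt:
                    explored.add(nxt[0])
                    stack.append(nxt[0])
                else:
                    finish.append(stack.pop())       # record v's finishing time
        return finish, leaders

    order1, _ = dfs_loop(rev, list(graph))           # 1st pass: finishing times on Grev
    _, leaders = dfs_loop(graph, reversed(order1))   # 2nd pass: decreasing finishing time
    return leaders

g = {1: [2], 2: [3], 3: [1, 4], 4: []}               # SCCs: {1, 2, 3} and {4}
l = kosaraju_scc(g)
assert l[1] == l[2] == l[3] != l[4]
```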
Computing Strong Components: The Analysis
The 2nd pass of DFS-loop begins somewhere in a sink SCC C*; its first DFS call discovers C* and nothing else
Dijkstra's shortest-path algorithm
Dijkstra's Shortest-Path Algorithm
Input: a directed graph G = (V, E) (n vertices, m edges), a nonnegative length l_e for each edge e, and a source vertex s
Output: for each v ∈ V, compute L(v) = length of a shortest s-v path in G
Assumption:
- for convenience: for all v ∈ V, there exists an s->v path
- important: no negative edge lengths: l_e >= 0 for all e ∈ E
BFS computes shortest paths iff l_e = 1 for all e ∈ E
DijkstraShortestPath(G, s)
X = {s} # Vertices processed so far
A[s] = 0 # Computed shortest path distances
B[s] = empty path # Computed shortest paths (do not include in actual implementation)
while X != V:
among all edges (v, w) ∈ E with v ∈ X, w ∉ X, pick the one that minimizes A[v] + l_vw (Dijkstra's greedy criterion); call it (v*, w*)
add w* to X
set A[w*] = A[v*] + l_v*w*
set B[w*] = B[v*] + (v*, w*)
Dijkstra's Algorithm: Implementation and Running Time
As such, running time is theta(nm). But we can do better.
Heap: perform insert, extract min in O(logn) time
- perfectly balanced binary tree (height: log2(n))
- extract-min by swapping up the last leaf, then bubbling down
- insert via bubbling up
Also: will need the ability to delete from the middle of the heap (bubble up or down, as needed)
Two invariants:
- elements in heap = vertices of V - X
- for v ∈ V - X, key[v] = smallest Dijkstra greedy score of an edge (u, v) ∈ E with u ∈ X, or +infinity if no such edge (the heap key plays the role of the greedy score)
With those invariants, extract-min yields correct vertex w* to add to X next
Maintaining the 2nd invariant: When w is extracted from heap, (ie added to X)
for each edge (w, v) ∈ E:
if v ∈ V - X (i.e. in the heap):
remove v from heap
recompute key[v] = min(key[v], A[w] + l_wv)
reinsert v into heap
Running time analysis:
- dominated by heap operations, O(log n) each
- (n - 1) extract-mins
- each edge (v, w) triggers at most one delete/insert combo (if v is added to X first)
--> # of heap operations is O(n + m) = O(m) since the graph is weakly connected (m >= n - 1)
--> running time: O(m log n)
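A runnable sketch with Python's heapq. heapq has no delete-from-the-middle, so instead of recomputing keys this uses the standard "lazy deletion" variant: push duplicate entries and skip vertices already in X when popped (same O(m log n) bound, different bookkeeping than the course's):

```python
import heapq

def dijkstra(graph, s):
    """graph: dict vertex -> list of (neighbor, edge length); lengths >= 0."""
    A = {}                                  # computed shortest-path distances (X = keys)
    heap = [(0, s)]                         # (Dijkstra greedy score, vertex)
    while heap:
        score, w = heapq.heappop(heap)
        if w in A:
            continue                        # stale entry: w already in X
        A[w] = score                        # add w* to X with A[w*] = A[v*] + l
        for v, length in graph[w]:
            if v not in A:
                heapq.heappush(heap, (score + length, v))
    return A

g = {"s": [("v", 1), ("w", 4)], "v": [("w", 2)], "w": []}
assert dijkstra(g, "s") == {"s": 0, "v": 1, "w": 3}
```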
Heaps
Data Structures: Overview
Point: organize data so that it can be accessed quickly and usefully
Ex: lists, stacks, queues, heaps, search trees, hashtables, bloom filters, union-find, etc...
Heaps: Operations and Applications
- insert O(logn)
- extract min O(logn)
- heapify O(n)
- delete O(logn)
Application: sorting: HeapSort
Application: Median maintenance
Input: a sequence of n numbers, given one at a time
Output: at each step i, the median of the numbers seen so far
Constraint: O(log i) time at step i
Solution: one extract-min heap (Hhigh, holding the larger half) and one extract-max heap (Hlow, holding the smaller half)
Maintain invariant: the ~i/2 smallest (largest) elements are in Hlow (Hhigh); the median is then at the top of one of the heaps
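A runnable sketch (names mine). Python's heapq is a min-heap only, so the Hlow side stores negated values to act as a max-heap:

```python
import heapq

def running_medians(stream):
    h_low, h_high = [], []                   # max-heap (negated) / min-heap
    medians = []
    for x in stream:
        if h_low and x > -h_low[0]:
            heapq.heappush(h_high, x)
        else:
            heapq.heappush(h_low, -x)
        if len(h_low) > len(h_high) + 1:     # rebalance: sizes differ by <= 1
            heapq.heappush(h_high, -heapq.heappop(h_low))
        elif len(h_high) > len(h_low) + 1:
            heapq.heappush(h_low, -heapq.heappop(h_high))
        # median = top of the bigger heap (max of the smaller half on ties)
        medians.append(-h_low[0] if len(h_low) >= len(h_high) else h_high[0])
    return medians

assert running_medians([2, 1, 5, 3, 4]) == [2, 1, 2, 2, 3]
```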
Balanced binary search trees
Balanced Search Trees: Operations and Applications
Things to do with a sorted array:
- Search (binary search) O(logn)
- Select (a given order statistic, ex: min/max) O(1)
- Predecessor / successor O(1)
- Rank (know # of keys less than a given value) O(logn)
- Output in sorted order O(n)
--> Apply on a static dataset
BST: like sorted array + fast log insertion and deletion
- Search O(logn)
- Select O(logn)
- Min/Max O(logn)
- Predecessor / successor O(logn)
- Rank O(logn)
- Output O(n)
- Insert / Delete O(logn)
Binary Search Tree Basics
Exactly one node per key; in the most basic version, each node has a left-child, right-child, and parent pointer
Search tree property: left children < current key, right children > current key
Height of a BST: # of "levels" (arcs). Note: there are many possible search trees for a given set of keys --> height (aka depth) ranges from ~log2(n) (balanced) to ~n (a chain)
Search:
- Start at the root
- Traverse left/right child pointers as needed
Insert:
- Search for k (unsuccessfully)
- Rewire final Null pointer to point to new node with key k
--> theta(height)
Min/Max:
- Start at the root
- Follow left/right child pointers, return last key found
Predecessor of k:
- Easy case: if k's left subtree nonempty, return max key in left
- Otherwise: follow parent pointers until you get to a key less than k
In-order traversal
- Let r = root of search tree, with subtrees Tl and Tr
- Recurse on Tl
- print out r's key
- Recurse on Tr
Deletion (of a key k)
- Search for key
- easy case (k has no children): just delete k
- medium case (k has only one child): just "splice out" k, replace with child
- difficult case (k has 2 children):
- compute k's predecessor l (the max of k's left subtree)
- swap k and l (note: in its new position, k has no right child, so the easy/medium case applies)
--> theta(height)
Select and rank: idea: store a little extra info at each tree node, about the tree itself
Example augmentation: size(x) = # of tree nodes in the subtree rooted at x = size(y) + size(z) + 1 (children's sizes + 1)
How to select ith order statistic (with such a tree)
- start at root x, with children y and z
- let a = size(y) (0 if no left child)
- if a = i - 1 return x's key
- if a >= i recursively compute ith order statistic of search tree rooted at y
- if a < i - 1 recursively compute the (i - a - 1)th order statistic of the tree rooted at z
--> theta(height)
Red-Black Trees
Balanced: height O(log n) guaranteed
Idea: ensure that the height stays logarithmic --> ops run in O(log n)
Example: red-black trees (see also AVL trees, splay trees, B-trees, B+ trees)
Red-black invariants:
- Same as BST, plus more
- Each node red or black
- Root is black
- No 2 reds in a row (red node ==> children must be black)
- Every root-Null path (as in an unsuccessful search) has the same number of black nodes
Example: a chain of length 3 cannot be a red-black tree. (4th invariant violated)
Example: perfectly balanced search tree
Height guarantee: every red-black tree with n nodes has height O(log n) [h <= 2 * log2(n + 1)]
Rotations
Left rotations: locally rebalance subtrees at a node in O(1) time
Of a parent x and right child y (x < y).
Before: x is P's child, with left subtree A and right child y; y has left subtree B and right subtree C
After: y is P's child, with left child x and right subtree C; x has left subtree A and right subtree B
Hashing; bloom filters
Hash Tables: Operations and Applications
purpose: maintain a (possibly evolving) set of stuff
Supported operation: insert, delete, lookup (using a "key") --> O(1)
Hash Tables: Implementation Details
Setup: know the universe U of possible keys (all IP addresses, all chessboard positions, ...)
Goal: maintain an evolving subset S ⊆ U
Naive solutions:
- array-based: indexed by u, O(1) time but Theta(|U|) space
- list-based: Theta(|S|) space but Theta(|S|) lookup time
Solution:
- pick n = # of "buckets" with n ~= |S| (assume |S| does not vary too much)
- choose a hash function h: U --> {0, 1, 2, ..., n - 1}
- Use array A of length n, store x in A[h(x)]
Birthday "paradox" => collisions will happen
Resolving collisions:
solution #1: separate chaining
- keep linked list in each bucket
- given a key/object x, perform insert/delete/lookup in the list in A[h(x)]
solution #2: open addressing (only one object per bucket)
- hash function now specifies probe sequence h1(x), h2(x), ...
- keep trying until find an open slot
What makes a good hash function ?
Note: in a hash table with chaining, insert is O(1); lookup and delete are O(length of the list in the relevant bucket)
Point: performance depends on a good hash function
Properties of a good hash function:
- should "spread the data" as much as possible --> performance
- does not work too hard, fast to evaluate
Bad hash functions; keys = phone numbers (10 digits), |U| = 10^10, n = 10^3:
- first 3 digits (terrible, some buckets will be filled, other empty)
- last 3 digits (not uniformly distributed)
next example: keys = memory locations (which are multiples of a power of 2)
- bad hash function: x mod 1000 (all odd-numbered buckets will be empty)
Quick and dirty hash functions:
- turn the object into a (preferably big) number (e.g. a subroutine to convert strings into integers)
- compress that number (eg: the mod function)
How to choose the number of buckets:
- choose n to be a prime number (shares no factors with patterns in the data)
- not too close to a power of 2 or 10
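A toy instance of the two-step recipe with a prime number of buckets, as suggested above (the rolling-hash constant 31 is an arbitrary common choice, not from the course):

```python
def quick_and_dirty_hash(s: str, n_buckets: int) -> int:
    h = 0
    for ch in s:                    # step 1: turn the object into a big number
        h = h * 31 + ord(ch)
    return h % n_buckets            # step 2: compress it with mod

print(quick_and_dirty_hash("algorithms", 997))   # 997 buckets: a prime
```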
Pathological Data Sets and Universal Hashing Motivation
The load of a hash table: alpha = # of things that have been inserted / # of buckets of hash table
For good hash table performance:
- alpha = O(1) is necessary condition for operations to run in constant time
- with open addressing, need alpha << 1
--> Need to control the load: make sure the # of buckets increases with the # of items in the hash table
Pathological data sets
--> Need a good hash function (one that spreads the data evenly)... but a hash function that is good for every data set does not exist (every fixed hash function has a pathological data set)
Hacking prevention:
- use a cryptographic hash function (e.g. SHA-2): infeasible to reverse-engineer a pathological data set
- use randomization: design a family of hash functions such that for any data set, the hash function performs well on average
Universal Hashing: Definition and Example
What is a good random hash function ?
Def: let H be a set of hash functions U --> {0, 1, ..., n - 1}
H is universal <=> for all x, y ∈ U with x != y: P(h(x) = h(y)) <= 1 / n, where h is chosen uniformly at random from H
Example: IP addresses (form: (x1, x2, x3, x4) with each x_i ∈ {0, ..., 255})
Let n = a prime
Construction: choose a1, a2, a3, a4 uniformly at random in {0, ..., n - 1}; define h_a(x1, x2, x3, x4) = (a1*x1 + a2*x2 + a3*x3 + a4*x4) mod n. This family is universal.
Bloom filters
Bloom Filters: the basics
Raison d'être: fast, space-efficient lookups
Comparison with hash tables:
- Pro: more space-efficient
- Con: can't store the objects themselves, no deletions (in the basic implementation), small false positive probability
Applications:
- Early spellcheckers (is this a legitimate word?)
- Canonical: list of forbidden passwords
- Modern: network routers
Under the hood:
- Array of n bits (0 or 1)
- k hash functions h1, ..., hk (k small constant)
Insert: for i = 1...k, set A[h_i(x)] = 1
Lookup: return true if A[h_i(x)] = 1 for all i = 1...k
Note: no false negatives, but possibly false positives
Heuristic Analysis
Intuition: there should be a trade-off between space and error probability
Probability that a given bit of the filter has been set to 1: 1 - (1 - 1/n)^(k * |S|)
i.e. P <~= 1 - e^(-k|S|/n) = 1 - e^(-k/b), where b = n / |S| is the # of bits per object
So, under these heuristic assumptions, for x not in S the false positive probability is <= (1 - e^(-k/b))^k
How to set k? Minimizing over k gives k ~= ln(2) * b
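A minimal bloom filter sketch in Python. Deriving the k hash functions from salted SHA-256 digests is my choice here; any k good hash functions work:

```python
import hashlib

class BloomFilter:
    def __init__(self, n_bits, k):
        self.bits = [0] * n_bits
        self.k = k

    def _positions(self, x):
        for i in range(self.k):              # k salted hash functions h_1..h_k
            digest = hashlib.sha256(f"{i}:{x}".encode()).hexdigest()
            yield int(digest, 16) % len(self.bits)

    def insert(self, x):
        for pos in self._positions(x):
            self.bits[pos] = 1

    def lookup(self, x):                     # no false negatives, some false positives
        return all(self.bits[pos] for pos in self._positions(x))

bf = BloomFilter(n_bits=1000, k=5)
bf.insert("hunter2")
assert bf.lookup("hunter2")                  # inserted => always found
print(bf.lookup("correct-horse"))            # almost surely False at this load
```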
Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming
Greedy Algorithms
Introduction to Greedy Algorithms
Definition: iteratively make "myopic" decisions, hope everything works at the end
Example: Dijkstra's shortest path
Application: Optimal Caching
The caching problem:
- small fast memory (the cache)
- big slow memory
- process a sequence of page requests
- on a fault (i.e. a cache miss), need to evict something from the cache
Theorem: the "furthest-in-future" algorithm is optimal (minimizes the number of cache misses)
Scheduling problem: Definition
Setup:
- one shared resource
- many "jobs" to do
Question: In what order to do them ?
Assume:
- Each job has a weight wj (priority)
- and a length lj
Completion time cj of job j = sum of the lengths of the jobs up to and including j
Goal (objective function): minimize the weighted sum of completion times, i.e. min sum(j=1..n, wj * cj)
A Greedy Algorithm
Good idea: look at special cases, then move to general case
Edge case ideas: larger weights first, smaller length first
what if wi > wj and li > lj?
Idea: assign each job a score, e.g. w - l or w / l, and schedule in decreasing order of score
Breaking a greedy algorithm:
- find examples where the two candidate algorithms produce different schedules
- e.g. algo 1 (score w - l) is not always correct, while algo 2 (score w / l) always is; see the sketch below
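A small check (the jobs' numbers are mine): the ratio rule matches the brute-force optimum while the difference rule does not:

```python
from itertools import permutations

def weighted_completion(jobs):
    """Objective value sum(w_j * c_j) for jobs scheduled in the given order."""
    t, total = 0, 0
    for w, l in jobs:
        t += l                               # completion time of this job
        total += w * t
    return total

jobs = [(3, 5), (1, 2)]                      # (weight, length) pairs
by_diff = sorted(jobs, key=lambda j: j[0] - j[1], reverse=True)
by_ratio = sorted(jobs, key=lambda j: j[0] / j[1], reverse=True)
print(weighted_completion(by_diff))          # 23
print(weighted_completion(by_ratio))         # 22, the optimum:
assert weighted_completion(by_ratio) == min(
    weighted_completion(list(p)) for p in permutations(jobs))
```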
Minimum spanning trees
Informal definition: connect some points as cheaply as possible
Applications: clustering, networking
Fast greedy algorithm:
- Prim's algorithm, 1957
- Kruskal's algorithm, 1956 => O(m log n) time using suitable data structures
Input: undirected graph G = (V, E), n,m, and a cost c for each edge
Assume adjacency list representation
Output: minimum-cost tree T ⊆ E that spans all vertices:
- T has no cycles
- the subgraph (V, T) is connected (i.e. there is a path between any 2 vertices)
Assumptions:
- G is connected
- Edge costs are distinct (algos still correct without this assumption)
Prim's MST Algorithm
X = {s} (s chosen arbitrarily)
T = empty [invariant: X = vertices spanned by tree-so-far T]
while X != V
let e = (u, v) be the cheapest edge of G with u ∈ X, v ∉ X
add e to T
add v to X
O(n) iterations, O(m) per iteration => O(nm) running time for naive implementation
Empty cut lemma: a graph is not connected <=> there exists a cut (A, B) with no crossing edges
Double crossing lemma: suppose the cycle C ⊆ E has an edge crossing the cut (A, B); then so does some other edge of C
Lonely cut corollary: If e is the only edge crossing some cut (A, B), then it is not in any cycle
The cut property: Consider an edge e of G. Suppose there is a cut (A, B) such that e is the cheapest edge of G that crosses it. Then e belongs to the MST of G.
Fast Implementation
Using a Heap
Using heap to store edges with keys = cost => O(mlogn)
But there is better.
Maintain:
- Invariant 1: elements in heap = vertices of V - X
- Invariant 2: key[v] = cost of the cheapest edge (u, v) with u ∈ X
Heap population:
- Scan edges, find the cheapest edge for each vertex of V - X: O(m); populate the heap: O(n log n) --> O(m + n log n) = O(m log n) since m >= n - 1 (G is connected)
How to maintain invariant 2: When adding a new vertex, we need to recompute keys of the neighbours
when v is added to X:
for each edge (v, w) ∈ E:
if w ∈ V - X:
delete w from heap
recompute key[w] = min(key[w], c_vw)
reinsert w into heap
Running time with heaps:
- n - 1 inserts during preprocessing
- one extract-min per iteration of the while loop (n - 1 total)
- each edge (v, w) triggers at most one delete/insert op
--> O(m) heap operations (since m >= n - 1) --> O(m log n) total
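A runnable sketch with heapq (names mine). As in Dijkstra, heapq cannot delete from the middle, so this uses lazy deletion rather than the key recomputation above:

```python
import heapq

def prim(graph, s):
    """graph: dict vertex -> list of (neighbor, cost). Returns the MST edges."""
    X = {s}
    T = []
    heap = [(c, s, v) for v, c in graph[s]]
    heapq.heapify(heap)
    while heap:
        cost, u, v = heapq.heappop(heap)     # cheapest edge seen with u in X
        if v in X:
            continue                          # stale: both endpoints already in X
        X.add(v)
        T.append((u, v, cost))
        for w, c in graph[v]:
            if w not in X:
                heapq.heappush(heap, (c, v, w))
    return T

g = {1: [(2, 1), (3, 4)], 2: [(1, 1), (3, 2)], 3: [(1, 4), (2, 2)]}
assert sum(c for _, _, c in prim(g, 1)) == 3  # MST: edges (1,2) and (2,3)
```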
Kruskal's minimum spanning tree algorithm
Kruskal's MST Algorithm
Assumption: G is connected, edge costs are distinct (to ease proofs)
Cut property: If e is the cheapest edge crossing some cut (A, B) then e belongs to the MST
Kruskal's MST algo:
sort edges in order of increasing cost
T = empty
for i = 1 to m:
if T ∪ {edge i} has no cycle:
add edge i to T
Implementing Kruskal's Algorithm via union-find
How to check for cycles: use BFS or DFS in the graph (V, T), which contains at most n - 1 edges
algo --> O(nm)
Union-find data structure: maintain a partition of a set of objects
Find: return name of the group x belongs to Union: fuse two groups into a single one
In Kruskal's: objects = vertices, groups = connected components; when adding a new edge, look up its endpoints' groups and fuse them if they differ
Motivation: O(1) cycle check
Idea:
- Maintain one linked structure per connected component
- each component has an arbitrary leader
Invariant:
- Each vertex points to the leader of its component (the "name" of a component is inherited from its leader)
key point:
- Cycle <=> edge (u,v), u and v have the same leader
Maintaining the invariant: when a new edge (u, v) is added to T, the connected components of u and v merge. When two components merge, have the smaller one inherit the leader of the larger one (maintain a size count for each component)
Running time:
- O(mlogn) for sorting
- O(m) for cycle checks (O(1) per iteration)
- O(n log n) total for leader pointer updates
- --> O(mlogn) total
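A runnable sketch of Kruskal with the "eager" union-find described above: each vertex points straight at its leader, and the smaller component is relabeled on a merge (names mine):

```python
def kruskal(n, edges):
    """edges: list of (cost, u, v) with vertices 0..n-1. Returns the MST edges."""
    leader = list(range(n))
    members = {v: [v] for v in range(n)}         # leader -> component members
    T = []
    for cost, u, v in sorted(edges):             # O(m log n): sorting dominates
        lu, lv = leader[u], leader[v]
        if lu == lv:                             # same leader <=> adding a cycle
            continue
        T.append((u, v, cost))
        if len(members[lu]) < len(members[lv]):
            lu, lv = lv, lu                      # lu is now the larger component
        for w in members[lv]:                    # relabel the smaller component
            leader[w] = lu
        members[lu].extend(members.pop(lv))
    return T

edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
print(kruskal(4, edges))   # [(0, 1, 1), (1, 2, 2), (2, 3, 4)]
```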
Application to Clustering
Goal: given n points, classify them into "coherent groups"
Assumptions:
- As input, given a (dis)similarity measure - a distance d(p, q) between each pair of points
- d is symmetric
- we know k = # of clusters desired
Call points p, q separated if they're assigned to different clusters
assign each point to a separate cluster
repeat until only k clusters:
let p, q be the closest pair of separated points
merge the 2 clusters of p and q into one
--> Called single-link clustering
Huffman codes
Introduction: prefix-free variable encoding
Binary code: map every char of an alphabet (S: say 32 chars) to a binary string
Obvious encoding: 32 5-bit binary strings (eg: ASCII)
Can we do better ? Yes if some chars are more frequent than others, using a variable length code
Suppose { A, B, C, D } = { 00, 01, 10, 11 }. Assume we use { 0, 01, 10, 1 } instead
Then 001 could be AB or AAD: ambiguous!
Prefix-free codes: make sure that for every pair i, j ∈ S, neither of the encodings f(i), f(j) is a prefix of the other.
Example: { 0, 10, 110, 111 } prefix-free
Example: A: 60% -> 0, B: 25% -> 10, C: 10% -> 110, D: 5% -> 111 --> 1.55 bits needed on average per char
Problem definition
Codes as trees:
Goal: find the best prefix-free encoding for a given set of character frequencies
Fact: binary codes <--> binary trees
Left child: 0, right child: 1
Prefix-free <--> labelled nodes are leaves
To decode: repeatedly follow path from root until you hit a leaf
Input: probability pi for each character i € S
Notation: if T = tree with leaves <--> symbols of S, then L(T) (average encoding length) = sum(on S, pi * depth of i in T)
A greedy algorithm
Natural but sub-optimal idea: top-down, divide and conquer
- Partition S into S1 and S2 each with 50% of total frequency
- Recursively compute T1 for S1 and T2 for S2, return
Huffman's (optimal) idea:
- build tree bottom up
HuffmanCodes(S) {
if |S|==2 return A --0-- x --1-- B
Let a, b ∈ S have the smallest frequencies
Let S' = S with a, b replaced with a new symbol ab
define pab = pa + pb
T' = HuffmanCodes(S')
Extend T' (with leaves = S') to a tree T (with leaves = S) by splitting leaf ab into a --0-- x --1-- b
return T
}
Running time: O(n²) where n = |S|
Speed-up: use a heap --> O(n log n)
Even better: sorting + 2 queues
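A sketch of the heap speed-up (names mine). Instead of explicit trees it merges code tables, prefixing '0'/'1' on each merge; a unique counter breaks frequency ties so the heap tuples stay comparable:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """freqs: dict symbol -> frequency. Returns dict symbol -> binary code."""
    tiebreak = count()
    heap = [(p, next(tiebreak), {s: ""}) for s, p in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        pa, _, a = heapq.heappop(heap)           # the two smallest frequencies
        pb, _, b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in a.items()}       # split the leaf "ab"
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (pa + pb, next(tiebreak), merged))
    return heap[0][2]

codes = huffman_codes({"A": 0.60, "B": 0.25, "C": 0.10, "D": 0.05})
# average length = 0.6*1 + 0.25*2 + 0.1*3 + 0.05*3 = 1.55 bits, as above
assert sorted(len(c) for c in codes.values()) == [1, 2, 3, 3]
```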
Introduction to dynamic programming
Weighted Independent Sets in Path Graphs
Input: a path graph G = (V, E) with nonnegative weights on vertices
Output: a subset of non-adjacent vertices (an independent set) of maximum total weight
WIS in Path Graphs: Optimal Substructure
Optimal substructure: critical step: reason about structure of an optimal solution
Upshot: a max-weight IS must be either
- a max-weight IS of G' (= G with the last vertex vn removed), or
- vn + a max-weight IS of G'' (= G with the last two vertices removed)
If we knew whether or not vn is in the max-weight IS, we could recursively compute the max-weight IS of G' or G''.
Let Gi = first i vertices of G
Plan: populate array A left to right with A[i] = Value of max-weight IS of Gi
A[0] = 0
A[1] = w1
For i=2:n {
A[i] = max(A[i - 1], A[i - 2] + wi)
}
WIS in Path Graphs: A Reconstruction Algorithm
Reconstruction algorithm
Let A = the filled-in array
Let S = empty, i = n
While i >= 1 {
If A[i - 1] >= A[i - 2] + wi {
i -= 1
}
else {
add vi to S
i -= 2
}
}
return S
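The fill-in loop and the reconstruction, runnable (names mine; w is 1-indexed to match the notes, with w[0] unused):

```python
def max_weight_is(w):
    """Returns (max IS weight, set of chosen vertex indices) for a path graph."""
    n = len(w) - 1
    A = [0] * (n + 1)
    A[1] = w[1]
    for i in range(2, n + 1):
        A[i] = max(A[i - 1], A[i - 2] + w[i])    # exclude v_i vs. include v_i
    S, i = set(), n
    while i >= 1:
        if A[i - 1] >= (A[i - 2] if i >= 2 else 0) + w[i]:
            i -= 1                               # v_i is not in the optimal IS
        else:
            S.add(i)                             # v_i is; skip its neighbor
            i -= 2
    return A[n], S

assert max_weight_is([0, 1, 4, 5, 4]) == (8, {2, 4})
```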
Principles of Dynamic Programming
- Identify a small number of subproblems
- Can quickly + correctly solve "larger" subproblems given the solutions to the smaller subproblems
- After solving all subproblems, can quickly compute the final solution
The Knapsack problem
Input: n items with
- a value vi (nonnegative)
- a size wi (nonnegative and integer) And a capacity W (nonnegative and integer)
Output: a subset S ⊆ { 1, ..., n } that maximizes sum(i ∈ S, vi) subject to sum(i ∈ S, wi) <= W
A Dynamic Programming Algorithm
Notation: Vix = value of the best solution that:
- Uses only the first i items
- Has total size <= x
Vix = max(V(i-1)x, vi + V(i-1)(x-wi)) <-- our recurrence
Edge case: if wi > x, must have Vix = V(i-1)x
Let A = 2D array
initialize A[0, x] = 0 for x = 0, 1, ..., W
For i = 1:n {
For x = 0:W {
A[i, x] = max(A[i - 1, x], A[i - 1, x - wi] + vi or 0 if wi > x)
}
}
Return A[n, W]
O(nW)
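A direct translation of the recurrence (names mine; items are 0-indexed internally):

```python
def knapsack(values, sizes, W):
    """A[i][x] = best value using the first i items with capacity x; O(nW)."""
    n = len(values)
    A = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        vi, wi = values[i - 1], sizes[i - 1]
        for x in range(W + 1):
            A[i][x] = A[i - 1][x]                        # case 1: item i excluded
            if wi <= x:                                  # case 2 only if item i fits
                A[i][x] = max(A[i][x], A[i - 1][x - wi] + vi)
    return A[n][W]

assert knapsack(values=[3, 2, 4, 4], sizes=[4, 3, 2, 3], W=6) == 8  # items 3 and 4
```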
Sequence alignment
Optimal Substructure
Example: aligning AGGGCT with AGGCA as AGGGCT / AGG-CA. Total penalty: e_gap + e_AT (one gap, one A/T mismatch)
Inputs:
- two strings X = x1,...,xm and Y = y1,...,yn over some alphabet S (like { A, G, C, T })
- penalty e_gap
- mismatch penalty e_ab for aligning a with b (e_ab = 0 if a = b)
Goal: align strings and minimize penalty
A Dynamic Programming Algorithm
A = 2D array
A[i, 0] = A[0, i] = i * e_gap for all i >= 0
For i = 1:m {
For j = 1:n {
A[i, j] = min(
A[i - 1, j - 1] + e_xiyj
A[i - 1, j] + e_gap
A[i, j - 1] + e_gap
)
}
}
Return A[m, n]
O(mn)
Reconstructing:
- Trace back through filled-in table A, starting at A[m, n]
- When you reach subproblem A[i, j]:
- If A[i, j] filled using case 1, match xi and yj, goto A[i - 1, j - 1]
- If A[i, j] filled using case 2, match xi with a gap, goto A[i - 1, j]
- If A[i, j] filled using case 3, match yj with a gap, goto A[i, j - 1]
- If i = 0 or j = 0, match remaining substring with gaps
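A runnable version of the table fill (names mine; the example's gap and mismatch penalties are arbitrary choices that make the penalty above equal 3):

```python
def alignment_penalty(x, y, gap, mismatch):
    """Minimum total alignment penalty between strings x and y (match costs 0)."""
    m, n = len(x), len(y)
    A = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        A[i][0] = i * gap                        # a prefix aligned against all gaps
    for j in range(n + 1):
        A[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            mis = 0 if x[i - 1] == y[j - 1] else mismatch
            A[i][j] = min(A[i - 1][j - 1] + mis,     # case 1: x_i with y_j
                          A[i - 1][j] + gap,         # case 2: x_i with a gap
                          A[i][j - 1] + gap)         # case 3: y_j with a gap
    return A[m][n]

assert alignment_penalty("AGGGCT", "AGGCA", gap=1, mismatch=2) == 3
```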
Optimal binary search trees
Problem definition
What is the "best" search tree for a given set of keys? --> A balanced search tree, like a red-black tree --> worst-case search time: O(height) = O(logn)
Input: frequencies p1, p2, ..., pn for items 1, 2, ..., n (assume items in sorted order ie 1 < ... < n)
Goal: compute a valid search tree that minimises the weighted (average) search time
C(T) = sum(items i, pi * search time for i in T)
If T is a red-black tree, then C(T) = O(logn)
Optimal Substructure
Focus on a top-down approach
Choosing the root can have hard-to-predict repercussions