A quick study sheet I use as a refresher 😄
Table of Contents generated with DocToc
- Data Structures
- Algorithms
- Other Concepts
- Math
- Common Problems
- Just Python Things
- Problem-solving Strategies
- An array is a collection with specified size
- Dynamic array: some languages' implementations automatically expand as you add elements
- Access elements directly by index
- Time complexity:
- Access by index:
O(1)
- Search by value:
O(n)
- Insert:
O(n)
(need to shift values) - Delete:
O(n)
(need to shift values)
- Access by index:
- A linked list is a collection of nodes where each node has a value and a reference
- Singly-linked list: nodes have pointers to the next node
- Doubly-linked list: nodes have pointers to next and previous nodes
- Time complexity:
- Access by index:
O(n)
- Search by value:
O(n)
- Insert:
O(1)
- Delete:
O(1)
- Access by index:
- Stack: last in, first out (LIFO)
- Queue: first in, first out (FIFO)
- Double-ended queue: stack + queue combined
- Principal operations:
push
adds elements &pop
extracts elements
- A tree is an undirected, connected, acyclic graph
- Has
v
vertices andv-1
edges - Any two vertices are connected by a unique path
- A leaf is a vertex of degree 1
- One node is designated as the root
- Each node has parent and/or children pointers
- A node's height is the length of its path to the root
- Has
- A forest has multiple distinct trees (a disjoint union)
- An n-ary tree has at most
n
children per node
- A binary tree has nodes with at most 2 children (designated left & right)
- Full: every node has 0 or 2 children
- Number of nodes is at most
2^(h+1)-1
- Number of nodes is at most
- Complete: every level, except possibly the last, is filled, and the last level's nodes are as far left as possible
- Number of internal nodes:
floor(n/2)
- Number of internal nodes:
- Balanced: has the minimum possible maximum depth
- Height is
ceil(lg(n+1))
- Height is
- Traversals:
- Pre-order: open current, visit left subtree, visit right subtree
- In-order: visit left subtree, open current, visit right subtree (returns sorted list)
- Post-order: visit left subtree, visit right subtree, open current
- Level-order: breadth-first traversal, level by level
- A binary search tree is an ordered binary tree
- Satisfies the BST property: each node's value is greater than all keys stored in the left subtree and less than all keys stored in the right subtree
- Designed to make searching faster--each comparison allows operations to skip about half the tree
- Search: recursively search subtrees; takes
O(h)
- Insertion: like search, but insert node when a leaf is reached; takes
O(h)
- Deletion: more complicated; takes
O(h)
- If deleting a node with no children, just do it
- If deleting a node with a single child, replace the node with its subtree
- If deleting a node with two children, swap with minimum value in right subtree or maximum value in left subtree, then delete the node (which should now be a leaf)
- Self-balancing binary search tree
- Lookup, insertion, and deletion all take
O(log n)
- Also called a prefix tree
- Stores subsequences of values; useful for autocomplete
- Hash function: a function mapping an object to an integer such that if
a==b
,H(a)==H(b)
- A hash table is an array whose indices correspond to results from a hash function (implemented as a dictionary in Python)
- Provides
O(1)
lookup - Collision resolution & table doubling
- Special tree where nodes have higher (in the case of a min-heap) values than their parents
- Binary heap:
- Heapify in
O(n)
- Find min in
O(1)
- Extract min, increase key, insert, delete in
O(log n)
- Can implement as a list where a node at index
i
has children at2i+1
and2i+2
- Heapify in
- Collection of nodes and edges
- Spanning tree: a tree that includes all nodes in the graph
- Minimum spanning tree: a spanning tree with minimum total edge weights
- Given a sorted list, start at the midpoint and divide and conquer
O(log n)
- Maintain a sorted sublist and insert new elements in it appropriately
- Sorts in-place; stable
- Best-case
O(n)
, averageO(n^2)
, worstO(n^2)
- On each pass through the array, compare adjacent pairs of elements and swap if necessary
- Sorts in-place; stable
- Best-case
O(n)
, averageO(n^2)
, worstO(n^2)
- Exchange current element with smallest element to the right of the current element
- Sorts in-place; unstable
- Best-case
O(n^2)
, averageO(n^2)
, worstO(n^2)
- Recursively divide until sublists are size 1, then recursively merge the sublists
- Requires
O(n)
space; stable - Best-case
O(n log n)
, averageO(n log n)
, worstO(n log n)
- Set a pivot in the array; move elements smaller than pivot to its left and elements larger to the right
- Recursively sort left and right sublists
- Requires
O(log n)
space; stable - Best-case
O(n log n)
, averageO(n log n)
, worstO(n^2)
- For lists whose elements' values are in a set range
- Not a comparison sort so best & average is
O(n+k)
and worst isO(n^2)
- Iterate through list and place items in buckets; can be stable
- Apply a stable counting sort to every place value in a number
- Sort places from least to most significant
- Requires
O(n+k)
space;O(d(n+k))
time - Also not a comparison sort
- Given a graph, find a path from a start node to an end node
- General strategy: expand a node, check to see if it's the goal node, add its children to the search agenda
- In the case of weighted graphs, a heuristic may help find the shortest path faster
- Admissible: heuristic's value for a node is less than actual distance from node to goal (
H(n,G) ≤ dist(n,G)
for all nodesn
) - Consistent: heuristic follows triangle inequality (
|H(A)-H(B)| ≤ dist(A,B)
for all nodesA,B
)
- Admissible: heuristic's value for a node is less than actual distance from node to goal (
- Implement with a stack (add new paths to the front of the agenda)
- Implement with a queue (add new paths to the end of the agenda)
- In an unweighted graph, guaranteed to find shortest path
- Add new paths to the front of the agenda
- Sort new paths by terminal node's heuristic
- Add new paths to the front of the agenda
- Sort all paths in agenda by terminal node's heuristic
- Add new paths to the front of the agenda
- Sort agenda by path length so far
- Can also add a heuristic or extended set (or both)
- Branch and bound with heuristic and extended set
- Heuristic must be consistent
- Find shortest path between two nodes (branch and bound with extended set and without heuristic)
- Can't handle negative edge weights
- Using a Fibonacci heap, runtime is
O(|E|+|V|log|V|)
- Generate a minimum spanning tree of a graph
- Using a heap, runtime is
O(|E|+|V|log|V|)
- Compute shortest paths from a single source to all other nodes in the graph
- Can handle negative edge weights & detect negative-weight cycles
- Worst-case runtime is
O(|V||E|)
- Dynamic programming all-pairs shortest paths algorithm
dp(i,j,k+1)=min(dp(i,j,k),dp(i,k+1,k)+dp(k+1,j,k))
- Static/dynamic checking
- Strongly/weakly typed
- Compiled/interpreted
- Shallow/deep copying
- Immutable/mutable
- Greedy algorithms
- Defensive copying
- Pseudo-polynomial runtime
- Look here for formal definitions
- O - asymptotic upper bound
- o - asymptotic upper bound, excluding same rate
- Ω - asymptotic lower bound
- ω - asymptotic lower bound, excluding same rate
- Θ - same asymptotic growth
- Exponential > polynomial > logarithmic > constant
- Can ask for worst, best, or average case
Inspiration from here
- Abstract data type: access data only through a public interface; implementation can be changed without altering usage
- Class: basic concept in OOP, bundles data type information with actions
- Object: runtime value which belongs to a class
- Encapsulation: information hiding to ensure data integrity
- Hierarchy: classes can have super- and subclasses
- Inheritance: a subclass inherits data and methods from its parent classes
- Overriding: a subclass inherits parent methods but may override them
- Polymorphism: different classes in a program can respond to the same message in different ways; useful when an object's class can't be determined at compile time
- TODO
- Model-view-controller: model stores data, controller updates model, view generates user interface
- Factory method: use a factory object to create other objects rather than using a constructor
- Singleton: restrict instantiation of a class to a single object
- Observer: subjects notify observers of any state changes (usually by calling their methods); used in MVC
- Lots more
- A general method for solving a problem with optimal substructure by breaking it down into overlapping subproblems
- Top-down: memoize (store) solutions to subproblems and solve problem recursively
- Bottom-up: build up subproblems from base case up and avoid recursive overhead
- Order subproblems by topologically sorting DAG of dependencies
- Knapsack problem, longest common subsequence, coin change, edit distance, minimum number of jumps, longest palindrome substring, balanced partition
- GET: used to retrieve data, no other effect on the data
- POST: used to send data to the server (e.g. form)
- PUT: replaces current representation of resource (idempotent)
- DELETE: remove current representation resource
- 200 OK: success
- 400 Bad Request: syntax could not be understood
- 401 Unauthorized: request not fulfilled due to lack of authorization
- 403 Forbidden: request understood but not fulfilled, authorization will not help
- 404 Not Found: URI could not be matched
- 408 Request Timeout: server did not receive a timely response from client
- 500 Internal Server Error: server exception
- 503 Service Unavailable: server unable to handle the request (temporary)
- 504 Gateway Timeout: server did not receive a timely response from upstream server
- Master theorem: is most work performed in the root node, in the leaves, or evenly distributed in the rows of the recursion tree?
- TODO
init
: creates/initializes.git
folder in current directoryclone
: clone repo into new directorypull
: fetch from another repo and integrategit pull
is same asgit fetch
thengit merge FETCH_HEAD
add
: add files to index of contents for next commitrm
: remove files from working tree and indexcommit
: record changes to the repo, along with a commit messagerebase
: transplant changes on one branch to anotherbranch
: list, create, or delete branchescheckout
: switch branchesstatus
: show the working tree's statusdiff
: show changes between commits or the working treelog
: show commit logs in a reporemote
: manage tracked remote reposreset
: reset current HEAD to a different state
n(n-1)/2
: number of handshakes in a groupn-1
: number of rounds in a knockout tournament2^k
: number of binary strings of lengthk
n!/(n-r)!
: permutations ofn
items takenk
at a timen!/(r!(n-r)!)
: combinations ofn
items takenk
at a time
Lots of these taken from this blog.
-
Fibonacci sequence: print the
n
th Fibonacci number- Optimally, do this recursively and cache the subproblem solutions
-
Array pair sums: given an array, output pairs which sum to a number
k
- Can do in
O(n)
with a set data structure. For each element in the array, check to see ifk-a[i]
is in the set, then add the element to a set.
- Can do in
-
Reverse a linked list: reverse a singly linked list
- Track previous and current nodes; iterate through list and swap the direction of pointers. Time is
O(n)
and space isO(1)
.
- Track previous and current nodes; iterate through list and swap the direction of pointers. Time is
-
Matrix region sum: given multiple rectangular regions in a matrix, compute the sum of numbers in that region
- Memoize sums of regions with the constraint that corners are at
m[0][0]
- Memoize sums of regions with the constraint that corners are at
-
Word permutation: find all permutations of a word
```python def permute(word): if len(word) == 1: return {word} else: result = set() permutations = permute(word[:-1]) letter = word[-1] for p in permutations: result.update([p[0:i]+letter+p[i:] for i in range(0,len(word)+1)]) return result ```
-
Median of number stream: given a continuous stream of numbers, find the median of numbers so far at any time
- Optimally, keep a max-heap of the smaller half of the numbers and a min-heap of the larger half of the numbers
-
Infinite array search: given a sorted, infinite-length array, find a given value
- Modify binary search to start at the array's first element and exponentially increase the index you search at. Time is
O(log n)
- Modify binary search to start at the array's first element and exponentially increase the index you search at. Time is
-
Anagram pair: determine if two words are anagrams
- Comparison sort: sort the words in alphabetical order and check for equality.
O(n log n)
, wheren
is word length. - Count letters: use a hash table to track counts of letters in both words.
O(n)
runtime.
- Comparison sort: sort the words in alphabetical order and check for equality.
-
Anagram dictionary: determine which words in a list are anagrams of a given word
- Check for the membership of every permutation of the input word in the dictionary
-
Anagram list: determine which sets of words in a dictionary are anagrams
- Abstractly, hash each word and group by word. A hash can be a 26-digit string, or you can sort each word.
-
Binary search tree verification: verify whether a tree satisfies the binary search tree property
- For each node, track its possible minimum and maximum values
- Performing an inorder traversal should produce a sorted list
-
Largest continuous sum: in an array of integers, determine the subsequence with the largest sum
- Track maximum sum encountered so far and check whether current sum is greater. Reset current sum when it becomes negative. Time is
O(n)
and space isO(1)
.
- Track maximum sum encountered so far and check whether current sum is greater. Reset current sum when it becomes negative. Time is
-
-1/0/1 array: given an array where values are -1, 0, or 1, sort the array
- Bucket sort (but this takes
O(n)
space) - Iterate through the list and track pointers for min and max index. If a value is -1, swap it with the element at the min index and increment min index. If a value is 1, swap it with the element at max index and decrement max index. Time is
O(n)
and space isO(1)
.
- Bucket sort (but this takes
-
k-th largest element: find the
k
th largest element in an unsorted array- Modify quicksort to recursively sort on pivots in left/right subarrays (average
O(n)
, worst-caseO(n^2)
) - Median of medians algorithm
- Modify quicksort to recursively sort on pivots in left/right subarrays (average
-
Find missing number: given an array where every number except one appears an even number of times, find the number that appears an odd number of times
- Optimally, bitwise XOR by numbers in the list (XORing an even number of times resets the number to its original value). Time is
O(n)
and space isO(1)
- Optimally, bitwise XOR by numbers in the list (XORing an even number of times resets the number to its original value). Time is
-
Knapsack: given a set of items each with weights and values, maximize value while keeping total weight under a limit
- Dynamic programming: say weight limit is
W
. Create an arraym[w]
where each element is the maximum value of items with a weight limitw≤W
. Optimize by dividing item weights and weight limit by their greatest common divisor. RuntimeO(nW)
.
- Dynamic programming: say weight limit is
-
Balanced partition: given a set of numbers, partition them so that the sums of the partitions are as close as possible
- Greedy method: iterate through sorted list and add items to the smaller-sum partition
- Dynamic programming: determine if a subset of the input sums to
n/2
(wheren
is the sum of the input numbers)
s.center(w,[fillchar])
: returns centered string in string of widthw
s.count(sub[,start[,end]])
: returns count of non-overlapping occurences of substringsub in s
: returnsTrue
ifsub
is ins
s.find(sub[,start[,end]])
: returns start index of substring or-1
s.join(iter)
: join items in iterable, separated bys
s.strip([chars])
: removing leading and trailing characterss.replace(old,new[,count])
: returns copy ofs
withold
replaced bynew
l=[]
: initializelen(l)
: get sizel.append(val)
: append a valuel.insert(i,val)
: insert a value at positionl.extend(lst)
: append all values in a listl.pop([i])
: remove an item and return it (defaults to last item)x in l
: check membershipl.sort(cmp=None,key=None,reverse=False)
: sort in placesorted(iterable[, cmp[, key[, reverse]]])
: return a new stably sorted listl.reverse()
: reverse a list in placerange(start,end)
: get a list with items fromstart
(inclusive) toend
(exclusive)[<expr> for <var> in <list> if <condition>]
: list comprehensionlistname[start:end:slice_size]
: slicing
set()
or{l}
: initializelen(s)
: get cardinalityx in s
: check memberships.update(other)
: add values ofother
tos
s | other | ...
: return a union of setss & other & ...
: return an intersection of setss - other - ...
: return difference of setss ^ other
: return set of elements uniquely in sets
d={}
: initialized[key]
ord.get(key)
: get the value atkey
(the latter returnsNone
if not found)len(d)
: get item countkey in d
: check membershipd.pop(key)
: remove and return a value in the dictionarydel d[key]
: delete an itemd.update(other)
: update/overwrite with keys & values fromother
d.items()
: return a list of(key,value)
tuplesd.keys()
: return a list of dicionary keysd.values()
: return a list of dictionary values
- Binary numbers: preface with
0b
; usebin(int)
to convert- Left and right shift:
<<
and>>
- Bitwise AND, OR, XOR, NOT:
&
,|
,^
,~
- Bitmasks
- Left and right shift:
f = open('filename', 'w')
: open a filef.close()
: close filef.readline()
: read a line from the filefor line in f
: iterate through lines in filef.write()
: write a string to the file
x << y
: left shift byy
bitsx >> y
: right shift byy
bitsx & y
: bitwise ANDx | y
: bitwise ORx ^ y
: bitwise XOR~x
: complement ofx
__init__(self,[...])
: initializer for a class__cmp__(self,other)
: return negative for<
, 0 for==
, positive for>
__eq__(self,other)
: define behavior for==
- Also
ne
,lt
,le
,gt
,ge
- Also
__str__(self)
: return string representation__repr__(self)
: return machine-readable representation__hash__(self)
: return an integer such thata==b
implieshash(a)==hash(b)
copy
copy.copy(x)
: return shallow copy ofx
copy.deepcopy(x)
: return deep copy ofx
collections
(usecollections.deque
)dq.pop()
,dq.popleft()
,dq.appendleft(val)
,dq.extendleft(lst)
,dq.rotate(n)
heapq
heapq.push(heap,item)
: add an itemheapq.pop(heap)
: pop an itemheapq.heapify(l)
: make a list into a heap in linear time
BeautifulSoup
scipy
numpy
scikit-learn
nltk
requests
unirest
pdb
pdb.set_trace()
sets a breakpoint at the current line and gives the user a CLI with which to inspect various objects and their values at runtime. Also allows you to continue code execution line by line. Usehelp
to see a list of the commands usable in the CLI.
pprint
: Pretty Printpprint.pprint(iter)
: Print out a version ofiter
with JSON-like formatting. Useful for inspecting large, deeply nested objects.
zip(seq1 [,seq2 [...]])
: return list of tuples where each tuple contains the i-th element from each sequence. Truncated to length of shortest sequence ([(seq1[0], seq2[0] ...), (...)]
)map(f, seq)
: return list of the results off
applied to each element ofseq
([f(seq[0]), f(seq[1]), ...]
)filter(f, seq)
: return list of items inseq
for whichf(seq[i]) == True
reduce(f, seq)
: applyf
to pairs of elements inseq
until iterable is a single value
- Infinity:
float("inf")
- Simultaneous assignment:
a,b = b,a
to swap lambda x: <body>
: lambda function; don't need return statement- Tuples are immutable lists
zip()
- Four numeric types:
int
,long
,float
,complex
- Falsey values:
None
False
- Zero of any numeric type
- Empty sequences & mappings
- When
__nonzero__()
returnsFalse
- When
__len__()
returns zero for a user-defined class