
Notes about Algorithms and Data Structures to master the coding interview.

Primary Language: JavaScript

Algorithms and Data Structures


1. Big O Asymptotic Analysis

What is good code?

  • Good code must be: readable + scalable (Speed + Memory)
  • Run time: How long it takes to solve a problem.
  • Big O is a measure of algorithmic efficiency: how long an algorithm takes to run as the input grows.
  • Big O matters only at large scale, with large inputs. Big O asks what the end of the graph looks like as inputs grow very large. In other terms, Big O measures the scalability of the code.
  • Big O measures the time complexity and space complexity of the code.


  1. O(n) - Linear Time: as the number of inputs increases, the number of operations increases proportionally.
  • Looping over every element of an array with a for or while loop. Looping over two different arrays is O(a + b).
  2. O(1) - Constant Time: as the number of inputs increases, the number of operations stays the same.
  • No loops.
  3. O(n^2) - Quadratic Time: as the number of inputs increases, the number of operations increases quadratically.
  • Every element in a collection needs to be compared to every other element in the same collection.
  • Two nested loops over the same array.
  4. O(n!) - Factorial Time: as the number of inputs increases, the number of operations increases factorially.
  • Adding a nested loop for every element.
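
A minimal JavaScript sketch of the first three classes (function names are illustrative):

```javascript
// O(1) - constant time: one operation regardless of input size.
function getFirst(arr) {
  return arr[0];
}

// O(n) - linear time: one operation per element.
function logAll(arr) {
  for (let i = 0; i < arr.length; i++) {
    console.log(arr[i]);
  }
}

// O(n^2) - quadratic time: every element paired with every other element.
function logAllPairs(arr) {
  for (let i = 0; i < arr.length; i++) {
    for (let j = 0; j < arr.length; j++) {
      console.log(arr[i], arr[j]);
    }
  }
}
```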

Big O Rules:

  • Worst Case
  • Remove Constants
  • Different Terms for Inputs
  • Drop Non Dominants: O(n^2 + n) --> O(n^2)

Three Pillars of Programming:

  • Readable
  • Speed (Time Complexity) - Efficient Big O time complexity
  • Memory (Space Complexity) - Efficient Big O space complexity

What causes Time Complexity?

  • Operations ( +, -, *, / )
  • Comparisons
  • Looping
  • Outside function calls

What causes Space Complexity?

  • Variables
  • Data Structures
  • Function Calls
  • Allocations

Space Complexity

When we talk about Space Complexity we talk about additional space: we don't count the size of the input, only how much space the function allocates internally.
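
A minimal sketch of the difference, assuming we only count the function's own allocations:

```javascript
// O(1) space: a fixed number of variables, regardless of input size.
function sum(arr) {
  let total = 0; // the only extra allocation
  for (let i = 0; i < arr.length; i++) {
    total += arr[i];
  }
  return total;
}

// O(n) space: builds a new array proportional to the input size.
function double(arr) {
  const result = []; // grows with the input
  for (let i = 0; i < arr.length; i++) {
    result.push(arr[i] * 2);
  }
  return result;
}
```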

2. How To Solve Coding Problems

  • Being the best coder doesn't mean you're going to ace the technical interview.
  • Companies look for people with analytical skills, coding skills, technical skills and communication skills.
  • Check out the Big O Cheat Sheet

3. Data Structures Introduction

What is a Data Structure?

  • Data Structures are collections of values; these values can have relationships among them and functions that operate on them.
  • Each Data Structure is useful for its own thing.

How Do Computers Store Data?

  • Variables are stored in RAM.
  • RAM is divided into addresses, and each address holds 1 byte (8 bits).
  • The CPU uses the Memory Controller to read/write on the RAM.
  • The CPU Cache holds the most recently used data.
  • Integers are usually represented in 32 bits (4 addresses = 4 bytes/lines in RAM).
  • JavaScript returns Infinity when a number is too large to store in RAM.
  • Data Structures are arrangements of data in RAM. How data is arranged in RAM has pros and cons for read/write. Selecting the best data structure means letting the CPU perform operations as fast as possible.

Operations on Data Structures

Insertion, Deletion, Traversal (one for loop), Searching (find index), Sorting, Access.

4. Arrays

  • Arrays store data sequentially.
  • Lookup, push, pop have O(1) because the computer knows where data are stored in RAM (sequentially).
  • Unshift and splice have O(n) because they iterate over the items to re-index them.
  • Arrays are not the best when we want to add elements to the beginning of the array.

Static Arrays vs. Dynamic Arrays:

  • Static Arrays are fixed in size. If you want to expand on their size, you need to copy and rebuild an array in a new location => double memory + time complexity O(n).
  • Dynamic arrays allow adding items without copying (most of the time) => O(1).
  • JavaScript arrays are dynamic: they automatically allocate more memory as you add items. Occasionally an expanding array must be copied to a new location, so the push command can be either O(1) or, rarely, O(n).

O(1) Operations:
Lookup, push, pop

O(n) Operations:
Unshift, splice

Notes

  • Treat strings as arrays because a string is an array of characters.
  • Pros: fast lookup, push, and pop because they are ordered in memory.
  • Cons: slow inserts, slow deletes, and fixed size (for static arrays).
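
A quick JavaScript illustration of these costs (the array is just an example):

```javascript
const strings = ['a', 'b', 'c', 'd'];

strings.push('e');         // O(1): add to the end
strings.pop();             // O(1): remove from the end
strings.unshift('x');      // O(n): add to the start, re-indexes every item
strings.splice(2, 0, 'y'); // O(n): insert in the middle, shifts later items
console.log(strings[2]);   // O(1): lookup by index
```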

5. Hash Tables

Hash Tables store key-value pairs: the key is passed to a hash function whose output decides where in memory the pair is stored.

Hash Function

  • Hash Function = a function that generates a fixed-length value for each input.
  • Hash Functions are one-way functions.
  • A small change in the input makes a huge change in the output.
  • Hash Functions are deterministic: they generate the same output for the same input.
  • Every time you add or retrieve data, the hash function runs.

Hash Collisions

O(1) Operations:
insert, lookup, delete, search

When data is hashed, there is a chance it will be stored in the same memory slot as other data, causing a collision. There are many ways to resolve a collision; one is to store a Linked List at that slot.
When a collision occurs, lookup in that slot degrades to O(n).

  • In JS objects, keys must be strings (or Symbols).
  • JS Map allows you to store any data type as a key. It also maintains insertion order.
  • JS Set stores only keys, without values.
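
A short sketch of the differences between plain objects, Map, and Set:

```javascript
// Object: keys are coerced to strings.
const obj = {};
obj[1] = 'one';
console.log(Object.keys(obj)); // ['1'] - the number became a string

// Map: any type can be a key, and insertion order is preserved.
const map = new Map();
map.set(1, 'one');
map.set('1', 'string one');
console.log(map.get(1)); // 'one'
console.log(map.size);   // 2 - the number key and string key are distinct

// Set: stores unique keys only.
const set = new Set([1, 2, 2, 3]);
console.log(set.has(2)); // true
console.log(set.size);   // 3 - duplicates are dropped
```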

Arrays vs Hash Tables

Arrays
Search is O(n) because we have to loop over all items. Insert and delete are O(n) because they might shift indexes. Lookup and push are O(1) because we know where the index is in memory.

HashTables Pros & Cons

Pros: fast lookups (given good collision resolution), fast inserts, flexible keys. Cons: unordered, slow key iteration.

  • Hash Tables can improve Time Complexity of nested loops but increases Space Complexity.

6. Linked Lists

  • Linked Lists have two types: singly and doubly.
  • Singly linked lists consist of nodes; each node has two parts: one holds the value to store and the other is a pointer to the next node.
  • Singly Linked Lists are null-terminated.
  • The first node is called the head while the last node is called the tail.
  • Traversing is essentially iterating, but we use that term because we don't know in advance when we will hit null.
  • Traversing over a Linked List is slower than iterating an array because Array elements are stored sequentially in memory while in Linked Lists they are scattered even though both operations are O(n).
  • Inserting and deleting in Linked Lists and Hash Tables are better than in Arrays even if they both have O(n).
  • Linked Lists have order while Hash Tables don't.

O(1) Operations:
prepend, append

O(n) Operations (worst case):
insert, lookup, delete

What is a Pointer?

A pointer is a reference to another place in memory.
Objects in JS are pointers to a memory location. Once nothing references an object anymore (for example, after its references are deleted), JS automatically frees the memory that was allocated to it. That's why JS is called a garbage-collected language.

Singly vs. Doubly Linked Lists

Singly: simpler to implement, faster, cannot be iterated backwards. Doubly: more complex, can be traversed from front and back, requires more memory.
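
A minimal sketch of a singly linked list with O(1) append and prepend (class and method names are illustrative):

```javascript
class Node {
  constructor(value) {
    this.value = value;
    this.next = null; // pointer to the next node (null-terminated)
  }
}

class LinkedList {
  constructor(value) {
    this.head = new Node(value);
    this.tail = this.head;
    this.length = 1;
  }

  append(value) { // O(1): attach after the tail
    const node = new Node(value);
    this.tail.next = node;
    this.tail = node;
    this.length++;
  }

  prepend(value) { // O(1): new node becomes the head
    const node = new Node(value);
    node.next = this.head;
    this.head = node;
    this.length++;
  }
}

const list = new LinkedList(10);
list.append(16);
list.prepend(1); // 1 -> 10 -> 16
```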

7. Stacks and Queues

Stacks allow insert and remove operations and follow the last in, first out principle (LIFO).
Queues allow insert and remove operations and follow the first in, first out principle (FIFO).
Stacks and queues are linear data structures with ordered items, allowing us to traverse the data sequentially one by one, in which only one data element can be reached at a time.
Stacks and Queues are higher-level data structures built on lower-level data structures, like arrays and linked lists, with limited methods.
Examples of stacks: the JavaScript engine's call stack, browser history, undo/redo operations.
Examples of queues: Uber ride requests, printer jobs, restaurant waiting lists.


  • Stacks can be built on dynamic arrays. => Lookup operations are O(n) T, O(1) S.

  • Inserting elements in Stacks (push operations) is O(1) ST.

  • Removing elements from Stacks (pop operations) is O(1) ST.

  • Peek operations, viewing the top item, are O(1) ST.

  • Queues are built on linked lists because dequeuing from an array (shifting every element) is O(n).

  • Inserting elements in queues (enqueue operation) is O(1) ST because we are appending a new tail.

  • Removing elements from queues (dequeue operation) is O(1) ST because we are removing the head.

  • Peek operations, viewing the first item, are O(1) ST.

  • Stacks and Queues have slow look-ups.
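
A minimal sketch of both structures, assuming a stack backed by an array and a queue backed by a linked list (names are illustrative):

```javascript
// Stack (LIFO) backed by an array: push, pop, and peek are all O(1).
class Stack {
  constructor() { this.items = []; }
  push(item) { this.items.push(item); }
  pop() { return this.items.pop(); }
  peek() { return this.items[this.items.length - 1]; }
}

// Queue (FIFO) backed by a linked list so dequeue stays O(1).
class QNode {
  constructor(value) { this.value = value; this.next = null; }
}

class Queue {
  constructor() { this.first = null; this.last = null; this.length = 0; }

  enqueue(value) { // O(1): append a new tail
    const node = new QNode(value);
    if (this.last) this.last.next = node;
    else this.first = node;
    this.last = node;
    this.length++;
  }

  dequeue() { // O(1): remove the head
    if (!this.first) return null;
    const value = this.first.value;
    this.first = this.first.next;
    if (!this.first) this.last = null;
    this.length--;
    return value;
  }

  peek() { return this.first ? this.first.value : null; }
}
```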

8. Trees

Trees

Trees have hierarchical structures. They might contain sub-trees.
The Root Node is the first node of the tree.
A Parent Node has children nodes.
Leaf Nodes are the nodes at the bottom of the tree; they have no children.

  1. Binary Tree: each node can have 0, 1, or 2 children. Each child has one parent. Each node represents a certain state.

    • The total number of nodes doubles with each level.
    • The number of nodes in the last level equals the sum of the nodes in all other levels + 1.
    • Half of our nodes are on the last level.
    • Number of nodes in level x = 2^x, where x is the level.
    • Number of nodes in a perfect tree = 2^h - 1, where h is the number of levels.
    • In binary trees, lookup, insert, and delete are all O(log N), which means the number of candidate nodes halves with every step. Searching through a phone book is also O(log N).
    • Perfect Binary Trees are completely full: every parent has exactly 2 children and all leaves are on the same level.
      Full Binary Trees have parents with either 0 or 2 children, but never one.
  2. Binary Search Tree (BST): preserves relationships between nodes (like a folder path). See the sketch after this list.

    • Nodes to the right increase in value. Nodes to the left decrease in value.
    • A parent can have up to 2 children.
    • A balanced BST gives O(log N) ST for lookups, insertion, and deletion, while an unbalanced BST degrades to O(N) ST because it effectively turns into a linked list.
    • AVL Trees and Red/Black Trees rebalance the tree automatically.
  3. Binary Heaps: every child belongs to a parent with a higher value (max heap) or a lower value (min heap).
    Binary Heaps are great for comparative operations, e.g. looking for people over 30 years old.
    There is no concept of a balanced/unbalanced Binary Heap.
    Binary Heaps are good for Priority Queues, e.g. VIP tickets have higher priority than normal tickets.

  • Inserting always fills left to right ==> memory efficient + O(1) T in the best case or O(log N) T in the worst case.
  • Lookup is O(N).
  • Delete is O(log N).
  4. Trie (Prefix Tree): used for searching in text. It can outperform other data structures in lookups. Tries are mainly used for auto-completion, auto-suggestion, and IP routing.
  • The big O of a Trie lookup is O(length of the word) T.
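
A minimal BST sketch with insert and lookup (illustrative names; no rebalancing, so the worst case is O(N)):

```javascript
class TreeNode {
  constructor(value) {
    this.value = value;
    this.left = null;
    this.right = null;
  }
}

class BinarySearchTree {
  constructor() { this.root = null; }

  insert(value) { // O(log N) on a balanced tree
    const node = new TreeNode(value);
    if (!this.root) { this.root = node; return; }
    let current = this.root;
    while (true) {
      if (value < current.value) {
        if (!current.left) { current.left = node; return; }
        current = current.left;   // smaller values go left
      } else {
        if (!current.right) { current.right = node; return; }
        current = current.right;  // larger values go right
      }
    }
  }

  lookup(value) { // O(log N) on a balanced tree
    let current = this.root;
    while (current) {
      if (value === current.value) return current;
      current = value < current.value ? current.left : current.right;
    }
    return null;
  }
}

const bst = new BinarySearchTree();
[9, 4, 20, 1, 6, 15, 170].forEach(v => bst.insert(v));
console.log(bst.lookup(15) !== null); // true
```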

9. Graphs

  • Each item is called Node or Vertex.
  • Nodes are connected through edges.
  • Linked Lists are types of Trees, and Trees are types of Graphs.
  • Graphs are good in complex relationships but they're hard to scale.

Directed Graphs: data flows in one way. Undirected Graphs: data flows in both ways.

Weighted Graphs: have numerical values attached to their edges. Weighted Graphs are used when deciding the fastest path from one node to another. Unweighted Graphs: have no values attached to their edges.

Cyclic Graphs: contain at least one cycle; common in weighted graphs. Acyclic Graphs: contain no cycles.

To implement a graph there are 3 ways (shown in the sketch below):

  1. Edge List: shows only the connections between the nodes.
  2. Adjacency List: the index (or key) is the node's value; the value is a list of its neighbors.
  3. Adjacency Matrix: a grid of 0s and 1s that marks whether two nodes are connected.
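
A sketch of the same undirected triangle graph (nodes 0, 1, 2) in all three representations:

```javascript
// 1. Edge list: just the connections.
const edgeList = [[0, 1], [1, 2], [2, 0]];

// 2. Adjacency list: index = node, value = its neighbors.
const adjacencyList = [
  [1, 2], // node 0 connects to 1 and 2
  [0, 2], // node 1 connects to 0 and 2
  [0, 1], // node 2 connects to 0 and 1
];

// 3. Adjacency matrix: matrix[i][j] === 1 when i and j are connected.
const adjacencyMatrix = [
  [0, 1, 1],
  [1, 0, 1],
  [1, 1, 0],
];
```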

10. Algorithms - Recursion

Recursion is when a function calls itself. It is used for tasks with repeated sub-tasks (e.g. looking for all .mp3 files on a disk drive).
Recursion can cause stack overflow: exceeding the maximum call stack size.
Recursion must have two things: a base case and a recursive case. Usually we have 2 returns: one for the base case and one for the recursive case. (See the sketch below.)
Anything that can be solved with recursion can also be solved with loops. Recursion makes code very readable but can be expensive on memory.
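
A classic sketch: factorial with one base case and one recursive case:

```javascript
function factorial(n) {
  if (n <= 1) return 1;        // base case: stops the recursion
  return n * factorial(n - 1); // recursive case: smaller subproblem
}

console.log(factorial(5)); // 120
```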

When to know that a recursive problem presents itself?

  1. The problem can be divided into subproblems that are smaller instances of the same problem.
  2. Each subproblem is identical in nature.
  3. The solutions of each subproblem can be combined to solve the problem at hand.

Recursion is good in: Merge Sort, Quick Sort, Tree Traversal, Graph Traversal.

11. Algorithms - Sorting

A. Comparison Sorting

  1. Bubble Sort: don't use it; O(N^2) T.

  2. Selection Sort: don't use it; O(N^2) T.

  3. Insertion Sort: good with small data sets and with data that is almost sorted. Best case O(N) T, worst case O(N^2) T.

  4. Merge Sort: always O(N log N) T, so use it when the worst case matters. But it's expensive on memory: O(N) S. (See the sketch after this list.)

  5. Quick Sort: use it when the average case matters more than the worst case (average O(N log N) T, worst O(N^2) T); it beats Merge Sort on space complexity.
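
A minimal Merge Sort sketch (illustrative implementation: O(N log N) time, O(N) extra space):

```javascript
function mergeSort(arr) {
  if (arr.length <= 1) return arr;           // base case
  const mid = Math.floor(arr.length / 2);
  const left = mergeSort(arr.slice(0, mid)); // sort each half recursively
  const right = mergeSort(arr.slice(mid));
  return merge(left, right);                 // combine the sorted halves
}

function merge(left, right) {
  const result = [];
  let i = 0, j = 0;
  while (i < left.length && j < right.length) {
    if (left[i] <= right[j]) result.push(left[i++]);
    else result.push(right[j++]);
  }
  return result.concat(left.slice(i)).concat(right.slice(j));
}

console.log(mergeSort([38, 27, 43, 3, 9, 82, 10])); // [3, 9, 10, 27, 38, 43, 82]
```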

B. Non-Comparison Sorting (Radix Sort + Counting Sort):

They are useful only with integers in a limited range.

12. Algorithms - Searching (BFS, DFS)

  1. Linear Search: finding a target value within a list. Best case O(1), worst case O(n).
  2. Binary Search: good for sorted items, O(log n). (See the sketch below.)
  3. BFS (Breadth-First Search): explores a tree or graph level by level using a queue; good for finding the shortest path.
  4. DFS (Depth-First Search): follows one branch as deep as possible before backtracking, using a stack or recursion; good for checking whether a path exists.
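
A minimal Binary Search sketch over a sorted array:

```javascript
function binarySearch(arr, target) {
  let low = 0;
  let high = arr.length - 1;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    if (arr[mid] === target) return mid;  // found the target
    if (arr[mid] < target) low = mid + 1; // discard the left half
    else high = mid - 1;                  // discard the right half
  }
  return -1; // not found
}

console.log(binarySearch([1, 4, 6, 9, 15, 170], 9)); // 3
```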

13. Algorithms - Dynamic Programming

Memoization is a form of caching: store the return value of a function keyed by its parameters, so repeated calls with the same parameters return the cached result instead of recomputing it.
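
A minimal sketch of memoization applied to Fibonacci (illustrative; the closure keeps the cache private):

```javascript
function memoizedFib() {
  const cache = {};
  return function fib(n) {
    if (n in cache) return cache[n];    // return the cached result
    if (n < 2) return n;                // base case
    cache[n] = fib(n - 1) + fib(n - 2); // compute once, then cache
    return cache[n];
  };
}

const fib = memoizedFib();
console.log(fib(40)); // 102334155, computed without redundant calls
```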