Welcome to the algorithm club!
Here you'll find implementations of popular algorithms and data structures in everyone's favorite new language, Swift, with detailed explanations of how they work.
If you're a computer science student who needs to learn this stuff for exams -- or if you're a self-taught programmer who wants to brush up on the theory behind your craft -- you've come to the right place.
The goal of this project is to explain how algorithms work. The focus is on clarity and readability of the code, not on making a reusable library that you can drop into your own projects. That said, most of the code should be ready for production use, although you may need to tweak it to fit into your own codebase.
All code is compatible with Xcode 7.2 and Swift 2.1.
This is a work in progress. More algorithms will be added soon. :-)
Suggestions and contributions are welcome! Report an issue to leave feedback, or submit a pull request.
To keep this a high quality repo, please follow this process when submitting your contribution:
- Create a pull request to "claim" an algorithm or data structure, so that multiple people don't end up working on the same thing.
- Use this style guide for writing code (more or less).
- Write an explanation of how the algorithm works. Include plenty of examples for readers to follow along.
- Include your name in the explanation, something like *Written by Your Name* at the end of the document. If you wrote it, you deserve the credit and fame.
- Add a playground and/or unit tests.
Just so you know, I will probably edit your text and code for grammar, style, and so on, to ensure a certain level of polish.
New algorithms and data structures are always welcome (even if they aren't on the list). Improvements to existing implementations. Better explanations. Suggestions for making the code more Swift-like or to make it fit better with the standard library. Unit tests. Fixes for typos. No contribution is too small. :-)
If you're new to algorithms and data structures, here are a few good ones to start out with:
- Stack
- Queue
- [Insertion Sort](Insertion Sort/)
- [Binary Search](Binary Search/)
- Binary Tree
- Merge Sort
- Boyer-Moore string search
- [Linear Search](Linear Search/). Find an element in an array.
- [Binary Search](Binary Search/). Quickly find elements in a sorted array.
- [Count Occurrences](Count Occurrences/). Count how often a value appears in an array.
- [Select Minimum / Maximum](Select Minimum Maximum). Find the minimum/maximum value in an array.
- Select k-th Largest Element
- Selection Sampling
- Union-Find
- Boyer-Moore. A fast method to search for substrings. It skips ahead based on a look-up table, to avoid looking at every character in the text.
- Rabin-Karp
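
To give a feel for the simplest entry in the list above, here is a minimal sketch of a linear search. The function name `linearSearch` and its signature are assumptions made for this illustration, not necessarily what the linked article uses, and the syntax targets a recent Swift toolchain, so it may need adjusting for Swift 2.1.

```swift
// Hypothetical helper: returns the index of the first element equal to
// `target`, or nil if the array does not contain it.
func linearSearch<T: Equatable>(_ array: [T], _ target: T) -> Int? {
    // Look at every element in order until a match turns up.
    for (index, element) in array.enumerated() where element == target {
        return index
    }
    return nil
}

let numbers = [5, 2, 4, 7]
let found = linearSearch(numbers, 7)     // index 3
let missing = linearSearch(numbers, 10)  // nil, because 10 is not in the array
```

Because it may have to inspect every element, this approach works on unsorted arrays but gets slower as the array grows.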
It's fun to see how sorting algorithms work, but in practice you'll almost never have to provide your own sorting routines. Swift's own `sort()` is more than up to the job. But if you're curious, read on...
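
For reference, here is a small sketch of using the built-in sorting routines. It assumes a recent Swift toolchain, where `sorted()` returns a new array and `sort()` sorts in place; Swift 2.1 used different names for these, so adjust accordingly. The sample values are made up for the illustration.

```swift
let numbers = [10, -1, 3, 9, 2]

// sorted() leaves the original untouched and returns a new, sorted array.
let ascending = numbers.sorted()         // [-1, 2, 3, 9, 10]

// Pass a comparison to control the order; `>` sorts from high to low.
let descending = numbers.sorted(by: >)   // [10, 9, 3, 2, -1]

// sort() rearranges a mutable array in place.
var names = ["Zoe", "Adam", "Mia"]
names.sort()                             // ["Adam", "Mia", "Zoe"]
```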
Basic sorts:
- [Insertion Sort](Insertion Sort/)
- [Selection Sort](Selection Sort/)
- Shell Sort
Fast sorts:
- Quicksort
- Merge Sort
- Heap Sort
Special-purpose sorts:
- Bucket Sort
- Counting Sort
- Radix Sort
- Topological Sort
Bad sorting algorithms (don't use these!):
- [Bubble Sort](Bubble Sort/)
- Huffman Encoding
- Shuffle. Randomly rearranges the contents of an array.
- Greatest Common Divisor (GCD). Special bonus: the least common multiple.
- Permutations and Combinations. Get your combinatorics on!
- Statistics
- k-Nearest Neighbors
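
As one small illustration from the list above, the greatest common divisor can be computed with Euclid's algorithm, and the least common multiple follows from it. This is only a sketch: the names `gcd` and `lcm` are chosen for this example and may differ from the linked article, and the code assumes a recent Swift toolchain rather than Swift 2.1.

```swift
// Euclid's algorithm: repeatedly replace (x, y) with (y, x mod y)
// until the remainder is zero. What is left in x is the GCD.
func gcd(_ a: Int, _ b: Int) -> Int {
    var (x, y) = (abs(a), abs(b))
    while y != 0 {
        (x, y) = (y, x % y)
    }
    return x
}

// The bonus: lcm(a, b) = |a * b| / gcd(a, b).
// Dividing before multiplying keeps the intermediate value small.
func lcm(_ a: Int, _ b: Int) -> Int {
    guard a != 0 && b != 0 else { return 0 }
    return abs(a) / gcd(a, b) * abs(b)
}

let divisor = gcd(52, 39)   // 13
let multiple = lcm(4, 6)    // 12
```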
The choice of data structure for a particular task depends on a few things.
First, there is the shape of your data and the kinds of operations that you'll need to perform on it. If you want to look up objects by a key you need some kind of dictionary; if your data is hierarchical in nature you want a tree structure of some sort; if your data is sequential you want a stack or queue.
Second, it matters what particular operations you'll be performing most, as certain data structures are optimized for certain actions. For example, if you often need to find the most important object in a queue, then a heap or priority queue is a better choice than a plain array.
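
The two points above can be sketched with Swift's standard collection types. The variable names and sample data are invented for this illustration, and the code assumes a recent Swift toolchain rather than Swift 2.1.

```swift
// Looking things up by key calls for a dictionary.
var ages = ["Alice": 30, "Bob": 25]
ages["Carol"] = 41
let bobsAge = ages["Bob"]        // 25 (wrapped in an Optional)

// Sequential, last-in first-out data: a plain array works as a stack.
var stack = [Int]()
stack.append(1)
stack.append(2)
stack.append(3)
let top = stack.popLast()        // 3, the most recently added item

// Finding the most important item in a plain array means scanning every
// element, which is why a heap pays off when you do this often.
let priorities = [4, 9, 1, 7]
let highest = priorities.max()   // 9
```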
Often just using the built-in `Array`, `Dictionary`, and `Set` types is sufficient, but sometimes you may want something more fancy...
- Array2D. A two-dimensional array with fixed dimensions. Useful for board games.
- [Fixed Size Array](Fixed Size Array/). When you know beforehand how large your data will be, it might be more efficient to use an array with a fixed size.
- Ordered Array. An array that is always sorted.
- Stack. Last-in, first-out!
- Queue. First-in, first-out!
- Deque
- Priority Queue
- [Ring Buffer](Ring Buffer/). Also known as a circular buffer. An array of a certain size that conceptually wraps around back to the beginning.
- [Linked List](Linked List/). A sequence of data items connected through links. Covers both singly and doubly linked lists.
- Tree (general-purpose)
- Binary Tree
- Binary Search Tree (BST)
- AVL Tree
- Red-Black Tree
- Splay Tree
- Threaded Binary Tree
- kd-Tree
- Heap. A binary tree stored in an array, so it doesn't use pointers. Makes a great priority queue.
- Fibonacci Heap
- Trie
- Bit Set
- Bloom Filter
- [Hash Set](Hash Set/). A set implemented using a hash table.
- Multiset
- Ordered Set
- [Hash Table](Hash Table/). Allows you to store and retrieve objects by a key. This is how the dictionary type is usually implemented.
- Hash Functions
- Graph
- Breadth-First Search (BFS)
- Depth-First Search (DFS)
- Shortest Path
- Minimum Spanning Tree
- All Paths
A lot of software developer interview questions consist of algorithmic puzzles. Here is a small selection of fun ones. For more puzzles (with answers), see here and here.
- [Two-Sum Problem](Two-Sum Problem/)
It's useful to know how fast an algorithm is and how much space it needs. This allows you to pick the right algorithm for the job.
Big-O notation gives you a rough indication of the running time of an algorithm and the amount of memory it uses. When someone says, "This algorithm has worst-case running time of O(n^2) and uses O(n) space," they mean it's kinda slow but doesn't need lots of extra memory.
Figuring out the Big-O of an algorithm is usually done through mathematical analysis. We're skipping the math here, but it's useful to know what the different values mean, so here's a handy table. n refers to the number of data items that you're processing. For example, when sorting an array of 100 items, n = 100.
Big-O | Name | Description |
---|---|---|
O(1) | constant | This is the best. The algorithm always takes the same amount of time, regardless of how much data there is. Example: looking up an element of an array by its index. |
O(log n) | logarithmic | Pretty great. These kinds of algorithms halve the amount of data with each iteration. If you have 100 items, it takes about 7 steps to find the answer. With 1,000 items, it takes 10 steps. And 1,000,000 items only take 20 steps. This is super fast even for large amounts of data. Example: binary search. |
O(n) | linear | Good performance. If you have 100 items, this does 100 units of work. Doubling the number of items to 200 makes the algorithm take twice as long (200 units of work). Example: sequential search. |
O(n log n) | "linearithmic" | Decent performance. This is slightly worse than linear but not too bad. Example: the fastest sorting algorithms. |
O(n^2) | quadratic | Kinda slow. If you have 100 items, this does 100^2 = 10,000 units of work. Doubling the number of items makes it four times slower (because 2 squared equals 4). Example: algorithms using nested loops, such as insertion sort. |
O(n^3) | cubic | Poor performance. If you have 100 items, this does 100^3 = 1,000,000 units of work. Doubling the input size makes it eight times slower. Example: matrix multiplication. |
O(2^n) | exponential | Very poor performance. You want to avoid these kinds of algorithms, but sometimes you have no choice. Example: traveling salesperson problem. |
O(n!) | factorial | Intolerably slow. It literally takes a million years to do anything. |
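
To make the O(log n) row a bit more concrete, here is a rough sketch of a binary search together with a helper that counts how often a range of n items can be halved. Both `binarySearch` and `halvingSteps` are names made up for this example, the code assumes a recent Swift toolchain rather than Swift 2.1, and the array must already be sorted.

```swift
// Binary search over a sorted array: each pass throws away half of the
// remaining range. Returns the index of `target`, or nil if it is absent.
func binarySearch<T: Comparable>(_ array: [T], _ target: T) -> Int? {
    var low = 0
    var high = array.count          // search the half-open range low..<high
    while low < high {
        let mid = low + (high - low) / 2
        if array[mid] == target {
            return mid
        } else if array[mid] < target {
            low = mid + 1           // target can only be in the right half
        } else {
            high = mid              // target can only be in the left half
        }
    }
    return nil
}

// Counts how many times n items can be halved (rounding up) before only
// one item is left. This is roughly the number of steps in the table.
func halvingSteps(_ n: Int) -> Int {
    var remaining = n
    var steps = 0
    while remaining > 1 {
        remaining = (remaining + 1) / 2
        steps += 1
    }
    return steps
}

let sortedNumbers = Array(1...100)
let position = binarySearch(sortedNumbers, 42)   // index 41
let steps100 = halvingSteps(100)                 // 7
let steps1000 = halvingSteps(1_000)              // 10
let stepsMillion = halvingSteps(1_000_000)       // 20
```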
Often you don't need math to figure out the Big-O of an algorithm; you can simply use your intuition. If your code uses a single loop that looks at all n elements of your input, the algorithm is O(n). If the code has two nested loops that each run over all n elements, it is O(n^2). Three such nested loops give O(n^3), and so on.
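
The loop-counting intuition can be checked with a tiny experiment. The two functions below are hypothetical helpers written for this illustration (assuming a recent Swift toolchain rather than Swift 2.1); they do no real work and only count how many times the innermost statement runs.

```swift
// One loop over n items: the body runs n times, so this is O(n).
func singleLoopSteps(_ n: Int) -> Int {
    var steps = 0
    for _ in 0..<n {
        steps += 1
    }
    return steps
}

// Two nested loops over n items: the body runs n * n times, so O(n^2).
func nestedLoopSteps(_ n: Int) -> Int {
    var steps = 0
    for _ in 0..<n {
        for _ in 0..<n {
            steps += 1
        }
    }
    return steps
}

let linear = singleLoopSteps(100)      // 100
let quadratic = nestedLoopSteps(100)   // 10,000
let doubled = nestedLoopSteps(200)     // 40,000, four times as much work
```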
Note that Big-O notation is an estimate and is only really useful for large values of n. For example, the worst-case running time for the "insertion sort" algorithm is O(n^2). In theory that is worse than the running time for "merge sort", which is O(n log n). But for small amounts of data, insertion sort is actually faster, especially if the array is partially sorted already!
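
As a rough sketch of why partially sorted input helps, the insertion sort below also counts how many elements it has to shift. The shift count is only a stand-in for the amount of work, not a timing measurement, and the function name and return type are assumptions made for this example (written for a recent Swift toolchain rather than Swift 2.1).

```swift
// Insertion sort that also reports how many element shifts it performed.
func insertionSort(_ array: [Int]) -> (sorted: [Int], shifts: Int) {
    var a = array
    var shifts = 0
    guard a.count > 1 else { return (a, shifts) }

    for i in 1..<a.count {
        let value = a[i]
        var j = i
        // Move larger elements one place to the right to make room.
        while j > 0 && a[j - 1] > value {
            a[j] = a[j - 1]
            j -= 1
            shifts += 1
        }
        a[j] = value
    }
    return (a, shifts)
}

let nearlySorted = insertionSort([1, 2, 4, 3, 5, 6])
// nearlySorted.shifts is 1: only one element was out of place.

let reversed = insertionSort([6, 5, 4, 3, 2, 1])
// reversed.shifts is 15: the worst case for six items, n(n-1)/2.
```

With almost-sorted input the inner loop barely runs, which is one reason insertion sort can do well on small or nearly ordered arrays despite its O(n^2) worst case.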
If you find this confusing, don't let this Big-O stuff bother you too much. It's mostly useful when comparing two algorithms to figure out which one is better. But in the end, you still want to test in practice which one really is the best. And if the amount of data is relatively small, then even a slow algorithm will be fast enough for practical use.
For more information, check out these great books:
- Introduction to Algorithms by Cormen, Leiserson, Rivest, Stein
- The Algorithm Design Manual by Skiena
- Elements of Programming Interviews by Aziz, Lee, Prakash
- Algorithms by Sedgewick
The following books are available for free online:
- Algorithms by Dasgupta, Papadimitriou, Vazirani
- Algorithms, Etc. by Erickson
- Algorithms + Data Structures = Programs by Wirth
- Algorithms and Data Structures: The Basic Toolbox by Mehlhorn and Sanders
All content is licensed under the terms of the MIT open source license.