Graph theory, sometimes we call it complex network or network science or network analysis, is one of the most avant-garde research areas in discrete mathematics, also one of my favorite subjects. Here, “graph” is the preferred name, because too many people associate the word “network” with internet. Given the bonanza of data science, graph theory is constantly overshadowed by the hype of machine learning. Yet some of the top-notch technology companies such as Google and Facebook heavily rely on the research of graph theory.
This repository intends to increase the exposure of graph theory to all my readers. It contains common graph algorithms, popular network models, interesting agent-based simulations and amazing complex systems. The codes range from the level of elementary to sophisticated with wide applications in ecology, epidemiology, sociology, economics, finance, etc. Both Julia and Python are used to construct different scripts. As I am slowly climbing up the learning curve, more and more fascinating content will emerge en masse. Stay tuned!
Walking out of a maze seems extremely complex in recursion algorithm. In graph theory, it is just a walk in the park. The only bit that takes some effort is building the data structure. Once we got graph ADT, we can use BFS/DFS/Dijkstra/A* to find the way out of the maze, although DFS may not be able to point out the shortest route.
Click here to be redirected to the script.
Imagine a telecommunications company Orange tries to create a cable network connecting every family in Côte d’Azur. There are possibly millions of different routes to build such a network. Yet, the company only wants to satisfy every customer with the minimum cost. This creates a typical minimum spanning tree problem. Minimum spanning tree is a subset of a graph to connect all vertices without any cycle to achieve the minimum total weight. In this chapter, we will demonstrate how Borůvka’s algorithm, Kruskal’s algorithm and Prim’s algorithm obtain such a topological output.
Click here to be redirected to the script.
Whenever we hear the phrase 'shortest path', we should instinctively shout out Dijkstra's algorithm in our mind. This Dutchman came up with the revolutionary algorithm named after himself while searching for the shortest path to Rotterdam without pencil and paper. The algorithm is a special case of A* algorithm without heuristic function. It is also widely used as a subroutine for other traversal algorithms, such as Bellman-Ford. Apart from Dijkstra, the shortest path problem, which could be thought as an optimization problem, can also be solved in dynamic programming.
Click here to be redirected to the script.
Water jug is a simple and interesting problem. Assuming there are two jugs, a m-gallon and a n-gallon. And the target is to get x gallon water in either jug. There are many ways to solve this problem. Recursion can also solve water jug problem but it is very costly. Building a graph data structure would be a much easier way. The key is to set up edges based upon the rule of the game to connect all possible scenarios together. With BFS starts from the initial status, it can efficiently find the target status and return the route for us which would be the answer.
Click here to be redirected to the script.
Knight's tour is a game where a knight travels all the squares on a k by k chessboard. And each square can only be travelled once. The solution can be obtained by DFS with a list that records whether each square on the chessboard has been visited. The rules of a knight's movement are very different from pawns or king. The edges of the graph should indicate the valid move from one square to another.
Click here to be redirected to the script.
Missionaries and cannibals is a classic river problem. I actually prefer its alternative version called jealous husband. There are k missionaries and k cannibals. They need to cross a river. There is a boat with the capacity of only two people. There should be at least one person to sail the boat. These rules are subjected to one constraint. Cannibals cannot outnumber missionaries on both sides of the river. Again, the solution is similar to water jug problem. We have to set up edges based upon the rule of the game to connect all possible and valid scenarios together. As usual, BFS always return the optimized result rather than DFS.
Click here to be redirected to the script.
Finding the shortest path on a map is not the only problem Dijkstra can solve. Financial market is another application. As we all know, currency pairs have both bid and ask price. The market is weak-efficient. There could be an opportunity for triangle arbitrage or rectangle arbitrage. We can exchange from currency A to currency B then to currency C if we can churn out more profit rather than directly exchanging currency A to currency C. We set each currency as the vertex and the exchange rate from one currency to another as the weight of the edge. The arbitrage with some logarithm transformation would be the criteria for Dijkstra. However, Dijkstra cannot properly handle a graph structure with negative weights. We will introduce an improved version called Bellman-Ford which is able to detect the negative cycle during the traversal.
Click here to be redirected to the script.
Word ladder problem is a game developed by the author of Alice in Wonderland. Given one word, we try to change it into another word. Each time we can only change one letter and it should also be a valid word. This is a great test for vocabulary. The difficult part of building a graph structure is to build edges. Connect one word to another with only one letter changed takes a bit of work. BFS is the perfect solution for this.
Click here to be redirected to the script.
Is machine learning the best solution to text mining? What if graph theory beats it in both time and space complexity?
The whole project is designed for MENA Newsletter. The idea is to use graph structure traversal algorithm to remove similar contents and extract key information from the metadata of text.
For more details, please refer to the read me page of a separate directory or graph theory section on my personal blog.
K core, also known as k degenerate, is a subset of the original graph in which all vertices have degree at least k. By some definition, k core is required to be a connected graph. The standard algorithm to find a k core graph is to remove all the vertices that have degree less than k. We must be careful that removing a vertex reduces the degree of all the vertices adjacent to it, hence the degree of adjacent vertices can also drop below k. And thus, we may have to remove those vertices as well.
Click here to be redirected to the script.
A clique is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent. A maximal clique is a clique that cannot be extended by including one more adjacent vertex, meaning it is not a subset of a larger clique. Think of maximal clique as a maximum social group where everybody knows each other. Finding a maximal clique is an NP-complete problem. Bron–Kerbosch algorithm with degeneracy ordering is the most efficient way to find out all maximal cliques. Its time complexity can be optimized to O(3**(n/3)).
Click here to be redirected to the script.
Amid the outbreak of the novel corona virus, we have observed a bonanza of agent-based simulation in epidemiology. By leveraging the probability generating function of a random graph, whether it is Barabási-Albert following power-law distribution or Erdős–Rényi following Poisson distribution, we are able to consume the least computing power to estimate the prevalence and the duration of COVID-19.
For more details, please refer to the read me page of a separate directory or graph theory section on my personal blog.
Modern portfolio theory was introduced in 1952 by Nobel laureate Harry Markowitz. It is part of investment class 101. But I watched a video by Wolfram recently. It challenged the traditional approach and introduced graph theory to asset diversification. There are plenty of quant shops deploying fancy mathematic tools to solve the market. The real question for us is, as fancy as it sounds, does graph theory work on portfolio optimization?
For more details, please refer to the read me page of a separate directory or graph theory section on my personal blog.
In the traditional study of community ecology, metapopulation model is used to map out the patchy habitat occupancy and extinction. It only requires one ordinary differential equation to monitor a single species. However, one of the biggest malaise is its landscape connectivity. With complex network, we can incorporate spatial structure into the deterministic system to create an agent-based simulation.
Click here to be redirected to the script. Please note this script is written in Julia.
Levins model has exceptional explanatory power for habitat fragmentation. Tilman model expands it to a more generalized form for multiple species. The stronger species can out-compete and displace weaker species. In this example, our agent-based simulation will illustrate the vis-à-vis among native species and invasive species. With the blessing of the spatial structure, some of the native species in remote area may be able to survive the invasion.
In the later chapter of this script, we will introduce Schelling’s model from sociology to investigate how the segregation is formed when multiple groups are presented in the system. Schelling’s model shows some unique traits such as maximal clique when implemented on a random geometric graph.
Click here to be redirected to the script. Please note this script is written in Julia.
In 2004, Lu Hong and Scott E. Page published a game-changing paper on diversity. They used agent-based model to prove that a diverse group of people is better than a group of best-performing individuals. Despite the high citations of this paper, plenty of scholars dispute the idea which makes the theory rather controversial. The caveat is the experiment of solving problems on lattice is too simple and straight forward whereas in real life the situation is more dynamic. Hence, the objective of this experiment is to convert lattice problem into the context of avant-garde random graphs. With the agent-based simulation on Waxman model, we are able to retain as many of the properties of a real life problem as possible and simplify the situation with respect to our ability to analyze the performance of different agents.
Click here to be redirected to the script. Please note this script is written in Julia.
Graph coloring is a color assignment problem on graph structures. The color could be assigned to the vertex or the edge. No adjacent vertex or edge should have the same color. The coloring may sound like an easy problem. To find an optimal coloring for vertices or edges is in fact NP-hard.
One of the most popular algorithms for vertex coloring is DSatur Algorithm by Daniel Brélaz. It beats greedy algorithm in random graph. The faux pas of DSatur is its inability to capture the optimal vertex coloring for all types of graph structures.
One of the most common and the easiest algorithms for edge coloring is Vizing's algorithm. It is a simple greedy algorithm to prove Vizing's theorem where the upper bound of the chromatic index equals to the maximum degree of the graph structure plus one. The malaise of Vizing is its inability to capture the optimal edge coloring for all types of graph structures.
Click here to be redirected to the script.
An independent set is a subset of a graph such that no two vertices inside are connected. A maximal independent set is an independent set that cannot include any extra vertices without violating the definition. A maximum independent set is one of the maximal independent sets with maximum cardinality.
There are tons of algorithms to find maximal independent set. Maximal independent set is an NP-complete problem for computer scientists. Currently the most popular one is randomized distributed algorithm.
Finding maximal independent set is NP-complete but finding maximum independent set is NP-hard. Intuitively, the easiest way to find a maximum independent set is a brute force algorithm to select maximum independent set from all the maximal independent sets. NetworkX
prefers Ramsey algorithm to approximate maximum independent set. Thus, it is worth the effort to make a little introduction of Ramsey algorithm.
Click here to be redirected to the script.
Sudoku is a combinatorial number-placement puzzle. Despite its Japanese name, the game was actually invented by an American and later gained popularity in Japan. There are many ways for computer algorithms to solve Sudoku. Here we use vertex coloring to solve the puzzle via maximum independent set. Vertex coloring guarantees no adjacent vertex should have the same color. As long as we can create a proper data structure for Sudoku puzzle, we are able to find the optimal vertex coloring scheme.
However, the optimal graph coloring is in fact an NP-hard problem. There are plenty of greedy algorithms but none really work very well on Sudoku. To avoid brute force, we tackle the optimal vertex coloring via maximum independent set. The maximum independent set is also an NP-hard problem with plenty of approximation algorithms not working well on Sudoku. So we apply a trick. The maximum independent set in a graph ADT is a maximum clique in a complement graph. It is much easier to search all the maximal cliques via Bron Kerbosch algorithm. We merely need to discover the maximal clique with maximum cardinality.
Click here to be redirected to the script.