Our problem is betweenness centrality mapping. Our goal is to derive the centrality of all the airports, and generates the heat map in the PNG format. From the graph, we can tell the centrality of the airport. Based on the output, We can make comment on the importance of airports in the world.
We will used the dataset from https://openflights.org/data.html.
The original data format of this dataset is .dat. For each row, the col represents the information related to this airport, like airline ID, country, etc... The size of this dataset is ~400KB. Currently, we plan to only use the airline and airport information, which is the subset of the whole dataset.
We will use Python to pre-processing the data which removes all the unused col, repetitive row, and rows without required information (airport or airline information). We will write some test cases that checks the value inside certain col is not NULL. Also, the numerical value is within some tolerance range.
We will use graph to store our data. Since each node will have 2 data segments, the graph will take total of
We will be using this graph to store all the data. Each node inside the graph will be represented as an airport. The edge between the node will be represent as an airline. The reason for using the directed is that there might be some one-way only airlines. Because we want generate more realistic graph, so we have to use directed graph. The runtime of this algorithm is
After exploring a lot of shortest path algorithms on the website, we believe that the Floyd–Warshall algorithm is more efficient and relative easier for us to implement. We will use this algorithm to find shortest paths between 2 nodes, and then update the weight of the edge between these nodes. The runtime of this algorithm is
The equation we are going to used is:
The
We will be using the PNG classes from the CS 225 MP to generate the picture output of our graph. The runtime of this algorithm is
simply execute the code below and waiting for executing. After the program end processing, it will generate a file result.csv
in base fold which contain the betweenness centrality for all vertexes:
make calc_bc
after generating the result.csv
file, you can visualize the bc value on map by running:
make
11/18 finished the data cleaning
11/18 complete 80% of the graph implmentation
11/20 completed BFS of the graph
11/21 complete the Floyd–Warshall algorithm
11/22 merge everything and start the update weight method
11/23 convert the graph to PNG output