Graph-Data-structure

Algorithmic Methods of Data Mining: by Emanuele Gugliandolo, Salwa Afrah, Octavianus Surya Putra Sinaga

Homework 5 - Explore California and Nevada with graphs

Alt Text

In this homework, you will build a system that provides users with information about roads in California and Nevada. Specifically, the implementation of the system consists of two parts.

  • Backend: where you need to develop efficient algorithms that define the functionalities of the system
  • Frontend: where you provide visualization for queries entered by the user

IMPORTANT: In order to deal with visualization of graphs you can freely use libraries such as networkx or any other tool you choose, but when you are writing algorithm, they have to be implemented by yourself using proper data structures, without any library that computes some steps of the algorithm for you.

1. Data collection

The first step, as always, is to download the data you will be working on. You can download the data to build the system here. Download the files for the dataset named CAL.

In particular, you will get the following three files containing the below-mentioned information:

  • Distance graph - file containing physical distances between each pair of nodes. Each line follows this structure: (Id_Node1, Id_Node2, d(Id_Node1,Id_Node2)), where d(x,y) is the physical distance between x and y.
  • Travel time graph - file containing time distances between each pair of nodes. Each line follows this structure: (Node1, Node2, t(Id_Node1, Id_Node2)), where t(x,y) is the time distance between x and y.
  • Node information file - file containing node coordinates. Each line follows this structure: (Id_Node, Latitude, Longitude)

2. Implementation of the backend

The goal of this part is the implementation of a unique system that has four different functionalities. The program takes in input always a number i in [1,4]: given the input, the program has to run Functionality i, applied to the graph you create from the downloaded data.

Functionality 1 - Find the Neighbours!

It takes in input:

  • a node v
  • One of the following distances function: t(x,y), d(x,y) or network distance (i.e. consider all edges to have weight equal to 1).
  • a distance threshold d

Implement an algorithm (using proper data structures) that returns the set of nodes at distance <= d from v, corresponding to v’s neighborhood.

P.s. We are not referring to the graph's definition of neighbours, but we are asking you to find all the nodes at distance <= d from v and not only the ones directly connected to it.

Functionality 2 - Find the smartest Network!

It takes in input:

  • a set of nodes v = {v_1, ..., v_n}
  • One of the following distances function: t(x,y), d(x,y) or network distance (i.e. consider all edges to have weight equal to 1).

Implement an algorithm that returns the set of roads (edges) that enable the user to visit all the places. We want this set to be the ones whose sum of distances is minimum.

As a dummy example, a set of input could be {Colosseo, Piazza Venezia, Piazza del Popolo} and therefore the associated set of streets will be {Via dei Fori Imperiali, Via del Corso}.

Functionality 3 - Shortest Ordered Route

It takes in input:

  • a node H
  • A sequence of nodes p = [p_1, ..., p_n]
  • One of the following distances function: t(x,y), d(x,y) or network distance (i.e. consider all edges to have weight equal to 1).

Implement an algorithm that returns the shortest walk that goes from H to p_n, and that visits in order the nodes in p.

Consider that:

  • The algorithm needs to handle the case that the graph is not connected, thus not all the nodes in p are reachable from H. In such scenario, it is enough to let the program give in output the string "Not possible".
  • Since we are dealing with walks, you can pass more than once on the same node p_i, but you have to preserve order. E.g.: if you pass through p_2 and you are going to p_3, you can pass through p_10, but once you will be in p_9, you will have to go back to p_10 as well.

Functionality 4 - Shortest Route

It takes in input:

  • a node H
  • A set of nodes p = {p_1, ..., p_n}
  • One of the following distances function: t(x,y), d(x,y) or network distance (i.e. consider all edges to have weight equal to 1).

Implement an algorithm that returns the shortest walk that goes from H to p_n, and that visits the nodes in p.

Consider that:

  • The algorithm needs to handle the case that the graph is not connected, thus not all the nodes in p are reachable from H. In such scenario, it is enough to let the program give in output the string "Not possible".
  • Since we are dealing with walks, you can pass more than once on the same node p_i.
  • Since the problem’s complexity is high, consider to provide just an approximation/heuristic solution for the problem.

3. Implementation of the frontend

In this section, you build the visualizations for users’ queries results.

Visualization 1 - Visualize the Neighbours!

Once the user runs Functionality 1, we want the system to show in output a complete map that contains: the input node, the output nodes and all the streets that connect these points. Choose different colors in order to highlight which is the input node, and which are the output nodes. Furthermore, choose different colors for edges, according to the distance function used.

Visualization 2 - Visualize the smartest Network!

Once the user runs Functionality 2, we want the system to show in output a complete map that contains: the input nodes and all the edges that connect them. Choose different colors in order to highlight which are the output edges. Furthermore, choose different colors for edges, according to the distance function used.

Visualization 3 - Visualize the Shortest Ordered Route

Once the user runs Functionality 3, we want the system to show in output the Shortest Ordered Route.

Visualization 4 - Visualize the Shortest Route

Once the user runs Functionality 4, we want the system to show in output the Shortest Route.

For each of the visualization, you can add more 'fancy' stuff. Therefore, you can go deeper, adding more features, and making the visualization even more detailed! But, the important thing is that there are at least the requested features.

Good luck!