epfl-lts2/pygsp

Import & export

mdeff opened this issue · 5 comments

mdeff commented

Goals:

  • Import from and export to standard Python packages for network science. At least networkx and graph-tool. Optionally, NetworKit, igraph, SNAP.
  • Use the above import & export functionality to load & save graphs and signals in standard formats such as:
  • Further leverage the import & export infrastructure. For example, we may want to use networkx's graph generators, or visualization tools.

@cgallay will implement those features. Any feedback or comment from the community is welcome.

mdeff commented

API

I'd propose to introduce FromNetworkX and FromGraphTool as sub-classes of Graph, such that a pygsp graph can be constructed from a networkx graph as graph = pygsp.graphs.FromNetworkX(graph_nx). The to_networkx and to_graphtool methods of a Graph object would then return a graph object of the respective package. That is, graph_gt = graph.to_graphtool().

For the second step, another sub-class, FromFile, will accept a path to a file and use networkx or graph-tool in the backend to load the graph. That is, graph = pygsp.graphs.FromFile('path/to/graph.format'). A graph will be saved as graph.save('path/to/graph.format').

In addition to the graph, we should import/export one or many signals. Signals are sometimes part of a graph, and it's useful to save them together (e.g., for visualization). In networkx, signals can be represented as node attributes (weights are edge attributes). In graph-tool, signals can be represented as vertex property maps (weights are edge property maps). In both implementation, signals can be named. I'd thus propose to introduce a Graph.signals attribute, which is initialized as an empty dictionary by default. Users can either populate the dictionary before exporting/saving, or retrieve the saved signals by name when importing/loading.

Considered options

While it's clear that export and save should be implemented with methods, I was not sure if the import and load should be implemented in the constructor or as a factory. Options:

  1. graph = pygsp.graphs.Graph('/path/to/graph.format'): constructor accepts multiple objects, such as paths, numpy array (adjacency matrix), networkx and graph-tool graphs. That is convenient but the documentation might not be clear as Python lacks function overloading.
  2. graph = pygsp.graphs.Graph.load('/path/to/graph.format'): a @classmethod factory is used to build and return the object as the __init__ constructor would do. Each factory builds a Graph from a different object.
  3. graph = pygsp.graphs.load('/path/to/graph.format'): the factory is a module-level function instead of a class method. While that is more convenient for the user, it lacks a clear meaning that load will return a Graph object.
  4. graph = pygsp.graphs.FromFile('/path/to/graph.format'): sub-class the Graph base class to overload the constructor. That's more consistent as we do it this way for graph models already. It's also more OOP.

Another option was for the importing/loading functions to return a dictionary of signals instead of storing it as the graph.signals attribute. Exporting would have been done as graph.to_networkx(myfeature1=mysignal1, myfeature2=mysignal2). If those signals are to be exported and saved, it's however clearer if they are part of the Graph object.

Implementation

Networkx and graph-tool should be lazily imported as we don't want them as dependencies. That is, import them in the function definition not the module.

Unit tests should be created to test the implementation. I propose to do a round-trip (to networkx, graph-tool, various graph formats) and verify that the graph and signals are preserved.

Again, any suggestion is welcome.

For FromNetworkX and FromGraphTool why do we need to do sub-classes? Could we go for simpler @classmethod factory instead? That we could call like that :
graph = pygsp.graphs.Graph.from_networkx(graph_nx)

mdeff commented

We of course could. That's what I discussed in the "Considered options" above. The main argument for me to go with sub-classes is to be consistent. Given that we already have pygsp.graphs.Sensor() and the like for graph models, pygsp.graphs.FromNetworkX(graph_nx) is more consistent than pygsp.graphs.Graph.from_networkx(graph_nx). In the end all the sub-classes of Graph only exists to overload the constructor (be it to take it from networkx, to load it from a file, to generate a network from a model, etc.). All the methods (and therefore the interface), is implemented in Graph itself (or parent classes such as GraphFourier and GraphDifference, which are only mint to modularize the code). See also my StackOverflow answer.

Here are some implementation details.
For Graph Tool:

  • The weight of the edges are stored as an edge property therefore for export, the method to_graphtool(...) will take an argument edge_property_name that will be used as a key for the edge property of the generated graphtool object. When importing a graph from graphtool the same argument will be used and the corresponding property will be used as a weight for the edges

  • Multiple edges are allowed in graphtool but not in pygsp therefore when importing a graph containing more than one edge between two vertexes I suggest that a warning is shown to the user letting him know that the sum was used to aggregate the multiple vertexes into one. For more flexibility, the user will have the possibility to set a custom aggregate function with the agg_mutli_edge_fn argument.

For Networkx:

  • a user can set some attributes per edge. By default when constructed by from_scipy_sparse_matrix the attribute is attribute name is weight. For export, the method to_networkx(...) will take an argument edge_property_name (for consistency with graphtool) where the weight of the edges will be stored. When importing the user will have to choose which attribute he wants to use as weight (see)

Please let me know if you like the implementation or give me feedback on how to improve. I will update this post with more implementation details as I code.

mdeff commented

Implemented in #32 and #46.