
Generic software for the management and processing of networked collections of graphs

Primary language: Python. License: MIT.

Supergraph

Supergraph is generic software for the management and processing of an interrelated collection of multiple graphs.

It can be used to process multiple graphs. Functionality includes (but is not limited to):

  1. Similarity graphs: generated from node attributes, based on different similarity measures (Jensen-Shannon, Hellinger, L1, L2).
    • General implementations based on the neighbors module from scikit-learn.
    • Specific implementation for fast computation of Hellinger distances using Numba and CUDA.
  2. Community detection algorithms (Louvain, Walktrap, FastGreedy, Label Propagation)
  3. Bipartite graphs from attributes
  4. Transductive graphs: Graphs generated by connecting target nodes from a bipartite graph. Link weights are computed from the links of a graph connecting the source nodes.
  5. Transitive graphs, computed as the composition of two bipartite graphs.
  6. Analysis of graph partitions.
  7. Analysis of graph nodes (centrality measures, PageRank).
  8. Editing tools for the collection of graphs:
    • Create, add, remove graphs
    • Subsampling
    • Reduction to graphs of equivalence classes
  9. Tools for visualization:
    • Graph layout algorithms.
    • Export to GEXF format
    • Visualization of bipartite graphs (requires Halo, not included)
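
To make two of the items above concrete, here is a minimal NumPy sketch of a Hellinger-based similarity graph (item 1) and of a transitive graph obtained by composing two bipartite graphs (item 5). It is an illustration only, not the package's implementation: the function names, the threshold parameter and the reading of "composition" as a product of biadjacency matrices are all assumptions made for this example.

```python
import numpy as np


def hellinger_similarity(X):
    """Pairwise Hellinger similarity (1 - distance) between the rows of X.

    Hypothetical helper, not part of the Supergraph API. Each row of X is
    assumed to be a probability distribution (non-negative, summing to 1).
    """
    R = np.sqrt(X)
    # For probability vectors p and q: H(p, q)^2 = 1 - sum_i sqrt(p_i * q_i).
    # The clip guards against round-off pushing the coefficient above 1.
    bc = np.clip(R @ R.T, 0.0, 1.0)
    return 1.0 - np.sqrt(1.0 - bc)


def similarity_edges(X, threshold=0.5):
    """Weighted edge list (i, j, w), i < j, keeping similarities >= threshold."""
    S = hellinger_similarity(X)
    rows, cols = np.triu_indices_from(S, k=1)
    keep = S[rows, cols] >= threshold
    return [(int(i), int(j), float(S[i, j]))
            for i, j in zip(rows[keep], cols[keep])]


def transitive_graph(B_ab, B_bc):
    """Compose bipartite graphs A-B and B-C into a weighted A-C graph.

    Assumption: composition is taken as the product of the two biadjacency
    matrices, so entry (a, c) accumulates the weights of all paths a-b-c.
    """
    return B_ab @ B_bc


# Three nodes described by distributions over three attributes.
X = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])
edges = similarity_edges(X, threshold=0.5)

# Two small bipartite graphs given as 0/1 biadjacency matrices.
B_ab = np.array([[1, 0],
                 [1, 1]])
B_bc = np.array([[1, 0, 0],
                 [0, 1, 1]])
T = transitive_graph(B_ab, B_bc)
```

In this toy input the first two nodes have identical attribute distributions and the third shares no attributes with them, so only the edge between the first two survives the threshold. A dense matrix product is fine at this scale; for large corpora a sparse representation or a nearest-neighbor search would be the more realistic choice.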

Usage:

As an application:

The software includes two applications that can be used to generate and manipulate graphs through an interactive menu:

  • mainRDIgraphs.py: Provides access to the software functionality through an interactive menu. It reads the links to the source data from a configuration file (parameters.yaml). Edit this file to use other data.
  • mainRDIlab.py: It uses the software functionality to carry out experiments for analysing RDI corpus collections.

Run

python mainRDIgraphs.py --h
python mainRDIlab.py --h

to see the available options.

As a software package:

The software includes several class packages that can be used independently. Classes include (but are not limited to):

  • SimGraph: Generation of similarity graphs
  • CommunityPlus: Wrapper to community detection algorithms
  • DataGraph (requires SimGraph and CommunityPlus): provides tools for graph processing and analysis.
  • SuperGraph (requires DataGraph): provides tools for handling collections of DataGraph objects, including tools for the generation of new datagraphs.

Additional information

You can find more detailed information about this software in the Wiki.

This project was initially conceived for the processing of multiple corpora of scientific publications, patents and project proposals, inside the project "Service for Identifying Impact and R&D&I Agent Collaboration Networks" (Servicio para Identificar Impacto y Redes de Colaboración de Agentes I+D+i), funded by the Secretary of State for the Digital Agenda (SEAD, Secretaría de Estado para la Agenda Digital), under the umbrella of the Spanish Plan for the Stimulus of Language Technologies (PTL, Plan de Impulso de las Tecnologías del Lenguaje).