Network alignment benchmark

Author: Huynh Thanh Trung et al.

Requirements

PyTorch > 0.2 is required (0.4 is recommended).

networkx 1.11

python 3

Example scripts to generate datasets

Extract dataspace.zip before running the scripts:

scripts/gen_semi_ppi.sh

scripts/gen_fully.sh
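
The extraction and generation steps above can also be driven from Python. The following is a minimal sketch, not part of the repository: the helper names extract_dataspace and run_script are hypothetical, and it assumes that dataspace.zip sits in the repository root and should be extracted in place (the destination is an assumption; adjust it if the scripts expect another location).

```python
import subprocess
import zipfile
from pathlib import Path


def extract_dataspace(zip_path="dataspace.zip", dest="."):
    """Extract the archive into dest and return the sorted member names.

    Hypothetical helper; the destination directory is an assumption.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())


def run_script(script_path):
    """Run a shell script with bash and raise if it exits with an error."""
    subprocess.run(["bash", str(script_path)], check=True)


# Only run when executed directly and the archive is actually present.
if __name__ == "__main__" and Path("dataspace.zip").exists():
    extract_dataspace()
    for script in ("scripts/gen_semi_ppi.sh", "scripts/gen_fully.sh"):
        run_script(script)
```

Running the two shell scripts by hand after unzipping the archive is equivalent; the sketch only makes the required order (extract first, then generate) explicit.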

Examples of running the algorithms can be found in scripts/

data_utils:

  • count_node_same_features: Count the number of edges which have the same features.
  • edgelist_to_graphsage: Convert data from edgelist format to graphsage format (including G.json and id_map.json).
  • evaluate_distance: Evaluate embeddings based on link prediction.
  • extend_anchor_link: Extend network.
  • feature_groundtruth_checking: Check whether the feature dictionary is correct. Inputs are the source dataset and the target dataset (the shuffled source dataset).
  • feature_statistics: Print the sum of all features of a network.
  • filter_dataset_by_dict:
  • filter_dataset_by_degree: Remove nodes which have degree < threshold.
  • gen_dict: Generate dictionaries with a split value, including train.dict, test.dict and full.dict.
  • generate_groundtruth: (Deprecated) Input a txt file containing a list of nodes, output a groundtruth file. Use full.dict generated by gen_dict instead, then rename that file to "groundtruth".
  • get_sub_graph: Generate subgraphs of a network.
  • graphsage_to_edgelist: Convert data from graphsage format to edgelist format.
  • graphsage_to_mat: Convert data from graphsage format to .mat format.
  • mat_to_graphsage: Convert data from .mat format to graphsage format.
  • merge_graphs: Merge two graphs into one. Note that the new groundtruth file is the groundtruth between the source dataset and the shuffled source dataset (this file is often used with the source dataset and the shuffled source dataset as input).
  • pale_facebook_preprocess: Filter nodes in pale_facebook dataset which have degree < threshold.
  • pale_random_clone: Randomly delete and add edges to a network, based on the algorithm in the IJCAI16 paper.
  • random_clone: Randomly add and delete edges in a network.
  • random_clone_add: Randomly add edges to a network.
  • random_clone_delete: Randomly delete edges from a network.
  • shuffle_graph: Create a new graph by shuffling a graph.
  • split_dict: Split the groundtruth into train and val sets.
  • split_embeddings: Split an embeddings file into source and target embeddings. This file is often used with merge_graphs after training on a merged graph.
  • synthetic_graph: Create a synthetic graph.
  • visualize_degree_distribution: Visualize nodes' degrees of a network.
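
To illustrate how several of the utilities above fit together (shuffling a graph, deleting edges at random, and splitting the groundtruth into train and test dictionaries), here is a minimal pure-Python sketch. It is not the code in data_utils: the function names, signatures, the seed parameter and the per-edge deletion probability are assumptions made for illustration, and the actual scripts may use different options and file formats.

```python
import random


def shuffle_graph(edges, seed=0):
    """Relabel nodes with a random permutation (illustrative only).

    Returns the relabelled edge list and the groundtruth mapping
    {source_node: target_node}.
    """
    rng = random.Random(seed)
    nodes = sorted({node for edge in edges for node in edge})
    permuted = nodes[:]
    rng.shuffle(permuted)
    groundtruth = dict(zip(nodes, permuted))
    new_edges = [(groundtruth[u], groundtruth[v]) for u, v in edges]
    return new_edges, groundtruth


def random_delete_edges(edges, p_delete, seed=0):
    """Drop each edge independently with probability p_delete (assumed scheme)."""
    rng = random.Random(seed)
    return [edge for edge in edges if rng.random() >= p_delete]


def split_groundtruth(groundtruth, train_ratio, seed=0):
    """Split a groundtruth mapping into train and test dictionaries."""
    rng = random.Random(seed)
    pairs = sorted(groundtruth.items())
    rng.shuffle(pairs)
    n_train = int(len(pairs) * train_ratio)
    return dict(pairs[:n_train]), dict(pairs[n_train:])


# Toy source network given as an edge list.
source_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

# Target network: a shuffled copy of the source with some edges removed.
target_edges, groundtruth = shuffle_graph(source_edges, seed=42)
noisy_target = random_delete_edges(target_edges, p_delete=0.2, seed=42)

# Anchor links for supervised alignment methods and for evaluation.
train_dict, test_dict = split_groundtruth(groundtruth, train_ratio=0.8, seed=42)
print(len(train_dict), len(test_dict))
```

Fixing the seed keeps the generated target network and the train/test split reproducible across runs, which matters when comparing alignment algorithms on the same data.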