/robustness

Evaluation and improvement of cascade diffusion robustness against node attacks.

Primary LanguageC++

Introduction

This repository provides with an experimental framework for testing robustness of diffusion under Independent Cascade model. The general outline of the framework:

  • Generate a network
  • (optional) Apply Influence Maximization
  • Apply Node Immunization (NI) algorithm to find best nodes to immunize
  • Apply Robust Influence Maximization or Influence Maximization (IM) to find best robust seeds
  • Evaluate objectives by Monte-Carlo simulations

Installation

File requirements.txt provides with a list of dependencies for the Conda packet manager.

Install Miniconda: https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html

Create environment and install dependencies by

$ conda create --name ic_robustness --file requirements.txt

Next, build the SatGreedy solution by

$ mkdir src/SatGreedy/bin
$ cd src/SatGreedy/bin
$ cmake ..
$ make

Scripts read and write data from ALLDATA_PATH directory, which should be set up as an environmental variable. If using Conda, follow this instructions: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html

Otherwise, run export ALLDATA_PATH=/path/to/datasets

Database

Results of experiments are stored in MongoDB database, that comes with the dependencies. Before running experiments, start MongoDB instance by

mongod --dbpath <path to db files>

Dumping and restoring results from the database:

$ mongodump --db robustExperimentsDB
$ mongorestore <path to folder with backup files>

Use --port <port> if running instance on a non-default port, and update the connection settings in src/MongoDatabase.py.

Solvers

The repository has implementations of the following solvers:

  • IMM (Reverse Reachable set based Influence Maximization)
  • Saturate Greedy (Robust Influence Maximization solver)
  • Robust Influence Maximization based on the DAGGER reachability index Immunization:
  • PageRank
  • NetShield
  • Katz
  • Aquaintance
  • Betweenness
  • DAVA For the details on the solvers, refer to our publication.

Structure

The repository consists of several "modules". All classes, except of Robust Influence Maximization and Influence Maximization algorithms, are implemented in Python and located in the src directory.

Experiment class loads JSON file, parses it and runs all the solvers in parallel. Parallelisation is done by another wrapper class ParallelRobustWrapper, where number of threads can be adjusted.

Immunization algorithms inherit from Solver.py, and follow name convention <name of solver>Solver.py.

Influence Maximization and Robust Influence Maximization are implemented in C++, but wrapped by the RobustWrapper class, which is used by ParallelRobustWrapper. All different robust solutions are embedded in Rob.cpp, which compiles into Rob binary. The Rob binary reads the sequence of nodes immunized in advance, i.e. the immunization solvers define the sequence for immunization, writes into a file, and then this file is used by Rob.cpp in order to derive a robust seeding strategy. So, the overall stack of calls is:

Experiment > ParallelRobustWrapper > RobustWrapper > Rob

The Rob binary takes a parameter mode, which corresponds to different techniques of seeding, described in our paper (see the "Publication" section below).

All results except files with generated networks are stored in the database.

Parsing experimental parameters

Parsing is performed with a help of ParameterManager, and it additionally loads the config/dag_default.json file. The default configuration is complete and correct. If any JSON in the dags directory missing a parameters, then this parameter is substituted by the default one.

JSON is parsed in a way that allows to define a range of parameters per one experiment. For example, if we want to run an experiment for a graph with sizes 100,200,300; and for the seed set of sizes 20, 30; then we write

{
  "graph_size": [100, [200, 300]],
  "seed_set_size": [20, [30]]
}

Note here, that the first value is the default one (size 100, set size 20), and the default value should not be in an array of possible values. At the end, the framework will run 4 different of parameter sets: (100, 20), (200, 20), (300, 20), (100, 30).

Parameters are grouped by "data", "solver", "strategy" and "problem". The procedure above is performed for parameters within the same group. For each set of parameters in one group, the parser generates a cartesian product with all parameter sets in another group. For instance, each (graph_size, seed_set_size) will be run for each possible parameter set of immunization strategies.

For 'solver' group, possible parameter values are defined as an array

{
  "solver":
  {
    "solver_name":
      {
        "solver_parameter1": [1,2],
        "solver_parameter2": ["r", "s"]
      }
  }
}

The resulting set is a cartesian product of solver parameters 1 and 2.

Running Robust algorithms for a sequence of blocked nodes

Setting number_of_blocked_nodes=-1 will run all algorithms for a sequence of blocked nodes up to n-1, where n is the size of a graph. For IMM solver all-sequence run triggers the Dynamic IMM algorithm (RNI). mode=0 for the Rob solver will also run Dynamic IMM.

Running Experiments

All experiments are described as JSON files in the dags directory. The following script loads a JSON file and runs the corresponding experiment:

python3 scripts/load_and_run_experiment.py <name of experiment>

The name of experiment is the same as the name of the JSON file that describes it.

Some experiments are run by separate scripts in the scripts directory, and the experiments on improving robustness are run directly from the jupyter notebooks (SmallExamples.ipynb and VK.ipynb).

Plotting Results

Results analysis and plotting is done using Python. The notebooks directory contains Jupyter notebooks with plotting scripts. Names of notebooks correspond to types of network. All notebooks use Experiment, Database, Analyzer and Artist classes as common scripts for fetching the results from database, summarizing and setting styles for plotting.

Testing

Run testing by

$ ./run_tests.sh

Publication

This repository is supplementary to the publication

A. Logins, P. Karras, "On the Robustness of Cascade Diffusion under Node Attacks", In Proceedings of the Web Conference, 2020.

Please, consider citing if using the code.

FAQ

Please, feel free to contact me if you have any questions allogn@gmail.com