Active Learning using Node Embeddings in Partially Observed Networks | [CS768] - Learning With Graphs | Aut'21

Work done as a part of CS768 - Learning With Graphs [Autumn' 21].

Find the report here.

Team Information:

Name Contact
Richeek Das richeek@cse.iitb.ac.in
Saumya Goyal saumyagoyal@cse.iitb.ac.in
Tirthankar Adhikari 190070003@iitb.ac.in

Requirements [Tested On]:



│   README.md
│   run.py                [For general testing of code]
│   query.py              [Contains different querying strategies]
│   utils.py              [Contains several helper functions]
│   exp1.py               [Run Experiment 1]
│   exp2.py               [Run Experiment 2]
│   plt_exp1.py           [Plot Experiment 1]
│   plt_exp2.py           [Plot Experiment 2]
│   │   datasets...
    │   cne.py            [Conditional Network Embedding]
    │   cne_known.py      [Conditional Network Embedding Known]
    │   maxent.py         [Operations on the prior]


Experiment 1: Comparison of the average ROC-AUC scores achieved by CNE, CNE_K and SINE as the node embedding models in ALPINE, as the percentage of the network observed varies.

To run this experiment: python3 exp1.py

To plot: python3 plt_exp1.py

Experiment 2: Comparison of the average ROC-AUC scores achieved with CNE_K as the node embedding model across the implemented querying strategies, with and without Information Density weighting.

To run this experiment: python3 exp2.py

To plot: python3 plt_exp2.py
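
Both experiments report an average ROC-AUC. As a rough illustration of what that quantity means, the sketch below computes ROC-AUC from scratch and averages it over several runs. It is not taken from exp1.py or exp2.py; the function names roc_auc and average_roc_auc and the toy data are assumptions made for this example only.

```python
from statistics import mean


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_roc_auc(runs):
    """Average the ROC-AUC over several (labels, scores) runs."""
    return mean(roc_auc(labels, scores) for labels, scores in runs)


# Toy data (hypothetical): two runs of link predictions on four node pairs.
runs = [
    ([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]),  # every positive outranks every negative
    ([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1]),  # one positive is ranked below a negative
]
avg = average_roc_auc(runs)
```

The pairwise definition is quadratic in the number of node pairs, so it is only meant to make the metric concrete on small inputs.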


  1. Select and load a dataset in run.py.
  2. Choose the Case (1, 2 or 3) and set r_0 (the initially observed portion of node pairs), nr_split and nr_ne (the averaging parameters, which can be set small to save time).
  3. Run python run.py.
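
To make the role of r_0 concrete, here is a minimal sketch of how an initially observed portion of node pairs could be sampled. It does not reproduce the logic in run.py; the function name sample_observed_pairs, the seed argument, and the uniform sampling over all unordered pairs are assumptions for illustration.

```python
import itertools
import random


def sample_observed_pairs(num_nodes, r_0, seed=0):
    """Split all unordered node pairs into observed and unobserved sets.

    r_0 is the fraction of node pairs whose link status is initially observed.
    """
    if not 0.0 <= r_0 <= 1.0:
        raise ValueError("r_0 must be in [0, 1]")
    pairs = list(itertools.combinations(range(num_nodes), 2))
    rng = random.Random(seed)  # fixed seed so a split can be reproduced
    rng.shuffle(pairs)
    n_observed = round(r_0 * len(pairs))
    return set(pairs[:n_observed]), set(pairs[n_observed:])


# 5 nodes give 10 unordered pairs; half of them are observed at the start.
observed, unobserved = sample_observed_pairs(num_nodes=5, r_0=0.5)
```

Averaging over several such splits is one plausible reading of what a parameter like nr_split controls, though that is an interpretation rather than something stated here.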

Note that large networks, e.g., the blog dataset, need a lot of memory and a few hours for 5 iterations. In particular, running all the strategies in parallel on the blog network may cause a memory error if the device does not have enough memory; in that case the strategies can still be run sequentially. As an example of the run time, blog with r_0 set to 10% in Case-2 takes approximately 4 hours. It is also possible to define your own PON (partially observed network), as well as the pool and the target set.
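
The sequential fallback mentioned in the note can be sketched as a plain loop that finishes one strategy before starting the next, so only one strategy's working memory is alive at a time. This is a generic pattern, not the code in run.py; run_sequentially, run_one and the strategy names are hypothetical.

```python
import gc


def run_sequentially(strategies, run_one):
    """Run each strategy one after another and collect the results by name."""
    results = {}
    for name in strategies:
        results[name] = run_one(name)
        # Release memory held by the finished strategy before the next one starts.
        gc.collect()
    return results


def fake_run(name):
    """Stand-in for a real strategy run; returns a dummy value."""
    return len(name)


results = run_sequentially(["strategy_a", "strategy_b"], fake_run)
```

The trade-off is wall-clock time: the total run time becomes the sum of the individual strategy run times instead of roughly the longest one.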


The results are saved as 'results.png' in the 'folder' defined for the experiment. See lines 129~145 in run.py: the names of the result folders for the different cases start with:

  • Case-1 - 'TU_PU_r0...'
  • Case-2 - 'TU_r0...'
  • Case-3 - 'r0_...'
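
Given the prefixes above, result folders for a case can be located by name. The sketch below is a small helper written for this README, not part of the repository; find_result_folders and CASE_PREFIXES are hypothetical names, and only the prefixes listed above are used (the remainder of each folder name is not assumed).

```python
from pathlib import Path

# Folder-name prefixes per case, as listed above.
CASE_PREFIXES = {1: "TU_PU_r0", 2: "TU_r0", 3: "r0_"}


def find_result_folders(base_dir, case):
    """Return the sub-folders of base_dir whose names start with the case prefix."""
    prefix = CASE_PREFIXES[case]
    return sorted(
        p for p in Path(base_dir).iterdir()
        if p.is_dir() and p.name.startswith(prefix)
    )


if __name__ == "__main__":
    # List Case-2 result folders in the current directory, if any exist.
    for folder in find_result_folders(".", case=2):
        print(folder / "results.png")
```

The three prefixes do not shadow each other, so a simple startswith check is enough to tell the cases apart.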

Forked from parent repository