ProNE: Fast and Scalable Network Representation Learning
Jie Zhang, Yuxiao Dong, Yan Wang, Jie Tang and Ming Ding
Accepted to IJCAI 2019 Research Track!
- Linux or macOS
- Python 2 or 3
- scipy
- sklearn
Clone this repo.
git clone https://github.com/lykeven/ProNE
cd ProNE
Install the dependencies with
pip install -r requirements.txt
The following datasets are public:
- PPI contains 3,890 nodes and 76,584 edges.
- blogcatalog contains 10,312 nodes and 333,983 edges.
- youtube contains 1,138,499 nodes and 2,990,443 edges.
You can use python proNE.py -graph example_graph
to train the ProNE model on the example data.
If you want to train on the PPI dataset, you can run
python proNE.py -graph data/PPI.ungraph -emb1 PPI_sparse.emb -emb2 PPI_spectral.emb
-dimension 128 -step 10 -theta 0.5 -mu 0.2
Here, PPI_sparse.emb and PPI_spectral.emb are the output embedding files, and dimension, step, theta and mu are the model parameters.
If you want to train ProNE on your own dataset, you should prepare the following files:
- edgelist.txt: Each line represents an edge, which contains two tokens
<node1> <node2>
where each token is an integer node ID, with IDs starting from 0.
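If your graph uses arbitrary node labels (names, non-contiguous IDs), the edge list format above can be produced with a small script. The following is a minimal sketch, not part of the ProNE code base: the helper names `relabel_edges` and `write_edgelist` are hypothetical, and assigning IDs in order of first appearance is an assumption made for illustration.

```python
def relabel_edges(edges):
    """Map arbitrary node labels to integer IDs 0, 1, 2, ...

    IDs are assigned in order of first appearance (an illustrative choice).
    Returns the relabeled edges and the label-to-ID mapping.
    """
    ids = {}
    relabeled = []
    for u, v in edges:
        for node in (u, v):
            if node not in ids:
                ids[node] = len(ids)
        relabeled.append((ids[u], ids[v]))
    return relabeled, ids


def write_edgelist(edges, path):
    """Write one edge per line as two space-separated integer tokens."""
    with open(path, "w") as f:
        for u, v in edges:
            f.write("%d %d\n" % (u, v))


def main():
    # Hypothetical toy graph with string labels.
    raw = [("alice", "bob"), ("bob", "carol"), ("carol", "alice")]
    edges, ids = relabel_edges(raw)
    write_edgelist(edges, "edgelist.txt")


if __name__ == "__main__":
    main()
```

Keep the returned mapping if you need to translate the rows of the output embedding back to the original labels.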
ProNE is mainly single-threaded (except for the SVD on the small matrices). We also provide a multi-threaded C++ program, ProNE.cpp, for large-scale networks, built on Eigen and redsvd. In addition, gflags is required to parse command-line parameters. This version is about 3 times faster on youtube than the result reported in the paper, and its performance is still being optimized.
Compile it via
g++ ProNE.cpp -l redsvd -l gflags -O3 -o ProNE.out
If you want to train on the PPI dataset, you can run
./ProNE.out -filename data/PPI.ungraph -emb1 emb/PPI.emb1 -emb2 emb/PPI.emb2
-num_node 3890 -num_step 10 -num_thread 20 -num_rank 128 -theta 0.5 -mu 0.2
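The -num_node value in the command above matches the 3,890 nodes listed for PPI. For your own edge list, a value can be derived with a short script. This is a sketch under the assumption that node IDs are integers starting from 0 and that -num_node should be the largest ID plus one; the helper name `count_nodes` is hypothetical and not part of ProNE.

```python
def count_nodes(path):
    """Return (largest node ID + 1) for a '<node1> <node2>' edge list file.

    Assumes integer IDs starting from 0; lines with fewer than two
    tokens (for example blank lines) are skipped.
    """
    max_id = -1
    with open(path) as f:
        for line in f:
            tokens = line.split()
            if len(tokens) < 2:
                continue
            u, v = int(tokens[0]), int(tokens[1])
            max_id = max(max_id, u, v)
    return max_id + 1
```

For example, `count_nodes("data/PPI.ungraph")` would give the number to pass as -num_node, provided the file follows the edge list format described earlier.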
If you have any difficulty getting the steps above to work, feel free to open an issue. You can expect a reply within 24 hours.