This is a test repo. The released version is here.
- Paper link: here
- Other implementations: gensim, deepwalk-c
The implementation supports multi-process training on CPU as well as mixed training with CPU and multiple GPUs.
Dependencies:
- PyTorch 1.0.1+ (tested with 1.5.0)
- DGL 0.4.3
Format of a network file (each line is one edge, given as a pair of node IDs):
1(node id) 2(node id)
1 3
...
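As an illustration only, the following snippet writes a tiny edge list in this format (the edges and file name are made up for the example):

```python
# Illustrative only: write a tiny edge list in the "src dst" format above.
edges = [(1, 2), (1, 3), (2, 3)]
with open("net.txt", "w") as f:
    for u, v in edges:
        f.write(f"{u} {v}\n")
```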
To run the code:
python3 deepwalk.py --net_file net.txt --emb_file emb.txt --adam --mix --lr 0.2 --num_procs 4 --batch_size 100 --negative 5
Functions:
SkipGramModel.save_embedding(dataset, file_name)
SkipGramModel.save_embedding_txt(dataset, file_name)
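A rough usage sketch, assuming `model` is a trained SkipGramModel and `dataset` is the dataset object used for training (the output file names are placeholders; the exact on-disk formats are as implemented in the repo):

```python
# Sketch only: save trained embeddings with the functions listed above.
# `model` (SkipGramModel) and `dataset` come from this repo's training code.
model.save_embedding(dataset, "emb.npy")      # binary format (assumed NumPy-style)
model.save_embedding_txt(dataset, "emb.txt")  # plain-text format
```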
To evaluate the embeddings on multi-label classification, please refer to here.
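For reference, below is a minimal sketch of the usual evaluation protocol (one-vs-rest logistic regression on the frozen embeddings, reporting Macro- and Micro-F1). The linked script may differ in details, and the embedding/label file names here are hypothetical.

```python
# Sketch of multi-label evaluation with scikit-learn; not the linked script.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X = np.load("emb.npy")      # node embeddings, shape (num_nodes, dim) -- hypothetical file
Y = np.load("labels.npy")   # binary label matrix, shape (num_nodes, num_labels) -- hypothetical file

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, train_size=0.09, random_state=0)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_tr, Y_tr)
pred = clf.predict(X_te)
print("Macro-F1:", f1_score(Y_te, pred, average="macro"))
print("Micro-F1:", f1_score(Y_te, pred, average="micro"))
```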
Evaluation results on YouTube (1M nodes):
Implementation | Macro-F1 (%) at 1% / 3% / 5% / 7% / 9% | Micro-F1 (%) at 1% / 3% / 5% / 7% / 9% |
---|---|---|
gensim.word2vec(hs) | 28.73 / 32.51 / 33.67 / 34.28 / 34.79 | 35.73 / 38.34 / 39.37 / 40.08 / 40.77 |
gensim.word2vec(ns) | 28.18 / 32.25 / 33.56 / 34.60 / 35.22 | 35.35 / 37.69 / 38.08 / 40.24 / 41.09 |
ours | 24.58 / 31.23 / 33.97 / 35.41 / 36.48 | 38.93 / 43.17 / 44.73 / 45.42 / 45.92 |
A running-time comparison is shown below, where the numbers in brackets denote the time spent on random walks.
Implementation | gensim.word2vec(hs) | gensim.word2vec(ns) | Ours |
---|---|---|---|
Time (s) | 27119.6(1759.8) | 10580.3(1704.3) | 428.89 |
Parameters:
- walk_length = 80, number_walks = 10, window_size = 5
- Ours: 4 GPUs (Tesla V100), lr = 0.2, batch_size = 128, neg_weight = 5, negative = 1, num_thread = 4
- Others: workers = 8, negative = 5
Speedup with mixed CPU & multi-GPU training. The parameters are the same as above.
#GPUs | 1 | 2 | 4 |
---|---|---|---|
Time (s) | 1419.64 | 952.04 | 428.89 |