Notification

This is a test repo. The released version is here.

DeepWalk

The implementation supports multi-process training on CPU, as well as mixed training with CPU and multiple GPUs.

Dependencies

  • PyTorch 1.0.1+

Tested versions

  • PyTorch 1.5.0
  • DGL 0.4.3

How to run the code

Format of a network file (each line lists one edge as a pair of node IDs):

1(node id) 2(node id)
1 3
...
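
As an illustration, here is a minimal sketch that writes a toy graph in this format (the edges are made up, and net.txt simply matches the file name used in the command below):

# Write a toy edge list in the "node_id node_id" format shown above.
# The edges are made up purely for illustration.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
with open("net.txt", "w") as f:
    for src, dst in edges:
        f.write(f"{src} {dst}\n")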

To run the code (this example enables mixed CPU/multi-GPU training with the Adam optimizer, 4 worker processes, a batch size of 100, and 5 negative samples):

python3 deepwalk.py --net_file net.txt --emb_file emb.txt --adam --mix --lr 0.2 --num_procs 4 --batch_size 100 --negative 5

How to save the embedding

Functions:

SkipGramModel.save_embedding(dataset, file_name)
SkipGramModel.save_embedding_txt(dataset, file_name)
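
A minimal usage sketch; model and dataset are placeholders for the trained SkipGramModel and the dataset object built in deepwalk.py, and the comments about output formats are assumptions rather than confirmed behavior:

# Hypothetical post-training calls: `model` is a trained SkipGramModel,
# `dataset` is the dataset object used during training (both placeholders).
model.save_embedding(dataset, "emb.npy")       # assumed binary (NumPy) output
model.save_embedding_txt(dataset, "emb.txt")   # assumed plain-text output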

Evaluation

To evaluate the embeddings on multi-label classification, please refer to here.

Results on YouTube (1M nodes); the percentage columns denote the fraction of labeled nodes used to train the classifier.

Implementation       Macro-F1 (%)                            Micro-F1 (%)
                     1%      3%      5%      7%      9%      1%      3%      5%      7%      9%
gensim.word2vec(hs)  28.73   32.51   33.67   34.28   34.79   35.73   38.34   39.37   40.08   40.77
gensim.word2vec(ns)  28.18   32.25   33.56   34.60   35.22   35.35   37.69   38.08   40.24   41.09
ours                 24.58   31.23   33.97   35.41   36.48   38.93   43.17   44.73   45.42   45.92
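
The linked script is the authoritative evaluation; as a rough illustration, here is a minimal sketch of the usual protocol (one-vs-rest logistic regression trained on a fraction of labeled nodes, scored with Micro-/Macro-F1). The embeddings and labels below are random stand-ins, and the 5% training ratio mirrors one column of the table:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Random stand-ins: X would be the learned node embeddings, Y the
# multi-hot label matrix of the dataset (both hypothetical here).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 128))
Y = rng.integers(0, 2, size=(1000, 5))

# Train on a small fraction of labeled nodes, e.g. the "5%" column.
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, train_size=0.05, random_state=0)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_tr, Y_tr)
pred = clf.predict(X_te)

print("Micro-F1:", f1_score(Y_te, pred, average="micro", zero_division=0))
print("Macro-F1:", f1_score(Y_te, pred, average="macro", zero_division=0))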

A comparison of running times is shown below, where the numbers in brackets denote the time spent on the random walks.

Implementation   gensim.word2vec(hs)   gensim.word2vec(ns)   Ours
Time (s)         27119.6 (1759.8)      10580.3 (1704.3)      428.89

Parameters:

  • walk_length = 80, number_walks = 10, window_size = 5
  • Ours: 4 GPUs (Tesla V100), lr = 0.2, batch_size = 128, neg_weight = 5, negative = 1, num_thread = 4
  • Others: workers = 8, negative = 5

Speedup with mixed CPU and multi-GPU training; the parameters are the same as above.

#GPUs      1         2        4
Time (s)   1419.64   952.04   428.89
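
As a quick arithmetic check, the relative speedup follows directly from these times (a tiny sketch using only the numbers in the table):

# Speedup relative to the single-GPU run, using the times from the table.
times = {1: 1419.64, 2: 952.04, 4: 428.89}
base = times[1]
for gpus, t in times.items():
    print(f"{gpus} GPU(s): {base / t:.2f}x speedup")  # 1.00x, 1.49x, 3.31x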