This is a pipeline for author name disambiguation.
- python 3.6.5
- networkx 1.11
- gensim 3.4.0
- sklearn 0.20.1
- numpy 1.14.3
- pandas 0.23.0
- tensorflow 1.10.0
Note: we recommend running this pipeline on Windows.
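The pinned library versions listed above can be checked before running anything. The following is a minimal sketch, not part of this repository: `PINNED`, `installed_version`, and `find_mismatches` are hypothetical names, and the pip distribution names are assumptions (for example, sklearn is published as `scikit-learn`).

```python
# Pinned versions copied from the dependency list above.
# Assumption: these are the pip distribution names of each library.
PINNED = {
    "networkx": "1.11",
    "gensim": "3.4.0",
    "scikit-learn": "0.20.1",
    "numpy": "1.14.3",
    "pandas": "0.23.0",
    "tensorflow": "1.10.0",
}


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        # Available from Python 3.8 onwards.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Fallback for older interpreters such as Python 3.6.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {name: (expected, found)} for each package whose version differs."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in sorted(find_mismatches(PINNED).items()):
        print("{}: expected {}, found {}".format(name, expected, found))
```

A package that is not installed is reported with `None` as the found version. Whether nearby versions also work is not stated in this README, so a mismatch is only a hint, not proof of a problem.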
# step 1: preprocess the data
python data_processing.py
# step 2: train the GRU-based encoder to learn deep semantic representations
python DRLgru.py
# step 3: construct a PHNet and generate random walks
python walks.py
# step 4: weighted heterogeneous network embedding
python WHNE.py
# step 5: generate clustering results
python evaluator.py
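The five steps above can also be chained from a single script that stops at the first failure. This is a sketch under assumptions, not a script shipped with this repository: `run_pipeline` and its `runner` parameter are hypothetical, and it assumes each script takes no command-line arguments and is run from the repository root.

```python
import subprocess
import sys

# Script names in the order given by steps 1 to 5 above.
STEPS = [
    "data_processing.py",
    "DRLgru.py",
    "walks.py",
    "WHNE.py",
    "evaluator.py",
]


def run_pipeline(steps=STEPS, runner=subprocess.call, python=sys.executable):
    """Run each script in order; return the first failing script, or None."""
    for script in steps:
        print("running " + script)
        # runner receives the command as a list and returns the exit code.
        exit_code = runner([python, script])
        if exit_code != 0:
            return script
    return None


if __name__ == "__main__":
    failed = run_pipeline()
    if failed is not None:
        sys.exit("step failed: " + failed)
```

Stopping early matters here because each step presumably consumes the output of the previous one, so continuing after a failure would only produce misleading errors further down.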
We recommend using our pre-trained word2vec model, available via OneDrive (or BaiduYun), to generate word embeddings of publication titles. Alternatively, you can train your own word vectors (dimension = 100) using the word2vec implementation in the gensim library.
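For training your own vectors, a minimal sketch with gensim might look like the following. The helper names `tokenize_title` and `train_title_vectors`, the regex tokenization, the toy titles, and `min_count=1` are all assumptions for illustration; the preprocessing and the model file name that the pipeline scripts expect are not specified here, so check them in the source before swapping in your own model.

```python
import re


def tokenize_title(title):
    """Lowercase a title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def train_title_vectors(titles, dimension=100):
    """Train word2vec on tokenized titles and return the gensim model."""
    # Imported lazily so tokenize_title can be used without gensim installed.
    import gensim
    from gensim.models import Word2Vec

    sentences = [tokenize_title(t) for t in titles]
    # gensim 3.x (pinned above) names the dimension `size`;
    # gensim 4.x renamed it to `vector_size`.
    major = int(gensim.__version__.split(".")[0])
    dim_kwarg = {"vector_size": dimension} if major >= 4 else {"size": dimension}
    # Assumption: min_count=1 keeps every word, which only suits a toy corpus.
    return Word2Vec(sentences, min_count=1, workers=1, **dim_kwarg)


if __name__ == "__main__":
    # Made-up titles, used only to show the call.
    toy_titles = [
        "Author name disambiguation in bibliographic databases",
        "Network embedding for name disambiguation",
    ]
    model = train_title_vectors(toy_titles)
    print(model.wv["name"].shape)
```

In practice the titles would come from the preprocessed publication data rather than a hard-coded list, and a larger `min_count` is usually preferable on a real corpus.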