- Python 3.7
- numpy==1.11.1
- dgl-gpu==0.6.1
- nltk==3.5
- scikit-learn==0.23.0
- scipy==1.4.1
- sklearn==0.0
- spacy==2.2.2
- tokenizers==0.9.3
- torch==1.6.0
- torchtext==0.6.0
- tqdm==4.60.0
- transformers==3.5.1
- faiss
python -u build_graph.py --meSH_pair_path MeSH_name_id_mapping_2019.txt --mesh_parent_children_path MeSH_parent_child_mapping_2019 --word2vec_path BioWord2Vec_standard.w2v --graph_type ‘GCN’
python -u journal_info.py --data train.json --save journal_info.pkl
Steps (see details in get_mesh_mask.py
):
- get idf vector for each document
python -u get_mesh_mask.py --allMeSH train.json --meSH_pair_path MeSH_name_id_mapping_2019.txt -- save_path_idf idf.json
- get masks using KNN
python -u get_mesh_mask.py --allMeSH train.json --meSH_pair_path MeSH_name_id_mapping_2019.txt --save_path_idf idf.json
- get masks from journal and merge the masks generated from neighbours
python -u get_mesh_mask.py --allMeSH train.json --meSH_pair_path MeSH_name_id_mapping_2019.txt --neigh_path neigh.json --journal_info journal_info.pkl --threshold 0.5 --save_path dataset.json
python -u run_classifier_multigcn.py --title_path pmc_title.pkl --abstract_path pmc_abstract.pkl --label_path pmc_meshLabel.pkl --mask_path mesh_mask.pkl --meSH_pair_path MeSH_name_id_mapping_pmc_2020.txt --word2vec_path BioWord2Vec_standard.w2v --graph gcn_pmc.bin --save-model-path model.pt --batch_sz 32 --model_name 'Full'
python -u run_classifier_multigcn.py --title_path pmc_title.pkl --abstract_path pmc_abstract.pkl --label_path pmc_meshLabel.pkl --mask_path mesh_mask.pkl --meSH_pair_path MeSH_name_id_mapping_pmc_2020.txt --word2vec_path BioWord2Vec_standard.w2v --graph gcn_pmc.bin model model.pt --batch_sz 32 --model_name 'Full'
If you use KenMeSH in your work, please consider citing our paper