SGNS-Pytorch

Skip-Gram with Negative Sampling (SGNS) for a Chinese dataset, implemented in PyTorch.

Download the Chinese Wikipedia dataset from zhwiki_download and save it in data/
Preprocess the dataset:
- Convert the .xml file to a .txt file: python XML2txt.py
- Convert Traditional Chinese to Simplified Chinese: python tans_t2s.py
- Segment the Chinese text into words: python seg_wiki.py
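
The first preprocessing step, XML to plain text, can be sketched as follows. This is only an illustration using the Python standard library, not the contents of XML2txt.py: the function name iter_page_texts and the tiny in-memory sample are hypothetical, and it assumes the dump follows the usual MediaWiki layout in which article bodies sit in text elements. It does not strip wiki markup.

```python
import io
import xml.etree.ElementTree as ET


def iter_page_texts(xml_source):
    """Yield the raw text of each <text> element in a MediaWiki-style XML dump.

    Hypothetical helper for illustration; xml_source is a path or a binary file object.
    """
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        # Real dumps qualify tags with a namespace ("{...}text"); keep the local name only.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "text" and elem.text:
            yield elem.text
        elif local_name == "page":
            # Release the finished page so memory stays bounded on large dumps.
            elem.clear()


# Tiny in-memory stand-in for a dump file (made-up content, not real Wikipedia data).
sample = io.BytesIO(
    (
        "<mediawiki>"
        "<page><title>A</title><revision><text>数学是一门学科</text></revision></page>"
        "<page><title>B</title><revision><text>自然语言处理</text></revision></page>"
        "</mediawiki>"
    ).encode("utf-8")
)
texts = list(iter_page_texts(sample))
```

Streaming with iterparse avoids loading the whole dump into memory, which matters for a file the size of a full Wikipedia export. Writing each yielded string to a line of a .txt file would give the input for the later conversion and segmentation steps.
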
Train the model:
- Train a model with gensim: python SGNS_train.py
- Train our SGNS model: python train.py
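
For readers new to SGNS, the pieces a training script of this kind typically needs can be sketched in plain Python. This is a minimal, dependency-free illustration and not the code in train.py: the function names, the window size, and the 0.75 exponent on unigram counts (the value used in the original word2vec work) are all assumptions, and a real PyTorch version would use embedding layers and batched tensors instead of Python lists.

```python
import math
import random
from collections import Counter


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def negative_sampling_weights(tokens, power=0.75):
    """Return the vocabulary and unnormalized sampling weights count(w) ** power."""
    counts = Counter(tokens)
    vocab = sorted(counts)
    weights = [counts[w] ** power for w in vocab]
    return vocab, weights


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def sgns_loss(center_vec, context_vec, negative_vecs):
    """SGNS loss for one positive pair and its sampled negative vectors."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Pull the true context toward the center word...
    loss = -log_sigmoid(dot(center_vec, context_vec))
    # ...and push each sampled negative away from it.
    for neg in negative_vecs:
        loss -= log_sigmoid(-dot(center_vec, neg))
    return loss


# Hypothetical segmented sentence, as the segmentation step might produce.
tokens = ["我", "喜欢", "自然", "语言", "处理"]
pairs = skipgram_pairs(tokens, window=1)
vocab, weights = negative_sampling_weights(tokens)
negatives = random.Random(0).choices(vocab, weights=weights, k=5)
```

Raising counts to a power below 1 flattens the unigram distribution, so rare words are drawn as negatives more often than their raw frequency would suggest.
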

BoO-18/SGNS-Pytorch