This is the code for the TREC 2015 Microblog track.
- Step.0 Install the prerequisite packages:
Java 1.8.0_40
Python 2.7.9
Weka 3.6.12
json-lib-2.1
numpy 1.6.2
scipy 0.15.1
nltk 3.0.2
word2vec
gensim
py4J
pandas 0.16.2
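
As a quick sanity check after installing, something like the sketch below could report which of the Python packages are importable and whether their versions match the list. It is only an illustration: the helper names (`installed_version`, `report`) are made up here, it assumes each import name equals the package name in lower case, and it does not cover the Java-side items (Java, Weka, json-lib).

```python
import importlib

# Versions copied from the list above; None means the list states no version.
# Assumption: each import name matches the package name shown in the list.
EXPECTED = {
    "numpy": "1.6.2",
    "scipy": "0.15.1",
    "nltk": "3.0.2",
    "pandas": "0.16.2",
    "gensim": None,
    "word2vec": None,
    "py4j": None,
}


def installed_version(name):
    """Return the module's __version__, 'unknown' if it has none,
    or None if the module cannot be imported."""
    try:
        module = importlib.import_module(name)
    except Exception:
        # Covers a missing package as well as one that fails while importing.
        return None
    return getattr(module, "__version__", "unknown")


def report(expected=EXPECTED):
    """Build one status line per package, sorted by name."""
    lines = []
    for name in sorted(expected):
        wanted = expected[name]
        found = installed_version(name)
        if found is None:
            status = "MISSING"
        elif wanted is None or found == wanted:
            status = "ok"
        else:
            status = "version differs"
        lines.append("%-10s wanted=%s found=%s  %s"
                     % (name, wanted or "any", found, status))
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

A "version differs" line is not necessarily a problem; the versions above are simply the ones this code was written against.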
- Step.1 Use word2vec.py to train a model on a corpus downloaded from Wikipedia. The corpus is huge, so you need to download it from Wikipedia yourself. Then run "python process_wiki.py dir" on the command line to process the corpus, where dir is the directory in which the English Wikipedia corpus is saved. Next, run "python train_word2vec_model.py dir" to get the model file, where dir is the directory of the processed Wikipedia corpus. The final result file will be named
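
The two commands in Step.1 can also be chained from a small driver script, roughly as sketched below. The function names (`build_commands`, `run_step1`) and the example paths are hypothetical; the sketch assumes both scripts sit in the current working directory and that each takes a single directory argument, exactly as quoted above.

```python
import subprocess


def build_commands(raw_wiki_dir, processed_dir):
    """Return the two Step.1 command lines as argument lists."""
    return [
        # First pass: process the downloaded English Wikipedia corpus.
        ["python", "process_wiki.py", raw_wiki_dir],
        # Second pass: train the model on the processed corpus.
        ["python", "train_word2vec_model.py", processed_dir],
    ]


def run_step1(raw_wiki_dir, processed_dir):
    """Run both commands in order, stopping at the first failure."""
    for command in build_commands(raw_wiki_dir, processed_dir):
        print("Running: " + " ".join(command))
        # check_call raises CalledProcessError on a non-zero exit status,
        # so training never starts on a half-processed corpus.
        subprocess.check_call(command)
```

For example, `run_step1("/data/enwiki", "/data/enwiki_processed")` would run the processing step and then the training step; both paths are placeholders for wherever the corpus is stored on your machine.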
- Step.2
#DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT