You'll need Python, R, Gurobi, Spark, and Vowpal Wabbit installed and accessible on your PATH.
Either copy non_naqt.db to data/questions.db, symlink it there, or copy in your own questions.db file.
Run the script "python util/install_python_packages.py", which will install several Python packages you'll need. (You may need admin access.)
Run the script "python util/install_nltk_data.py", which will download some NLTK data. Do not use admin access for this script.
Download the Illinois Wikifier code (VERSION 2). Place the data directory in data/wikifier/data and put the wikifier-3.0-jar-with-dependencies.jar in the lib directory.
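Before going further, it can help to confirm that the required tools really are on the PATH. The following is a minimal sketch, not part of the repository. The executable names (Rscript, gurobi_cl, spark-submit, vw) are assumptions about a typical installation and may differ on your system.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = {
    "Python": "python",
    "R": "Rscript",
    "Gurobi": "gurobi_cl",
    "Spark": "spark-submit",
    "Vowpal Wabbit": "vw",
}


def find_missing(tools):
    """Return the names of tools whose executable is not found on the PATH."""
    return [name for name, exe in tools.items() if shutil.which(exe) is None]


missing = find_missing(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```

A tool reported as missing may simply be installed under a different executable name, so treat the output as a hint rather than a verdict.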
- Generate the Makefile
python generate_makefile.py
- Generate the guess database. This takes a while: it depends on DAN (60 hours) and guesses (40 hours)
make data/guesses.db
- Generate the LM pickle (18 hours)
make data/lm.pkl
- Generate features, train all models, and get predictions
make all_sentence_buzz
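Since these steps run for many hours, you may prefer to chain them so that a failure stops the sequence early. The sketch below is one way to do that under stated assumptions: the commands are copied from the steps above, while the run_steps helper and its dry_run flag are hypothetical and not part of the repository.

```python
import subprocess

# Commands copied from the steps above, in the order they are listed.
STEPS = [
    ["python", "generate_makefile.py"],
    ["make", "data/guesses.db"],
    ["make", "data/lm.pkl"],
    ["make", "all_sentence_buzz"],
]


def run_steps(steps, dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    Execution stops at the first failing command because check=True
    makes subprocess.run raise CalledProcessError on a non-zero exit.
    Returns the number of steps processed.
    """
    for cmd in steps:
        print("$ " + " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return len(steps)


# Dry run by default: this only prints the commands.
run_steps(STEPS)
```

Call run_steps(STEPS, dry_run=False) from the repository root to actually execute the commands.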
Feature timings:
* classifier: 216 feature lines per sec
* lm: 139.028408 feature lines per sec
* deep: 84.391876 feature lines per sec
* text: 158.384899 feature lines per sec
* wikilinks: 62.842486 feature lines per sec
* answer_present: 155.469810 feature lines per sec
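The rates above can be turned into rough wall-clock estimates by dividing the number of feature lines by the rate. The sketch below does that arithmetic. The figure of 1,000,000 feature lines is a hypothetical value chosen only for illustration, not a measurement from this project.

```python
# Feature generation rates (feature lines per second), copied from the list above.
RATES = {
    "classifier": 216.0,
    "lm": 139.028408,
    "deep": 84.391876,
    "text": 158.384899,
    "wikilinks": 62.842486,
    "answer_present": 155.469810,
}


def estimated_hours(num_lines, lines_per_sec):
    """Return the hours needed to produce num_lines at the given rate."""
    return num_lines / lines_per_sec / 3600.0


# Hypothetical workload size; replace with your own line count.
NUM_LINES = 1_000_000

for name, rate in RATES.items():
    print(f"{name}: {estimated_hours(NUM_LINES, rate):.2f} hours")

# The feature with the lowest rate takes the longest for the same workload.
slowest = min(RATES, key=RATES.get)
print("Slowest feature:", slowest)
```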
If you want to get the qb system running end to end without training the full system, follow these steps.
- Generate the Makefile as above
- Download some data
make data/deep/glove.840B.300d.txt.gz