The project was developed as part of the first Fake News Challenge (FNC-1, http://www.fakenewschallenge.org/) by team Athene: Andreas Hanselowski, Avinesh PVS, Benjamin Schiller, and Felix Caspelherr
Prof. Dr. Iryna Gurevych, Ubiquitous Knowledge Processing (UKP) Lab, TU-Darmstadt, Germany
-
Software dependencies
python >= 3.4.0 (tested with 3.4.0)
-
Install the required Python packages.
python3.4 -m pip install -r requirements.txt --upgrade
-
In order to reproduce the results of our best submission to the FNC-1, please go to the Athene_FNC-1 Google Drive, download features.zip and model.zip, and unzip them into the respective folders.
unzip features.zip -d athene_system/data/fnc-1/features
unzip model.zip -d athene_system/data/fnc-1/mlp_models
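To verify that the archives were unpacked into the expected locations, a minimal sketch (the individual file names inside the archives are not listed here):

# check_downloads.py -- sanity check for the paths used in the step above
import os

for directory in ["athene_system/data/fnc-1/features",
                  "athene_system/data/fnc-1/mlp_models"]:
    if os.path.isdir(directory) and os.listdir(directory):
        print("OK:", directory, "-", len(os.listdir(directory)), "entries")
    else:
        print("MISSING or empty:", directory)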
-
Some resources of the Natural Language Toolkit (NLTK) might need to be downloaded manually.
python3.4 -c "import nltk; nltk.download('stopwords'); nltk.download('punkt'); nltk.download('wordnet')"
-
Copy the Word2Vec embeddings file GoogleNews-vectors-negative300.bin.gz into the folder athene_system/data/embeddings/google_news/
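To check that the embeddings are readable, a minimal sketch assuming gensim is used to load them (older gensim versions expose load_word2vec_format on gensim.models.Word2Vec instead of KeyedVectors):

# load the binary word2vec embeddings and query one vector (needs several GB of RAM)
from gensim.models import KeyedVectors

path = "athene_system/data/embeddings/google_news/GoogleNews-vectors-negative300.bin.gz"
vectors = KeyedVectors.load_word2vec_format(path, binary=True)
print(vectors["news"].shape)  # expected: (300,)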
-
Download the Paraphrase Database: Lexical XL Paraphrases 1.0 and extract it into the ppdb folder.
gunzip -c ppdb-1.0-xl-lexical.gz > athene_system/data/ppdb/ppdb-1.0-xl-lexical
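To peek at the extracted file, a minimal sketch assuming the standard PPDB 1.0 line layout with fields separated by " ||| ":

# print the first few lexical paraphrase pairs (fields: LHS ||| phrase ||| paraphrase ||| features ||| alignment)
path = "athene_system/data/ppdb/ppdb-1.0-xl-lexical"
with open(path, encoding="utf-8") as handle:
    for i, line in enumerate(handle):
        fields = line.rstrip("\n").split(" ||| ")
        print(fields[1], "->", fields[2])
        if i == 4:
            break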
-
To use the Stanford parser, a CoreNLP server instance has to run in parallel: download Stanford CoreNLP, extract it anywhere, and start the server from the extracted directory with the following commands:
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9020
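Once the server is running, it can be tested with a small request (a sketch using the requests package; the port must match the -port argument above):

# send a sentence to the CoreNLP server on port 9020 and print the tokens
import json
import requests

properties = {"annotators": "tokenize,ssplit", "outputFormat": "json"}
response = requests.post("http://localhost:9020/",
                         params={"properties": json.dumps(properties)},
                         data="The Stanford parser is running.".encode("utf-8"))
print([token["word"] for sentence in response.json()["sentences"]
       for token in sentence["tokens"]])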
-
In order to reproduce the classification results of the best submission on the day of the FNC-1, it is mandatory to use TensorFlow v0.9.0 (ideally the GPU version) and the exact library versions stated in requirements.txt, including Python 3.4.
-
Setup tested on Anaconda3 (TensorFlow 0.9, GPU version):
conda create -n env_python3.4 python=3.4 anaconda
source activate env_python3.4
env_python3.4/bin/python3.4 -m pip install -r requirements.txt --upgrade
env_python3.4/bin/python3.4 -m pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0rc0-cp34-cp34m-linux_x86_64.whl
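To confirm the activated environment matches the pinned versions, a minimal check:

# print the interpreter and tensorflow versions inside the conda environment
import sys
import tensorflow as tf

print("python", sys.version.split()[0])  # expected: 3.4.x
print("tensorflow", tf.__version__)      # expected: 0.9.0 (0.9.0rc0 with the wheel above)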
To run the pre-trained model on the test set:
python pipeline.py -p ftest
For more details
python pipeline.py --help
e.g.: python pipeline.py -p crossv holdout ftrain ftest
* crossv: runs 10-fold cross validation on train / validation set and prints the results
* holdout: trains classifier on train and validation set, tests it on holdout set and prints the results
* ftrain: trains classifier on train/validation/holdout set and saves it to athene_system/data/fnc-1/mlp_models
* ftest: predicts stances of unlabeled test set based on the model (see Installation, step 2)
After ftest has been executed, the labeled stances will be saved to disk:
cat athene_system/data/fnc-1/fnc_results/submission.csv
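To inspect the predicted label distribution, a minimal sketch assuming the standard FNC-1 submission columns (Headline, Body ID, Stance):

# count the predicted stance labels in the submission file
import collections
import csv

with open("athene_system/data/fnc-1/fnc_results/submission.csv", encoding="utf-8") as handle:
    counts = collections.Counter(row["Stance"] for row in csv.DictReader(handle))
print(counts)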
A more detailed description of the system, including the features used, can be found in the document system_description_athene.pdf.