Contains the FakeNews class, which stores the training and test articles. It has Headline, Body ID, Body and Stance elements, and contains all methods for training and using the Naive Bayes classifier. The script also runs the Fake News Challenge: it trains on the training set, predicts the competition set and writes the results to result.csv in the format: Headline,Body ID,Stance.
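For readers unfamiliar with the technique, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the code in the FakeNews class: the function names `train_naive_bayes` and `predict`, the whitespace tokenisation and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Fit counts from (tokens, stance) pairs. Hypothetical helper, not the project's API."""
    class_counts = Counter()            # number of training samples per stance
    word_counts = defaultdict(Counter)  # token frequencies per stance
    vocab = set()
    for tokens, stance in samples:
        class_counts[stance] += 1
        word_counts[stance].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the stance with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_stance, best_logp = None, -math.inf
    for stance, n in class_counts.items():
        logp = math.log(n / total)  # log prior
        denom = sum(word_counts[stance].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are ignored
                # add-one (Laplace) smoothing avoids log(0)
                logp += math.log((word_counts[stance][tok] + 1) / denom)
        if logp > best_logp:
            best_stance, best_logp = stance, logp
    return best_stance


# Toy data, invented for this example only.
toy = [
    ("police confirm report".split(), "agree"),
    ("officials confirm story".split(), "agree"),
    ("report is hoax".split(), "disagree"),
    ("story is fake hoax".split(), "disagree"),
]
model = train_naive_bayes(toy)
prediction = predict(model, "officials confirm".split())
print(prediction)
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once article bodies contain hundreds of tokens.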
Helper Python script that also trains the classifier on the training data, but then performs 10-fold validation to validate our methods and measure how accuracy changes between different optimisations. Writes the results for each of the 10 subsets to CSV files named labeled_results_x.csv and results_x.csv, where x is the number of the subset that was used to train the model. These outputs can then be used by scorer.py to get the various scores of the validation.
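As a rough illustration of how a 10-fold split can be produced, the sketch below partitions item indices into k contiguous subsets and pairs each held-out subset with the remaining training indices. The function name `k_fold_splits` is hypothetical, and whether the helper script shuffles the data first or numbers its subsets this way is an assumption.

```python
def k_fold_splits(n_items, k=10):
    """Yield (fold_number, train_indices, held_out_indices) for each of k folds."""
    indices = list(range(n_items))
    fold_size, remainder = divmod(n_items, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds take one extra item so every item is used once.
        stop = start + fold_size + (1 if fold < remainder else 0)
        held_out = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield fold, train, held_out
        start = stop


splits = list(k_fold_splits(25, k=10))
for fold, train, held_out in splits:
    # A per-fold output name following the pattern described above.
    print(f"results_{fold}.csv: train on {len(train)} items, evaluate on {len(held_out)}")
```

Each item is held out exactly once, so the per-fold scores can be averaged into a single estimate.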
Script provided by Fake News Challenge that takes a test set and your results and returns the accuracy and a score. Usage: python scorer.py competition_test_stances.csv results.csv, where results.csv is generated by running FakeNewsChallenge.py above.
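To make the score easier to interpret, here is a sketch of the weighting as we understand the challenge rules: 0.25 for getting the related/unrelated decision right, plus 0.75 more for an exact match on a related stance. The function name `fnc_score` and the sample labels are made up for this example. scorer.py remains the authoritative implementation, so check it if the numbers disagree.

```python
RELATED = {"agree", "disagree", "discuss"}


def fnc_score(gold, predicted):
    """Weighted score over parallel lists of gold and predicted stance labels."""
    score = 0.0
    for g, p in zip(gold, predicted):
        if g == p:
            score += 0.25          # exact match on any stance
            if g != "unrelated":
                score += 0.50      # bonus for an exact match on a related stance
        if g in RELATED and p in RELATED:
            score += 0.25          # both labels fall in the related group
    return score


gold = ["unrelated", "agree", "discuss", "disagree"]
pred = ["unrelated", "agree", "agree", "unrelated"]
total = fnc_score(gold, pred)
best = fnc_score(gold, gold)
print(f"{total} out of a possible {best}")
```

Because related stances carry more weight, a classifier that labels everything as unrelated can have high accuracy and still score poorly.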
The datasets made the submission file exceed the size limit, so here is a Google Drive share link to a private folder containing all our results and data sets: https://drive.google.com/drive/folders/1UFpabfft9Ly6TlX_xzkltIzZVUVepijd?usp=sharing
All datasets are provided by the Fake News Challenge GitHub repository, available here: https://github.com/FakeNewsChallenge/fnc-1
Training article bodies, provided in the format: Body ID,Body
Training headlines and stances, provided in the format: Headline,Body ID,Stance
Testing article bodies, provided in the format: Body ID,Body
Testing data provided in the format: Headline,Body ID. Note that no stance is provided, so that our classifier can predict it.
Testing data provided in the format: Headline,Body ID,Stance. Used with scorer.py and the results.csv file to calculate the accuracy and score the classifier.
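The bodies and stances files share the Body ID column, so they can be joined into complete records. The sketch below uses Python's csv module on small in-memory samples. The function name `load_articles` and the sample rows are invented for illustration, and the header names are taken from the formats listed above, so the real files may name their columns differently.

```python
import csv
import io

# In-memory stand-ins for a bodies file and a stances file (invented sample rows).
bodies_csv = io.StringIO(
    'Body ID,Body\n0,"Some article text, with a comma."\n1,Another article.\n'
)
stances_csv = io.StringIO(
    "Headline,Body ID,Stance\nExample headline,1,discuss\nOther headline,0,unrelated\n"
)


def load_articles(bodies_file, stances_file):
    """Join each headline row to its article body using the shared Body ID."""
    bodies = {row["Body ID"]: row["Body"] for row in csv.DictReader(bodies_file)}
    articles = []
    for row in csv.DictReader(stances_file):
        articles.append({
            "Headline": row["Headline"],
            "Body ID": row["Body ID"],
            "Body": bodies[row["Body ID"]],
            # Stance is None when the file has no Stance column (unlabeled test data).
            "Stance": row.get("Stance"),
        })
    return articles


articles = load_articles(bodies_csv, stances_csv)
print(articles[0]["Headline"], "->", articles[0]["Stance"])
```

Using csv.DictReader rather than splitting on commas matters here, because article bodies routinely contain commas and quoted line breaks.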