
Our attempt at COMP9417 Machine Learning: Assignment 2.


ML9417_Ass2

Python Files

FakeNewsChallenge.py

Contains the FakeNews class, which stores the training and test articles; each entry has a Headline, Body ID, Body and Stance. The file also contains all methods for training and using the Naive Bayes classifier, as well as the code that runs the Fake News Challenge: it trains on the training set, predicts stances for the competition set and writes the results to result.csv in the format Headline,Body ID,Stance.
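The Naive Bayes step can be illustrated with a minimal multinomial Naive Bayes sketch. This is not the code in FakeNewsChallenge.py: the function names `tokenize`, `train_naive_bayes` and `predict`, the lower-case whitespace tokenisation, passing headline and body as one text string, and add-one smoothing are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lower-case whitespace tokenisation; the project may preprocess differently.
    return text.lower().split()


def train_naive_bayes(examples):
    """Count stances and per-stance word frequencies from (text, stance) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, stance in examples:
        tokens = tokenize(text)
        class_counts[stance] += 1
        word_counts[stance].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the stance with the highest log-posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_stance, best_score = None, -math.inf
    for stance, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(stance)
        total_words = sum(word_counts[stance].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothed log likelihood P(token | stance)
            count = word_counts[stance][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_stance, best_score = stance, score
    return best_stance


if __name__ == "__main__":
    model = train_naive_bayes([
        ("fake hoax claim", "disagree"),
        ("confirmed true report", "agree"),
        ("hoax denied fake", "disagree"),
        ("report confirmed officials", "agree"),
    ])
    print(predict(model, "officials confirmed the report"))  # → agree
```

Working in log space avoids numeric underflow when a body contains hundreds of words.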

Validate.py

Helper script that also trains the classifier on the training data, but then performs 10-fold cross-validation so that we could validate our methods and compare accuracy between different optimisations. For each of the 10 subsets it writes two CSV files, labeled_results_x.csv and results_x.csv, where x is the number of the subset that was used to train the model. These outputs can then be passed to scorer.py to obtain the scores for the validation.
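A rough sketch of how a dataset could be split into 10 subsets for cross-validation. The helper `k_fold_splits`, the fixed shuffle seed, the round-robin assignment and numbering the folds from 1 are assumptions, not taken from Validate.py; only the output file names mirror the ones described above.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (fold_number, train_items, test_items) for k-fold cross-validation.

    Assumptions: items are shuffled with a fixed seed, dealt round-robin into
    k subsets, and folds are numbered from 1.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Every other subset becomes the training data for this fold.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield i + 1, train, test


if __name__ == "__main__":
    for n, train, test in k_fold_splits(range(25)):
        # One pair of output files per fold (file names as described above).
        print(f"labeled_results_{n}.csv, results_{n}.csv: "
              f"{len(train)} training rows, {len(test)} held-out rows")
```

Since several FNC-1 headlines share the same Body ID, splitting by Body ID rather than by row may be worth considering, so that one body never appears on both sides of a fold.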

Scorer.py

Script provided by the Fake News Challenge that takes a test set and your results and reports the accuracy and a score. Usage:

python scorer.py competition_test_stances.csv results.csv

where results.csv is generated by running FakeNewsChallenge.py above.
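For reference, the sketch below shows how we understand the FNC-1 score to be weighted: 0.25 for correctly separating related from unrelated pairs, plus a further 0.75 for the exact stance on a related pair. The function names are hypothetical and the relative score (score divided by that of a perfect submission) is our reading of what the scorer reports; treat the provided scorer.py as authoritative.

```python
RELATED = {"agree", "disagree", "discuss"}


def fnc_score(gold, predicted):
    """FNC-1 weighted score, as we understand the official scorer."""
    score = 0.0
    for g, p in zip(gold, predicted):
        g_related, p_related = g in RELATED, p in RELATED
        if g_related == p_related:
            score += 0.25  # related vs. unrelated decided correctly
        if g_related and g == p:
            score += 0.75  # exact stance correct on a related pair
    return score


def accuracy(gold, predicted):
    """Fraction of rows whose predicted stance matches the gold stance."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def relative_score(gold, predicted):
    """Score as a fraction of what a perfect submission would get."""
    return fnc_score(gold, predicted) / fnc_score(gold, gold)


if __name__ == "__main__":
    gold = ["agree", "unrelated", "discuss"]
    pred = ["agree", "discuss", "disagree"]
    print(fnc_score(gold, pred), accuracy(gold, pred), relative_score(gold, pred))
```

The weighting rewards the harder related-stance decisions more heavily, so a classifier that only detects unrelated pairs scores poorly even when its accuracy is high.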

Datasets

The datasets made the submission exceed the size limit, so they are available, along with all our results, in this private Google Drive folder: https://drive.google.com/drive/folders/1UFpabfft9Ly6TlX_xzkltIzZVUVepijd?usp=sharing

All datasets were provided by the Fake News Challenge and are available from its GitHub repository: https://github.com/FakeNewsChallenge/fnc-1

train_bodies.csv

Training data provided in the format: Body ID,Body

train_stances.csv

Training data provided in the format: Headline,Body ID,Stance
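Stances refer to bodies only through Body ID, so the two training files have to be joined before training. A minimal sketch using Python's standard csv module; `load_bodies` and `load_stances` are hypothetical names, and reading the body file by column position rather than by header name is an assumption made so the code does not depend on the exact header text.

```python
import csv
import io


def load_bodies(f):
    """Map Body ID -> body text from a CSV with columns Body ID,Body."""
    reader = csv.reader(f)
    next(reader)  # skip the header row
    # Read by position so the exact header text of the body column does not matter.
    return {int(row[0]): row[1] for row in reader}


def load_stances(f, bodies):
    """Join each Headline,Body ID[,Stance] row with its body text via Body ID."""
    rows = []
    for row in csv.DictReader(f):
        body_id = int(row["Body ID"])
        rows.append({
            "Headline": row["Headline"],
            "Body ID": body_id,
            "Body": bodies[body_id],
            "Stance": row.get("Stance"),  # None for the unlabeled test file
        })
    return rows


if __name__ == "__main__":
    # With the real files, open them using open(path, newline="", encoding="utf-8").
    bodies = load_bodies(io.StringIO('Body ID,Body\n0,"A body, with a comma"\n'))
    print(load_stances(io.StringIO("Headline,Body ID,Stance\nA headline,0,discuss\n"), bodies))
```

The same two functions cover the competition files, since the unlabeled stances file simply lacks the Stance column.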

competition_test_bodies.csv

Testing data provided in the format: Body ID,Body

competition_test_stances_unlabeled.csv

Testing data provided in the format: Headline,Body ID. No stance is provided, so that our classifier can predict it.
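Once a stance has been predicted for each unlabeled row, the output needs the three columns Headline,Body ID,Stance. A minimal sketch; `write_results` is a hypothetical helper, and the rows are assumed to be dicts with Headline and Body ID keys.

```python
import csv
import io


def write_results(rows, stances, f):
    """Write a Headline,Body ID,Stance CSV with one predicted stance per input row."""
    if len(rows) != len(stances):
        raise ValueError("expected exactly one predicted stance per row")
    writer = csv.writer(f)
    writer.writerow(["Headline", "Body ID", "Stance"])
    for row, stance in zip(rows, stances):
        writer.writerow([row["Headline"], row["Body ID"], stance])


if __name__ == "__main__":
    buf = io.StringIO()
    write_results([{"Headline": "Some headline", "Body ID": 712}], ["unrelated"], buf)
    print(buf.getvalue())
```

When writing to a real file, open it with newline="" so the csv module controls the line endings; headlines containing commas are quoted automatically.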

competition_test_stances.csv

Testing data provided in the format: Headline,Body ID,Stance. Used with scorer.py and the results.csv file to calculate the accuracy and score of the classifier.