Primary language: HTML. License: GNU General Public License v3.0 (GPL-3.0).

treehacks-fake-news-detection

Project Description

Use a machine learning model to help detect fake news by classifying how a news article's body text relates to a headline, using 4 categories: {'agree', 'disagree', 'unrelated', 'discuss'}

Formal Definitions

Input

A headline and a body text - either from the same news article or from two different articles.

Output

Classify the stance of the body text relative to the claim made in the headline into one of four categories:

  • Agrees: The body text agrees with the headline.
  • Disagrees: The body text disagrees with the headline.
  • Discusses: The body text discusses the same topic as the headline but does not take a position.
  • Unrelated: The body text discusses a different topic than the headline.
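The input and output defined above can be sketched as plain Python types. This is only an illustration, not the project's code: the names `Stance`, `HeadlineBodyPair`, and `parse_stance` are hypothetical, and the label strings are taken from the category set in the project description.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    """The four stance labels named in the project description."""
    AGREE = "agree"
    DISAGREE = "disagree"
    DISCUSS = "discuss"
    UNRELATED = "unrelated"


@dataclass(frozen=True)
class HeadlineBodyPair:
    """One classifier input (hypothetical name): a headline plus a body text.

    The two strings may come from the same article or from different ones.
    """
    headline: str
    body: str


def parse_stance(label: str) -> Stance:
    """Map a raw label string to a Stance, ignoring case and outer whitespace.

    Raises ValueError for any string that is not one of the four labels.
    """
    return Stance(label.strip().lower())
```

A model would then be any function that takes a `HeadlineBodyPair` and returns a `Stance`; keeping the labels in an enum makes an unexpected label fail loudly instead of being silently treated as a fifth class.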

Data

Data Source

Data Distribution

The distribution of Stance classes in train_stances.csv is as follows:

rows     unrelated   discuss   agree       disagree
49972    0.73131     0.17828   0.0736012   0.0168094
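A distribution like the one above could be computed with a short script along the following lines. This is a sketch under assumptions: the column name `Stance` is inferred from the sentence above and is not confirmed by this README, and the function names are hypothetical. The last part converts the reported fractions back into approximate row counts.

```python
import csv
from collections import Counter


def stance_distribution(labels):
    """Return (row count, {label: fraction of rows}) for an iterable of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0, {}
    return total, {label: n / total for label, n in counts.items()}


def stance_distribution_from_csv(path, column="Stance"):
    """Compute the distribution of one CSV column.

    The default column name "Stance" is an assumption about train_stances.csv.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return stance_distribution(row[column] for row in csv.DictReader(f))


# Fractions reported in the table above, converted to approximate row counts.
REPORTED_ROWS = 49972
REPORTED_FRACTIONS = {
    "unrelated": 0.73131,
    "discuss": 0.17828,
    "agree": 0.0736012,
    "disagree": 0.0168094,
}
approx_counts = {
    label: round(REPORTED_ROWS * fraction)
    for label, fraction in REPORTED_FRACTIONS.items()
}
```

Calling `stance_distribution_from_csv("train_stances.csv")` would return the row count and the per-class fractions, provided the file has a header row containing the assumed column.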

Microsoft Azure Machine Learning Studio Experiment

Model Prediction Accuracy

Deep Neural Network: 86%
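To put an accuracy figure in context, it helps to compare it with a majority-class baseline, that is, a classifier that always predicts the most common class. The sketch below uses the class fractions reported for train_stances.csv. It assumes, without confirmation from this README, that the evaluation data has a similar class distribution; the function names are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on empty input")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def majority_baseline_accuracy(fractions):
    """Accuracy of always predicting the most frequent class."""
    return max(fractions.values())


# Class fractions reported for train_stances.csv.
TRAIN_FRACTIONS = {
    "unrelated": 0.73131,
    "discuss": 0.17828,
    "agree": 0.0736012,
    "disagree": 0.0168094,
}

# Always answering "unrelated" would be right on about 73% of such rows.
baseline = majority_baseline_accuracy(TRAIN_FRACTIONS)
```

Because the classes are imbalanced, overall accuracy says little about the rare 'agree' and 'disagree' classes; per-class metrics would be needed to judge those.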

TODO

  • find a suitable ML library for stance classification - Azure Machine Learning Library
  • determine input features - (headline, body)
  • find data to train the model - found
  • build Azure experiment - done
  • analyze model prediction accuracy - done
  • design backend API
  • design chatbot interface (possible integration with Slack?)
  • test

Tech needed

  • MS Azure ML library
    • knowledge-based data analysis
    • speech recognition
    • internationalization (optional)
  • MS Bot Framework
  • Stdlib backend API