Fake News Classification Tool

This project aims to build a fake news classifier that can accurately distinguish fake news from genuine news. We also developed a simple UI that enables users to verify news articles efficiently.

Figure 1: Overview of the approach

Abstract:

The advent of technology and the development of social media platforms have made it easier to share news updates with the masses.
The effects of fake news can be catastrophic.
Fact-checking can prevent people from reacting to and acting on fake news.
Such a tool is extremely useful for news houses, which can fact-check their stories before sharing them with the masses.

Procedure:

Dataset: 20,800 samples, with 10,387 real and 10,413 fake news articles. The dataset is publicly available.
Preprocessing:
1. Removing irrelevant text and NaN values
2. Removing stopwords
3. Removing numerics and special characters
4. Lemmatization
5. Case folding
6. Tokenization
7. Padding
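The preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: the stopword set and the lemma lookup table are hypothetical stand-ins (a real pipeline would typically use a full stopword list and a dictionary-based lemmatiser such as NLTK's WordNet lemmatiser), tokenisation is plain whitespace splitting, and case folding is applied first so that stopword matching is case-insensitive.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a full list.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "and", "in", "on"}

# Hypothetical lemma lookup standing in for a real lemmatiser; unknown words pass through.
LEMMAS = {"claims": "claim", "were": "be", "reported": "report", "elections": "election"}


def clean_text(text):
    """Drop non-text values, case-fold, strip digits/symbols, remove stopwords, lemmatise."""
    if not isinstance(text, str):  # None, NaN and other non-text values are dropped
        return []
    text = text.lower()                         # case folding
    text = re.sub(r"[^a-z\s]", " ", text)       # remove numerics and special characters
    tokens = text.split()                       # whitespace tokenisation
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in tokens]   # lookup-based lemmatisation


def build_vocab(docs):
    """Map each token to an integer id; id 0 is reserved for padding."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def encode_and_pad(doc, vocab, max_len):
    """Convert tokens to ids, then truncate or right-pad with 0 up to max_len."""
    ids = [vocab[t] for t in doc if t in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))
```

Padding every document to the same length with a reserved id is what lets sequence models such as the LSTM process documents in fixed-size batches.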
Vectorisation:
1. One-hot encoding
2. Count Vectoriser
3. Hashing Vectoriser
4. TF-IDF
5. GloVe Embedding
6. Word2Vec Embedding
7. BERT
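The count-based representations in this list can be illustrated without any library. The sketch below assumes documents are already token lists and the vocabulary is given. Its TF-IDF weighting follows the defaults of scikit-learn's TfidfVectorizer (raw term counts, smoothed idf, L2-normalised rows). The hashing function is a simplified version of the hashing trick that uses zlib.crc32 for reproducibility, whereas scikit-learn's HashingVectorizer uses MurmurHash3 with signed, normalised outputs. GloVe, Word2Vec and BERT need pretrained models and are not shown.

```python
import math
import zlib
from collections import Counter


def count_vectors(docs, vocab):
    """Bag-of-words counts: one column per vocabulary term."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return rows


def one_hot_vectors(docs, vocab):
    """Binary presence (1) or absence (0) of each vocabulary term."""
    return [[1 if c > 0 else 0 for c in row] for row in count_vectors(docs, vocab)]


def tfidf_vectors(docs, vocab):
    """TF-IDF with idf = ln((1 + n) / (1 + df)) + 1 and L2-normalised rows."""
    n = len(docs)
    counts = count_vectors(docs, vocab)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    out = []
    for row in counts:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0  # avoid dividing by zero
        out.append([x / norm for x in weighted])
    return out


def hashed_counts(doc, n_features=8):
    """Hashing trick: bucket tokens by a hash instead of storing a vocabulary."""
    vec = [0] * n_features
    for tok in doc:
        vec[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
    return vec
```

Terms that occur in every document (here, a word like "news" across the whole corpus) get the minimum idf of 1, so TF-IDF down-weights them relative to rarer, more discriminative words.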
Machine Learning / Deep Learning Algorithms for Fake news classification:
1. Naive Bayes (MultinomialNB)
2. Decision Tree
3. AdaBoost Classifier
4. Logistic Regression
5. Passive Aggressive Classifier
6. Multilayer Perceptron
7. LSTM
8. BERT
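To make the first algorithm in the list concrete, here is a from-scratch Multinomial Naive Bayes classifier with add-alpha (Laplace) smoothing over token lists. It is a teaching sketch only: the "(MultinomialNB)" label suggests the project used a library implementation such as scikit-learn's MultinomialNB, but that is an assumption, and the class name TinyMultinomialNB is hypothetical.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Multinomial Naive Bayes on token lists with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        n = len(labels)
        # log P(c): fraction of training documents in each class
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.token_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.totals = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, tok, c):
        # log P(tok | c) = log((count(tok, c) + alpha) / (tokens in c + alpha * |V|))
        num = self.token_counts[c][tok] + self.alpha
        den = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            # Tokens never seen during training are ignored (fixed vocabulary).
            scores[c] = self.log_prior[c] + sum(
                self._log_likelihood(t, c) for t in doc if t in self.vocab
            )
        return max(scores, key=scores.get)
```

Working in log space turns the product of many small probabilities into a sum, which avoids floating-point underflow on long articles.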

Figure 2: Pipeline of the approach used

Pipeline & Output:

Output 1: BERT Tokeniser with BERT model

Figure 3: Pipeline for BERT Tokeniser with BERT model

Figure 4: Performance of BERT Tokeniser with BERT model
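The README does not say which BERT tokeniser library was used; pretrained BERT tokenisers (for example Hugging Face's BertTokenizer) split each word with the greedy longest-match-first WordPiece algorithm. The sketch below illustrates only that step, plus the [CLS]/[SEP] markers, padding and attention mask that a BERT model expects. The tiny vocabulary is hypothetical, and unlike a real BERT tokeniser this version splits on whitespace only (no punctuation splitting or accent stripping).

```python
# Hypothetical toy vocabulary; real BERT vocabularies contain roughly 30,000 entries.
VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "fake", "news", "un", "##believ", "##able", "spread", "##s"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def wordpiece(word, vocab=TOKEN_TO_ID, max_chars=100):
    """Greedily split one lowercase word into the longest matching vocabulary pieces."""
    if len(word) > max_chars:
        return ["[UNK]"]
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # word-internal pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # the whole word becomes [UNK] if any part cannot be matched
        pieces.append(match)
        start = end
    return pieces


def encode(text, max_len=12):
    """Lowercase, split on whitespace, apply WordPiece, add [CLS]/[SEP], pad, build mask."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word))
    tokens = tokens[: max_len - 1] + ["[SEP]"]  # truncate but always keep [SEP]
    ids = [TOKEN_TO_ID[t] for t in tokens]
    mask = [1] * len(ids) + [0] * (max_len - len(ids))  # 1 = real token, 0 = padding
    ids += [TOKEN_TO_ID["[PAD]"]] * (max_len - len(ids))
    return tokens, ids, mask
```

For instance, with this vocabulary "unbelievable" splits into "un", "##believ" and "##able", so a word absent from the vocabulary can still be represented by known subword pieces.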

Output 2: TF-IDF Vectoriser with Passive Aggressive Classifier

Figure 5: Pipeline for TF-IDF Vectoriser + Passive Aggressive Classifier

Figure 6: Performance of TF-IDF Vectoriser + Passive Aggressive Classifier
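The Passive Aggressive classifier in this pipeline is an online linear learner: it leaves the weights unchanged when a sample is classified with enough margin, and otherwise updates them just enough to fix the error. The sketch below implements the PA-I update (hinge loss, step size capped by C), the variant that scikit-learn's PassiveAggressiveClassifier uses with its default loss="hinge". Unlike that class it has no intercept term. The small input vectors stand in for TF-IDF rows, the +1 (real) / -1 (fake) label convention is an assumption, and the function names are hypothetical.

```python
def train_pa(X, y, C=1.0, epochs=5):
    """PA-I training on dense vectors X with labels y in {-1, +1}; returns weights."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            loss = max(0.0, 1.0 - margin)  # hinge loss
            sq_norm = sum(xi * xi for xi in x)
            if loss > 0 and sq_norm > 0:
                tau = min(C, loss / sq_norm)  # aggressive step, capped by C
                w = [wi + tau * label * xi for wi, xi in zip(w, x)]
    return w


def predict_pa(w, x):
    """Sign of the linear score: +1 or -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

Because each update only needs the current sample, this kind of classifier can keep learning from newly labelled articles without retraining from scratch, which suits a fast-moving domain like news.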

WebApp:

https://fake-news-detection-nlp.herokuapp.com/

An end-to-end deployed tool that lets users verify news articles in a single click
Efficient and accurate tool for fact-checking
A simple, minimalistic user interface
Users can input the news text along with its title and author name (both optional fields)
A ‘Load sample input’ button lets users understand and test the app
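The input handling described above (required news text, optional title and author) can be sketched as a framework-independent handler. Everything here is hypothetical: the field names "text", "title" and "author", the choice to prepend the optional fields to the text, and the `predict` callable are illustrative assumptions rather than the deployed app's code. In a Flask view, `request.form` could be passed as `form`.

```python
import json


def build_model_input(form):
    """Combine the optional title/author with the required news text."""
    text = (form.get("text") or "").strip()
    if not text:
        raise ValueError("news text is required")
    title = (form.get("title") or "").strip()
    author = (form.get("author") or "").strip()
    # Optional fields are prepended only when the user filled them in.
    return " ".join(part for part in (title, author, text) if part)


def handle_verify(form, predict):
    """Return (status code, JSON body); `predict` maps a string to a label."""
    try:
        model_input = build_model_input(form)
    except ValueError as err:
        return 400, json.dumps({"error": str(err)})
    return 200, json.dumps({"label": predict(model_input)})
```

Keeping validation separate from the web framework makes the handler easy to test with plain dictionaries and a stub model.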

Technologies Used:

Flask, HTML, CSS, JS for building the webapp
Heroku for deploying the webapp
Python, along with machine learning and deep learning frameworks

Team Members:

This project was completed as a course project for CSE556: Natural Language Processing.

Report & Slides

Presentation Slides

Project Report

ria18405/fake-news-classifier