Fake-News-Classification

A Comparative Analysis of Different Approaches for Fake News Classification



Fake News Classification

We have all seen fake news forwarded in our WhatsApp messages. These articles are generally generated by bots and internet trolls with the intent to intrigue and mislead the audience. Fake news can be very dangerous, as it spreads misinformation and can incite public outrage. It is becoming a serious problem in India as more and more people use social media while levels of digital awareness remain low.

Table of contents

Demo

(Back to top)

Here's a screen recording of the model in action. I copied an article from an authentic, reputable news source, pasted it into the text block, and ran inference. As you can see, the model correctly predicted that the article is Real. The code for this UI can be found here

Demo GIF

Aim

(Back to top)

The aim of this project is to build a Fake News Classifier using various techniques, such as Recurrent Neural Networks and Random Forest Classification, and to figure out which performs best for this use case.

Installation

(Back to top)

To use this project, first clone the repo on your device using the command below:

git clone https://github.com/SauravMaheshkar/Fake-News-Classification.git

Stack

(Back to top)

The following libraries and modules were used in this software:

Development

(Back to top)

Method 1: Random Forest Classification

Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set. Random forests generally outperform decision trees, but their accuracy is lower than gradient boosted trees. However, data characteristics can affect their performance.
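As a minimal sketch of this method, the snippet below fits a random forest on TF-IDF features using scikit-learn. The headlines and labels are invented for illustration and are not drawn from the project's actual dataset; hyperparameters are arbitrary placeholders, not the ones used in the notebooks.

```python
# Hedged sketch: TF-IDF features + random forest on made-up toy data.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Invented example headlines; 1 = real, 0 = fake.
texts = [
    "Scientists publish peer-reviewed study on climate data",
    "Government confirms new infrastructure budget in official release",
    "Shocking! Celebrity cures disease with one weird trick",
    "You won't believe what this miracle diet does overnight",
]
labels = [1, 1, 0, 0]

# Pipeline: vectorise text, then train an ensemble of 100 trees.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
model.fit(texts, labels)

print(model.predict(["Official report released by the science ministry"]))
```

Because the forest averages over many decorrelated trees, it is far less prone to the overfitting described above than a single decision tree trained on the same TF-IDF features.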

Method 2: Bidirectional Recurrent Neural Networks

Bidirectional Recurrent Neural Networks (BRNNs) connect two hidden layers of opposite directions to the same output. With this form of generative deep learning, the output layer can get information from past (backward) and future (forward) states simultaneously. Invented in 1997 by Schuster and Paliwal, BRNNs were introduced to increase the amount of input information available to the network. Standard recurrent neural networks (RNNs) have the restriction that future input information cannot be reached from the current state. BRNNs, on the contrary, do not require their input data to be fixed, and their future input information is reachable from the current state.
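The core idea can be sketched as a forward pass in plain NumPy: run one tanh RNN left-to-right, a second one right-to-left, and concatenate their hidden states at each time step. The weights below are random and the dimensions arbitrary; this illustrates the wiring only, not the trained model from this project.

```python
# Illustrative NumPy sketch of a bidirectional RNN forward pass.
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 8, 16          # sequence length, input dim, hidden dim
x = rng.normal(size=(T, d_in))   # one input sequence

def rnn_pass(seq, Wx, Wh):
    """Run a simple tanh RNN over `seq`, returning all hidden states."""
    h = np.zeros(d_h)
    states = []
    for x_t in seq:
        h = np.tanh(Wx @ x_t + Wh @ h)
        states.append(h)
    return np.stack(states)

# Separate random weights for each direction.
Wx_f, Wh_f = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))
Wx_b, Wh_b = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))

h_forward = rnn_pass(x, Wx_f, Wh_f)               # past -> future
h_backward = rnn_pass(x[::-1], Wx_b, Wh_b)[::-1]  # future -> past

# Each time step now sees both past and future context.
h_bi = np.concatenate([h_forward, h_backward], axis=1)  # shape (T, 2 * d_h)
print(h_bi.shape)
```

In a deep-learning framework this whole construction is usually a single wrapper layer (e.g. a bidirectional LSTM), but the concatenation of the two directional passes is the same.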

Method 3: Decision Tree Classification

A decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy). A leaf node (e.g., Play) represents a classification or decision. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
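The Outlook/Play example above can be reproduced with scikit-learn in a few lines. The weather rows below are invented for illustration; the categorical Outlook feature is ordinally encoded since the classifier expects numeric input.

```python
# Toy sketch of the Outlook -> Play example, using scikit-learn.
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Invented weather observations: Outlook is the decision node,
# Play is the leaf (the class we predict).
outlook = [["Sunny"], ["Sunny"], ["Overcast"], ["Rainy"], ["Overcast"], ["Rainy"]]
play = ["No", "No", "Yes", "Yes", "Yes", "Yes"]

# Encode the categorical feature as numbers, then fit the tree.
enc = OrdinalEncoder()
X = enc.fit_transform(outlook)
tree = DecisionTreeClassifier(random_state=0).fit(X, play)

print(tree.predict(enc.transform([["Overcast"]])))  # -> ['Yes']
```

On such a cleanly separable toy set a single tree fits perfectly, which is exactly the overfitting tendency that the random forest method above is designed to correct.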

Method 4: Support Vector Machines

In machine learning, support-vector machines are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratories by Vapnik and colleagues, the SVM is one of the most robust prediction methods, based on the statistical learning framework (VC theory) proposed by Vapnik and Chervonenkis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier.
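A minimal sketch of this method for text, assuming scikit-learn: a linear SVM over TF-IDF features. The headlines and labels are made up for illustration, and `LinearSVC` stands in for whichever SVM variant the project's notebooks actually use.

```python
# Hedged sketch: linear SVM on TF-IDF features of made-up headlines.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented example headlines; 1 = real, 0 = fake.
texts = [
    "Scientists publish peer-reviewed study on climate data",
    "Government confirms new infrastructure budget in official release",
    "Shocking! Celebrity cures disease with one weird trick",
    "You won't believe what this miracle diet does overnight",
]
labels = [1, 1, 0, 0]

# Fit a maximum-margin linear separator in TF-IDF space.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)

print(clf.predict(["Miracle trick shocks doctors everywhere"]))
```

Because TF-IDF embeds documents in a very high-dimensional space, the two toy classes are linearly separable and the hinge-loss objective finds a separating hyperplane with maximum margin.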

Request

(Back to top)

Bug

(Back to top)

If you spot a bug in the program, kindly raise an issue. Instructions for raising an issue can be found here

Contribute

(Back to top)

If you want to contribute to the project, kindly mail me at sauravvmaheshkar@gmail.com.

Step 1

  • Option 1 🍴 Fork it!
  • Option 2 👯‍♂️ Clone this repo to your local machine using https://github.com/SauravMaheshkar/Fake-News-Classification.git

Step 2

  • HACK AWAY! 🔨🔨🔨

Step 3

  • 🔃 Create a new pull request using https://github.com/SauravMaheshkar/Fake-News-Classification/compare/

License

(Back to top)


The data for this project was taken from Kaggle Datasets. The owner of the dataset is Clément Bisaillon. You can find the dataset here.

Credits

The inspiration for this readme file came from