Identify and label Tweets as Fake News

Fake News Detection

Sample Activity for the AI Course at Allegheny College

About

Media outlets and social media platforms are rife with "fake news," or information that has not been fact-checked, especially as they become more opinionated and stray from centrist, fact-based reporting. This is a growing problem, because the public receives most of its information this way and depends on these outlets to stay informed. According to the BBC, false information can take many forms (satire, clickbait, propaganda, and mistakes), and it can be classified as disinformation or misinformation. It is very difficult for the public to place a media outlet or social media post in any of these categories without reading competing claims or doing their own research. Therefore, the purpose of this project is to show how a potential Fake News Detection tool can be built and used by various platforms to warn users about content before they read it.

This ethical tool will search through Tweets and attempt to label them as "right", "left", or "centrist" and also add a level of fake news detection to them. While Twitter is currently working on this feature, it is not fully deployed at the moment. The goal of this feature is to minimize the amount of fake news that the public receives from social media outlets.

This detection is done by searching the selected user's tweets for various words which indicate fake news, such as "most", "least", etc.
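The keyword search described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and the word list contains only the two examples named above ("most" and "least"), since the full list is not given here.

```python
import re

# Assumption: only "most" and "least" are named in the README;
# the project's full indicator list is not shown there.
INDICATOR_WORDS = {"most", "least"}


def find_indicator_words(tweet, indicators=INDICATOR_WORDS):
    """Return the indicator words found in a tweet, in order of first appearance."""
    # Lowercase and split into whole words so "almost" does not match "most".
    tokens = re.findall(r"[a-z']+", tweet.lower())
    found = []
    for token in tokens:
        if token in indicators and token not in found:
            found.append(token)
    return found


def flag_tweets(tweets, indicators=INDICATOR_WORDS):
    """Pair each tweet with the indicator words it contains (hypothetical helper)."""
    return [(tweet, find_indicator_words(tweet, indicators)) for tweet in tweets]
```

Matching on whole tokens rather than substrings is a design choice of this sketch; a tweet with an empty result list contains none of the indicator words.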

The project is funded by the Mozilla Foundation and will be used in the Data Analytics course at Allegheny College. Please visit Allegheny Ethical CS for more information.

Features

  • Twitter API to search for a user and their screen name

  • Tweet classification (binary)

    • Naive Bayes
    • Linear SVM
    • Credit to Zach Leonardo on Polarized
  • Tweet classification

    • fake
    • true
    • Credit to @FavioVazquez on fake-news
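To give a feel for how a Naive Bayes classifier can assign the "fake" and "true" labels listed above, here is a small from-scratch sketch using only the standard library. It is not the implementation from the credited projects: the class name, the whitespace tokenizer, the add-one smoothing, and the toy training sentences are all assumptions made for illustration. Linear SVM is not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of training texts
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            # Start from the log prior of the label.
            score = math.log(self.label_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # skip words never seen during training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training examples; not taken from the project's data.
train_texts = [
    "shocking secret they hide",
    "you won't believe this shocking claim",
    "senate passes budget bill",
    "committee hearing scheduled today",
]
train_labels = ["fake", "fake", "true", "true"]
model = NaiveBayes().fit(train_texts, train_labels)
```

With this toy model, `model.predict("shocking secret revealed")` returns `"fake"` and `model.predict("budget hearing today")` returns `"true"`. A real classifier would need far more training data and a better tokenizer.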

Installation

  • Clone the source code onto your machine

    With HTTPS:

    git clone https://github.com/Allegheny-Mozilla-Fellows/FakeNewsDetection.git

    or with SSH:

    git clone git@github.com:Allegheny-Mozilla-Fellows/FakeNewsDetection.git

Run

After cloning the repo, enter the src/ directory with the command cd src/ and install the following recommended packages via pip: tweepy, textblob, nltk, and the textblob corpora.

pip install tweepy

and then,

pip install textblob

and then,

pip install nltk

and then,

python3 -m textblob.download_corpora

Please note that you may have to install additional packages using pip to run this program (for example, twitter).

After installing these packages, you will run the program with the command python __main__.py

After running this command, you will be prompted to enter the name of a given senator, which the API will cross-reference with current Twitter users. You will then confirm the name of the senator and choose your preferred diagram for output.

Future work

Currently, this project examines tweets and users stored in CSV files, and it uses the API only to cross-reference the user's screen name with what is in the CSV file. Users of this program can experiment within the limitations of these files. The project can be extended by examining tweets in real time (i.e., going beyond the files and using more of what the Twitter API has to offer), adding more classification algorithms for comparison (i.e., more than just true or false), or adding features to visualize how many tweets contain false information. Another great addition to the project would be other methods of detecting fake news, such as coding different algorithms, developing a bot, or using AI.

Reading Material

Here is the list of articles that may give the user more insights into fake news detection.

Ethical Discussions

  • What happens if one news outlet or platform produces more fake news than another? Will that alter the way we perceive news and/or classify facts?

  • Why might algorithms be particularly harmful for detecting fake news?

  • Should we enforce using fake news detecting algorithms? Do media outlets and social media platforms have an obligation to detect fake news?

  • What are some of the ways we can prevent biases in fake news detection algorithms as developers and as users?

Data used

The files used in this project are retrieved from Zach Leonardo's Polarized project and @FavioVazquez's Fake-News project and are stored in the data directory. These files store over 95,000 tweets (in ExtractedTweets.csv and ExtractedTweets2.csv each) and approximately 100 senators (in senators.csv).

The tweets include a senator's party, their Twitter handle, and the content of the tweet. The senators file contains a senator's Twitter username and their party.
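As a rough sketch of how files with this layout could be read and cross-referenced, the example below parses two small in-memory CSV samples with the standard csv module. The column names (handle, party, tweet), the sample rows, and the helper function names are assumptions made for illustration; the real files may use different headers.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the column names are assumptions, not the real headers.
SENATORS_CSV = """handle,party
SenExampleA,Democrat
SenExampleB,Republican
"""

TWEETS_CSV = """party,handle,tweet
Democrat,SenExampleA,Town hall tonight at 7pm
Republican,SenExampleB,Voted on the budget bill today
Democrat,SenExampleA,Thanks to everyone who came out
"""


def load_rows(csv_file):
    """Read every row of an open CSV file into a list of dictionaries."""
    return list(csv.DictReader(csv_file))


def find_party(senators, handle):
    """Look up a senator's party by Twitter handle, ignoring case and a leading '@'."""
    wanted = handle.lstrip("@").lower()
    for row in senators:
        if row["handle"].lower() == wanted:
            return row["party"]
    return None


def tweets_per_party(tweets):
    """Count how many tweets each party has in the data set."""
    return Counter(row["party"] for row in tweets)


senators = load_rows(io.StringIO(SENATORS_CSV))
tweets = load_rows(io.StringIO(TWEETS_CSV))
```

To try this on the real data, the `io.StringIO(...)` arguments could be replaced with open file handles for the CSV files, after adjusting the column names to match their actual headers.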

Contact

If you have any questions or concerns about this project please contact: