An algorithm to determine if an article is fake or real

Fake And Real News Classification


This project is a part of the Data Science (DSF-FT) Course at Moringa School. The full project description can be found here.

Table of Contents:

  • Overview
  • Business Problem
  • Data Understanding - The dataset was sourced from Kaggle
  • Data Cleaning: Validity, Completeness, Consistency, Uniformity
  • Exploratory Data Analysis
  • Modeling:
    • Preprocessing techniques in NLP
    • Building models
    • Model validation
  • Deployment

Project Description:

With current technology, almost everyone has access to the internet, and there are few restrictions on what one can post. As a result, people can obtain news online and believe it is legitimate when that might not be the case. Consuming unverified information from the internet can affect a reader in one way or another. To address this, the project uses text classification with NLP to determine whether a posted article is real or fake.

Technologies Used:

  • Pandas
  • Seaborn
  • Scikit-Learn
  • NLTK
  • Streamlit

Project Features:

From the dataset, the project uses the text column as the independent variable and the category column as the dependent variable.
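As a rough illustration of this setup, the sketch below trains a text classifier on a `text` column and predicts a `category` label. The sample rows are made up for the example, and the choice of TF-IDF features with logistic regression is an assumption; the notebook may use different preprocessing and models.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up sample standing in for the Kaggle dataset (illustrative only).
df = pd.DataFrame({
    "text": [
        "Government confirms new budget after parliamentary vote",
        "Central bank holds interest rates steady this quarter",
        "Shocking miracle cure doctors do not want you to know",
        "Aliens secretly replaced the president, insider claims",
    ],
    "category": ["real", "real", "fake", "fake"],
})

X = df["text"]      # independent variable: raw article text
y = df["category"]  # dependent variable: real/fake label

# Assumed model: TF-IDF features feeding a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(stop_words="english"), LogisticRegression())
model.fit(X, y)

# Predict a label for each article (here, the training rows themselves).
predictions = model.predict(X)
print(list(predictions))
```

Wrapping the vectorizer and classifier in one pipeline means the fitted object accepts raw text directly, which makes it convenient to save and reuse later.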

Project Future Use:

Using this project, one will be able to tell whether or not an article is legitimate, which will improve how people perceive things and situations.

Deployment:

The project was deployed using Streamlit. The link to the deployed project can be found here.
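A minimal sketch of what such a Streamlit app could look like is shown below. It is not the contents of `demo/ml-app.py`: the `classify` helper is hypothetical, and the code assumes that `models/model.pkl` holds a pickled scikit-learn pipeline that accepts raw text and that the app is started from the repository root.

```python
import pickle
from pathlib import Path

# Path taken from the repository tree; relative to the repository root (assumption).
MODEL_PATH = Path("models/model.pkl")


def classify(model, text):
    """Return the predicted label for one article (hypothetical helper)."""
    return str(model.predict([text])[0])


def main():
    # Imported lazily so classify() can be reused without Streamlit installed.
    import streamlit as st

    # Assumes the pickle contains a full pipeline that accepts raw text.
    with MODEL_PATH.open("rb") as f:
        model = pickle.load(f)

    st.title("Fake and Real News Classification")
    article = st.text_area("Paste the article text")
    if st.button("Classify") and article.strip():
        st.write(f"Prediction: {classify(model, article)}")


# Only start the UI when the model file is actually present.
if __name__ == "__main__" and MODEL_PATH.exists():
    main()
```

Under these assumptions, the app would be launched with `streamlit run` followed by the script path.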

Repository Structure:

    ├── README.md
    | 
    ├── .gitignore
    | 
    ├── index.ipynb
    | 
    ├── models
    |    └── model.pkl
    | 
    ├── demo
    |    ├── requirements.txt
    |    └── ml-app.py
    |
    └── data_preprocessing
         ├── cities.txt
         ├── countries.txt
         ├── months.txt
         ├── names.txt
         ├── states.txt
         └── week.txt