#Fake News Detection Using Machine Learning

Table of Contents

  1. Introduction
  2. Features
  3. Project Structure
  4. Installation
  5. Usage
  6. Datasets
  7. Models
  8. Results
  9. Contributing
  10. License

Introduction

Fake News Detection is a critical challenge in the era of digital information. This project aims to detect fake news articles using advanced machine learning techniques. The system is built to identify patterns and features that distinguish fake news from real news.

Features

  1. Preprocessing: Cleans and preprocesses the text data for model training.
  2. Feature Extraction: Uses techniques like TF-IDF for extracting features from text.
  3. Model Training: Includes various machine learning models like Logistic Regression, Naive Bayes, and more.
  4. Evaluation: Measures performance using metrics such as accuracy, precision, recall, and F1-score.
  5. Visualization: Provides insights into model performance and data distribution through visualizations.

Project Structure

│ ├── data/ │ ├── Fake.csv │ ├── True.csv │ └── ... ├── notebooks/ │ ├── Fake_News_Detection.ipynb │ └── ... ├── models/ │ ├── model.pkl │ └── ... ├── src/ │ ├── preprocess.py │ ├── train.py │ ├── evaluate.py │ └── ... ├── requirements.txt └── README.md

  • data: Contains the datasets used for training and testing.
  • notebooks: Jupyter notebooks for exploratory data analysis and model experimentation.
  • models: Trained models and their saved states.
  • src: Source code for preprocessing, training, and evaluation.
  • requirements.txt: Python dependencies.

Installation

  1. Data Manipulation import pandas as pd import numpy as np

  2. Text Processing import nltk import re

  3. Feature Extraction from sklearn.feature_extraction.text import TfidfVectorizer from sklearn.model_selection import train_test_split

  4. Model Training from sklearn.linear_model import LogisticRegression from sklearn.naive_bayes import MultinomialNB from sklearn.svm import SVC from sklearn.ensemble import RandomForestClassifier

  5. Model Evaluation from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report

  6. Visualization import matplotlib.pyplot as plt import seaborn as sns

Datasets

The project utilizes the following datasets from Kaggle:

  • Fake News: Fake News Dataset - Contains fake news articles for training.
  • Real News: True News Dataset - Contains real news articles for training.

Models

The project employs the following models:

  • Logistic Regression: Effective for binary classification problems.
  • Naive Bayes: Known for its efficiency with text data.
  • Support Vector Machine (SVM): For high-dimensional spaces.
  • Random Forest: Ensemble method for better performance.

Results

Model performance is evaluated using:

  1. Accuracy: Measures the overall correctness of the model.
  2. Precision: Indicates the proportion of true positives.
  3. Recall: Reflects the model's ability to capture all relevant instances.
  4. F1-Score: Harmonic mean of precision and recall.

Detailed results and visualizations can be found in the Fake_News_Detection.ipynb notebook.

Contributing

Contributions are welcome! Please submit a pull request or open an issue to discuss your ideas.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Follow and Star

If you find this project useful, please consider following me and starring the repository: