# Fake News Detection Using Machine Learning
- Introduction
- Features
- Project Structure
- Installation
- Usage
- Datasets
- Models
- Results
- Contributing
- License
## Introduction

Fake news detection is a critical challenge in the era of digital information. This project classifies news articles as fake or real using machine learning models trained on text features such as TF-IDF. The models learn the patterns that distinguish fake news from real news.
## Features

- Preprocessing: Cleans the raw text data before model training.
- Feature Extraction: Uses techniques like TF-IDF to turn text into numeric features.
- Model Training: Includes several machine learning models: Logistic Regression, Naive Bayes, Support Vector Machine, and Random Forest.
- Evaluation: Measures performance using metrics such as accuracy, precision, recall, and F1-score.
- Visualization: Provides insights into model performance and data distribution through plots.
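
The preprocessing step can be illustrated with a minimal sketch. The repository's `preprocess.py` is not shown here, so the function name `clean_text` and the specific cleaning rules (lowercasing, dropping URLs and non-letter characters) are assumptions for illustration rather than the project's actual code.

```python
import re


def clean_text(text: str) -> str:
    """Hypothetical cleaner: lowercase, drop URLs and non-letters, collapse spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)               # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()            # collapse repeated whitespace


print(clean_text("BREAKING: Aliens land!!! Read more at https://example.com/story"))
# prints: breaking aliens land read more at
```

Stripping digits and punctuation is a design choice that suits bag-of-words features like TF-IDF; a different pipeline might keep them.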
## Project Structure

    Fake-News-Detection/
    ├── data/
    │   ├── Fake.csv
    │   ├── True.csv
    │   └── ...
    ├── notebooks/
    │   ├── Fake_News_Detection.ipynb
    │   └── ...
    ├── models/
    │   ├── model.pkl
    │   └── ...
    ├── src/
    │   ├── preprocess.py
    │   ├── train.py
    │   ├── evaluate.py
    │   └── ...
    ├── requirements.txt
    └── README.md

- data: Contains the datasets used for training and testing.
- notebooks: Jupyter notebooks for exploratory data analysis and model experimentation.
- models: Trained models and their saved states.
- src: Source code for preprocessing, training, and evaluation.
- requirements.txt: Python dependencies.
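## Installation

- Clone the repository: `git clone https://github.com/testgithubrittttttt/Fake-News-Detection.git`, then `cd Fake-News-Detection`.
- Create and activate a virtual environment, for example with `python -m venv venv` followed by `source venv/bin/activate` (or `venv\Scripts\activate` on Windows).
- Install the dependencies: `pip install -r requirements.txt`.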
## Usage

The project uses the following imports, grouped by purpose:

- Data Manipulation: `import pandas as pd`, `import numpy as np`
- Text Processing: `import nltk`, `import re`
- Feature Extraction: `from sklearn.feature_extraction.text import TfidfVectorizer`, `from sklearn.model_selection import train_test_split`
- Model Training: `from sklearn.linear_model import LogisticRegression`, `from sklearn.naive_bayes import MultinomialNB`, `from sklearn.svm import SVC`, `from sklearn.ensemble import RandomForestClassifier`
- Model Evaluation: `from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report`
- Visualization: `import matplotlib.pyplot as plt`, `import seaborn as sns`
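
As a rough sketch of how these imports fit together, the example below splits a small corpus, fits a TF-IDF plus Logistic Regression pipeline, and scores it. The toy sentences, the label encoding (1 = fake, 0 = real), the use of `make_pipeline`, and the hyperparameters are all assumptions for illustration; the actual workflow is in the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy corpus invented for illustration; it is not taken from the project data.
texts = [
    "shocking miracle cure doctors do not want you to know",
    "celebrity secretly replaced by clone, insiders claim",
    "you will not believe what this politician is hiding",
    "secret government plot exposed by anonymous blogger",
    "city council approves budget for road repairs",
    "central bank holds interest rates steady this quarter",
    "researchers publish study on regional rainfall trends",
    "parliament passes amendment to the transport bill",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # assumption: 1 = fake, 0 = real

# Hold out 25% of the articles, keeping the class balance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# The vectorizer is fitted on the training split only, avoiding leakage.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print("accuracy on the toy test split:", accuracy_score(y_test, predictions))
```

With only eight sentences the score itself is meaningless; the point is the order of operations (split, fit on train, evaluate on test).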
## Datasets

The project uses the following datasets from Kaggle:
- Fake News: Fake News Dataset - Contains fake news articles for training.
- Real News: True News Dataset - Contains real news articles for training.
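
Because the fake and real articles come as two separate datasets, they need a label column before they can be merged for training. The sketch below shows one way to do that; the helper name `combine_datasets`, the `text` column, and the label encoding are assumptions, and small in-memory frames stand in for the CSV files so the example runs on its own.

```python
import pandas as pd


def combine_datasets(fake_df: pd.DataFrame, true_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: label both tables, merge them, and shuffle the rows."""
    fake = fake_df.assign(label=1)  # assumption: 1 = fake
    real = true_df.assign(label=0)  # assumption: 0 = real
    combined = pd.concat([fake, real], ignore_index=True)
    # Shuffle so that the classes are interleaved rather than stacked.
    return combined.sample(frac=1, random_state=42).reset_index(drop=True)


# With the repository layout, the inputs would presumably be
# pd.read_csv("data/Fake.csv") and pd.read_csv("data/True.csv").
fake_df = pd.DataFrame({"text": ["made-up story one", "made-up story two"]})
true_df = pd.DataFrame(
    {"text": ["verified report one", "verified report two", "verified report three"]}
)

data = combine_datasets(fake_df, true_df)
print(data["label"].value_counts().to_dict())
```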
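## Models

The project employs the following models:
- Logistic Regression: A linear classifier that is effective for binary classification problems.
- Naive Bayes: A probabilistic classifier known for its efficiency with text data.
- Support Vector Machine (SVM): Well suited to high-dimensional feature spaces such as TF-IDF vectors.
- Random Forest: An ensemble of decision trees that is less prone to overfitting than a single tree.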
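
The four models listed above share the same scikit-learn interface, so they can be swapped behind a single TF-IDF step. The following is a sketch under stated assumptions: the toy corpus, the linear SVM kernel, and the other hyperparameters are illustrative choices, not the settings used in `train.py`.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy corpus invented for illustration (1 = fake, 0 = real is an assumption).
texts = [
    "miracle pill melts fat overnight",
    "aliens secretly control the stock market",
    "hidden truth they refuse to tell you",
    "council publishes annual budget report",
    "university announces new research grant",
    "court schedules hearing for next month",
]
labels = [1, 1, 1, 0, 0, 0]

# Hyperparameters here are illustrative defaults, not tuned values.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "SVM": SVC(kernel="linear"),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

fitted = {}
for name, model in models.items():
    # Each classifier gets its own vectorizer so the pipelines stay independent.
    pipeline = make_pipeline(TfidfVectorizer(), model)
    pipeline.fit(texts, labels)
    fitted[name] = pipeline
    print(name, "->", pipeline.predict(["secret miracle truth revealed"]))
```

On real data, each pipeline would be scored on a held-out split rather than inspected on a single sentence.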
## Results

Model performance is evaluated using:
- Accuracy: The fraction of all predictions that are correct.
- Precision: The proportion of predicted positives that are true positives.
- Recall: The proportion of actual positives that the model correctly identifies.
- F1-Score: Harmonic mean of precision and recall.
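
To make the definitions concrete, the sketch below computes all four metrics from confusion-matrix counts. The function name and the counts (40 true positives, 10 false positives, 5 false negatives, 45 true negatives) are hypothetical and chosen only for easy arithmetic; they are not results from this project.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # guard against no predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # guard against no actual positives
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print({name: round(value, 3) for name, value in metrics.items()})
# prints: {'accuracy': 0.85, 'precision': 0.8, 'recall': 0.889, 'f1': 0.842}
```

In practice, scikit-learn's `precision_score`, `recall_score`, `f1_score`, and `classification_report` compute the same quantities directly from label arrays.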
Detailed results and visualizations can be found in the Fake_News_Detection.ipynb notebook.
## Contributing

Contributions are welcome! Please submit a pull request or open an issue to discuss your ideas.
## License

This project is licensed under the MIT License. See the LICENSE file for more details.
If you find this project useful, please consider following me and starring the repository.