Sentiment Analysis on Tweets

Overview

This project classifies tweets as expressing positive, neutral, or negative sentiment. It applies natural language processing (NLP) preprocessing and a machine learning classifier to the tweet text, which makes it useful for tracking public opinion, monitoring brand sentiment, or analyzing customer feedback.

Key Features

Text Preprocessing: Cleans the raw text data by removing special characters, converting text to lowercase, and normalizing whitespace.
TF-IDF Vectorization: Converts text into numerical vectors in which each word is weighted by how often it appears in a tweet, discounted by how common it is across the whole dataset.
Machine Learning Model: Utilizes a Logistic Regression classifier for sentiment prediction.
Evaluation Metrics: Provides detailed performance evaluation, including accuracy, precision, recall, and F1-score.
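
The text preprocessing step described above can be sketched as follows. This is a minimal illustration, not necessarily the code in sentiment_analysis.py: the function name `clean_text` and the exact regular expressions are assumptions.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase the text, drop special characters, and normalize whitespace.

    Hypothetical helper for illustration; the project's own cleaning
    function may differ in name and in the characters it keeps.
    """
    text = text.lower()
    # Replace anything that is not a lowercase letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace into one space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("  Loving the NEW update!!!   #awesome :)  "))
# prints: loving the new update awesome
```

Note that this variant also strips the `#` and `@` symbols, so hashtags and mentions are kept only as plain words.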

How It Works

Data Loading: Reads a labeled dataset of tweets with their corresponding sentiment.
Data Cleaning: Prepares the text for analysis by removing noise and standardizing the format.
Label Encoding: Maps sentiment labels (Positive, Neutral, Negative) to numerical values.
Training: Splits the data 80/20 into training and test sets and fits the model on the training portion.
Prediction: Predicts sentiment for test data using the trained model.
Evaluation: Reports accuracy and provides a detailed classification report.
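
The steps above, from label encoding through evaluation, could look roughly like this in scikit-learn. It is a sketch under assumptions: the toy tweets are made up for illustration and stand in for the real dataset, and the `label_map` values, `random_state`, stratified split, and `max_iter` setting are choices made here, not settings confirmed by the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Made-up, already-cleaned toy tweets standing in for the real dataset.
texts = [
    "i love this game", "great update from the team", "best release so far",
    "really happy with the new map", "amazing support thank you",
    "the patch is out today", "servers restart at noon", "new season starts monday",
    "the event runs for two weeks", "maintenance is scheduled tonight",
    "this is the worst update", "i hate the new menu", "game keeps crashing",
    "terrible lag again", "so disappointed with this patch",
]
labels = ["Positive"] * 5 + ["Neutral"] * 5 + ["Negative"] * 5

# Label encoding: map sentiment names to integers (the mapping is an assumption).
label_map = {"Negative": 0, "Neutral": 1, "Positive": 2}
y = [label_map[label] for label in labels]

# 80/20 train-test split, stratified so every class appears in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    texts, y, test_size=0.2, random_state=42, stratify=y
)

# TF-IDF features feeding a Logistic Regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Predict on the held-out tweets and report the metrics.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(classification_report(
    y_test, y_pred,
    labels=[0, 1, 2],
    target_names=["Negative", "Neutral", "Positive"],
    zero_division=0,
))
```

With a dataset this small the scores are meaningless; the point is only the order of the steps. Wrapping the vectorizer and classifier in one pipeline ensures the TF-IDF vocabulary is learned from the training split alone.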

Requirements

Python 3.x
Libraries:
- pandas
- numpy
- scikit-learn
- re (part of the Python standard library; no installation needed)

How to Run

Clone the repository:

```
git clone https://github.com/your-username/sentiment-analysis.git
cd sentiment-analysis
```

Install dependencies:
```
pip install -r requirements.txt
```
Add the dataset (twitter_training.csv) to the project directory.
Run the script:
```
python sentiment_analysis.py
```

Future Enhancements

Add further preprocessing, such as stop-word removal or stemming.
Try other machine learning models (e.g., SVM, Random Forest) or deep learning models (e.g., LSTMs, Transformers).
Expand the dataset to improve model accuracy and generalizability.
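
As a rough idea of how two of these enhancements might be tried with scikit-learn, the sketch below enables the built-in English stop-word list in `TfidfVectorizer` and swaps the classifier for a linear SVM (`LinearSVC`). The toy tweets and integer labels are made up for illustration, and nothing here reflects code that exists in the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up toy tweets; labels assume 0 = Negative, 1 = Neutral, 2 = Positive.
texts = [
    "i love this game", "this is the worst update", "the patch is out today",
    "great job by the team", "i hate the new menu", "the servers restart at noon",
]
labels = [2, 0, 1, 2, 0, 1]

# stop_words="english" drops common words such as "the" and "is" before weighting.
model = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())
model.fit(texts, labels)

predictions = model.predict(["love the team"])
print(predictions)
```

Whether either change helps would need to be checked against the same held-out test split used for the Logistic Regression baseline.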