This model analyzes tweets to classify them as Positive, Neutral, or Negative. It cleans the text, converts it to numerical features, trains a Logistic Regression model, and evaluates its accuracy.

Primary language: Python. License: MIT.

Sentiment Analysis on Tweets

Overview

This project performs sentiment analysis on tweets, classifying each one as positive, neutral, or negative. It combines natural language processing (NLP) techniques with machine learning, and can be applied to tasks such as gauging public opinion, monitoring brand sentiment, or analyzing customer feedback.

Key Features

  • Text Preprocessing: Cleans the raw text data by removing special characters, converting text to lowercase, and normalizing whitespace.
  • TF-IDF Vectorization: Converts text into numerical features, weighting each term by how often it appears in a tweet relative to how common it is across the whole dataset.
  • Machine Learning Model: Utilizes a Logistic Regression classifier for sentiment prediction.
  • Evaluation Metrics: Provides detailed performance evaluation, including accuracy, precision, recall, and F1-score.
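
The text preprocessing feature can be illustrated with a minimal sketch. The function name `clean_text` and the exact regular expressions are assumptions made for illustration; the project's script may implement the same steps differently.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase the text, drop special characters, and normalize whitespace.

    Hypothetical helper: the patterns below are one reasonable choice,
    not necessarily the ones used in sentiment_analysis.py.
    """
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace into a single space and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    return text


print(clean_text("Hello,   WORLD!!! #great"))  # hello world great
```

Replacing special characters with a space rather than deleting them keeps words separated when punctuation sits between them, at the cost of splitting contractions such as "don't" into "don t".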

How It Works

  1. Data Loading: Reads a labeled dataset of tweets with their corresponding sentiment.
  2. Data Cleaning: Prepares the text for analysis by removing noise and standardizing the format.
  3. Label Encoding: Maps sentiment labels (Positive, Neutral, Negative) to numerical values.
  4. Training: Splits the data 80/20 into training and test sets and trains the model on the training portion.
  5. Prediction: Predicts sentiment for test data using the trained model.
  6. Evaluation: Reports accuracy and provides a detailed classification report.
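
The steps above can be sketched end to end with scikit-learn. This is a rough illustration under stated assumptions: the toy tweets, the `LABEL_TO_ID` mapping, and the function name `train_and_evaluate` are all hypothetical, and the real script reads twitter_training.csv, whose column layout is not described here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Hypothetical label encoding; the project's actual mapping may differ.
LABEL_TO_ID = {"Negative": 0, "Neutral": 1, "Positive": 2}

# Toy stand-in for the labeled dataset (illustrative only, not real data).
SAMPLE_TWEETS = [
    ("i love this phone", "Positive"),
    ("what a great day", "Positive"),
    ("best update ever", "Positive"),
    ("really happy with the service", "Positive"),
    ("this game is amazing", "Positive"),
    ("the update is out today", "Neutral"),
    ("store opens at nine", "Neutral"),
    ("new version released", "Neutral"),
    ("the match starts tonight", "Neutral"),
    ("patch notes are posted", "Neutral"),
    ("i hate this phone", "Negative"),
    ("what a terrible day", "Negative"),
    ("worst update ever", "Negative"),
    ("really angry with the service", "Negative"),
    ("this game is broken", "Negative"),
]


def train_and_evaluate(tweets, seed=42):
    """Encode labels, split 80/20, fit TF-IDF + Logistic Regression, evaluate."""
    texts = [text.lower() for text, _ in tweets]
    y = [LABEL_TO_ID[label] for _, label in tweets]

    # 80/20 train-test split.
    x_train, x_test, y_train, y_test = train_test_split(
        texts, y, test_size=0.2, random_state=seed
    )

    # Fit TF-IDF on the training text only, then reuse it for the test text.
    vectorizer = TfidfVectorizer()
    x_train_vec = vectorizer.fit_transform(x_train)
    x_test_vec = vectorizer.transform(x_test)

    model = LogisticRegression(max_iter=1000)
    model.fit(x_train_vec, y_train)
    y_pred = model.predict(x_test_vec)

    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(
        y_test,
        y_pred,
        labels=[0, 1, 2],
        target_names=["Negative", "Neutral", "Positive"],
        zero_division=0,  # avoid warnings when a class is absent from the test set
    )
    return accuracy, report, y_test, y_pred


accuracy, report, y_test, y_pred = train_and_evaluate(SAMPLE_TWEETS)
print(f"Accuracy: {accuracy:.2f}")
print(report)
```

With only fifteen toy examples the reported numbers are not meaningful; the sketch is meant to show the order of operations, in particular that the vectorizer is fitted on the training split alone.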

Requirements

  • Python 3.x
  • Libraries:
    • pandas
    • numpy
    • scikit-learn
    • re (part of the Python standard library; no installation needed)

How to Run

  1. Clone the repository:
    git clone https://github.com/your-username/sentiment-analysis.git
    cd sentiment-analysis
  2. Install dependencies:
    pip install -r requirements.txt
  3. Add the dataset (twitter_training.csv) to the project directory.
  4. Run the script:
    python sentiment_analysis.py

Future Enhancements

  • Incorporate additional preprocessing, such as stop-word removal or stemming.
  • Try other classifiers (e.g., SVM, Random Forest) or deep learning models (e.g., LSTMs, Transformers).
  • Expand the dataset to improve model accuracy and generalizability.
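
As a rough sketch of how the first two enhancements might look, the snippet below combines scikit-learn's built-in English stop-word list with a linear SVM. It is not part of the current project: the function name `build_svm_pipeline` and the four example sentences are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_svm_pipeline():
    """Hypothetical alternative model: TF-IDF with stop-word removal + linear SVM."""
    return make_pipeline(
        TfidfVectorizer(stop_words="english"),  # drops common words such as "the"
        LinearSVC(),
    )


# Illustrative examples only, not taken from the project's dataset.
texts = [
    "I love the new update",
    "what a great experience",
    "the service was terrible",
    "this is the worst app",
]
labels = ["Positive", "Positive", "Negative", "Negative"]

model = build_svm_pipeline()
model.fit(texts, labels)

prediction = model.predict(["what a great phone"])[0]
vocabulary = model.named_steps["tfidfvectorizer"].vocabulary_
print(prediction)
print(sorted(vocabulary))
```

Wrapping the vectorizer and classifier in one pipeline means raw strings can be passed to `fit` and `predict`, which makes it easier to swap classifiers and compare them on the same split.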