# Sentiment Analysis
## Overview
This project implements a sentiment analysis tool that processes text data, specifically movie reviews and tweets, and assigns each text a sentiment score. It takes a lexicon-based approach: scores are computed by looking up the words of the input text in a predefined sentiment lexicon.
## Features
- **Data Loading**: Loads IMDB movie reviews and live Twitter data for analysis.
- **Tokenization**: Tokenizes text by removing punctuation and stop words.
- **Sentiment Scoring**: Calculates sentiment scores based on a sentiment lexicon.
- **Data Visualization**: Analyzes sentiment across different tweets to summarize public opinion.
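As a rough illustration of the last feature, per-tweet scores could be rolled up into a short summary. The sketch below is an assumption about what such a summary might look like, not the project's actual code: `summarize_scores` is a hypothetical helper, and treating scores above zero as positive and below zero as negative is an assumed convention.

```python
from statistics import mean


def summarize_scores(scores):
    """Summarize per-tweet sentiment scores (hypothetical helper).

    Assumes a score above 0 is positive, below 0 is negative,
    and exactly 0 is neutral.
    """
    return {
        "positive": sum(1 for s in scores if s > 0),
        "negative": sum(1 for s in scores if s < 0),
        "neutral": sum(1 for s in scores if s == 0),
        # mean() raises on an empty list, so fall back to 0.0
        "average": mean(scores) if scores else 0.0,
    }


# Made-up scores for four tweets
summary = summarize_scores([1.5, -0.5, 0, 2.0])
print(summary)
```

Counting by sign keeps the summary independent of the lexicon's score range, which this README does not specify.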
## Requirements
- Python 3.x
- Libraries:
  - `nltk`
  - `json` (Python standard library)
  - `urllib` (Python standard library)
  - `tkinter` (Python standard library)
You can install the necessary libraries using pip:
```bash
pip install nltk
```
## Usage
- Clone the repository:
  ```bash
  git clone https://github.com/ajaykumar8/sentiment-analysis.git
  cd sentiment-analysis
  ```
- Download the necessary NLTK data files:
  ```python
  import nltk
  nltk.download('stopwords')
  nltk.download('punkt')
  ```
- Prepare the lexicon file (`lexicon.txt`) with sentiment scores. Ensure the format is correct, as follows:
  ```
  word1 score1
  word2 score2
  ...
  ```
- Prepare a JSON file containing tweets in the following format:
  ```json
  [
    {"text": "Tweet text 1"},
    {"text": "Tweet text 2"},
    ...
  ]
  ```
- Run the script to perform sentiment analysis:
  ```bash
  python sentiment_analysis.py
  ```
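The lexicon file and the tweets file could be read roughly as follows. This is a sketch under assumptions, not the script's actual loader: `load_lexicon` and `load_tweets` are hypothetical names, and the lexicon is assumed to hold one whitespace-separated `word score` pair per line.

```python
import json


def load_lexicon(path):
    """Read `word score` lines into a dict (hypothetical helper)."""
    lexicon = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Split on the last run of whitespace: word on the left, score on the right
            parts = line.strip().rsplit(maxsplit=1)
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            word, score = parts
            try:
                lexicon[word] = float(score)
            except ValueError:
                continue  # skip lines whose score is not a number
    return lexicon


def load_tweets(path):
    """Return the "text" field of each tweet object in the JSON file."""
    with open(path, encoding="utf-8") as f:
        tweets = json.load(f)
    return [tweet["text"] for tweet in tweets if "text" in tweet]
```

Skipping malformed lines rather than raising is a design choice for the sketch; a stricter loader could report the offending line instead.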
## Functions
- Tokenization:
  - Input: A string of text (document).
  - Output: A list of tokens (words) after removing punctuation and stop words.
- Lexicon lookup:
  - Input: A string of text (document).
  - Output: A dictionary containing words and their corresponding sentiment scores from the lexicon.
- Message scoring:
  - Input: A string of text (message).
  - Output: The overall sentiment score for the message.
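Taken together, the three input/output pairs above might be implemented along the lines of the sketch below. It is illustrative only: the function names are hypothetical, the small `STOP_WORDS` set stands in for NLTK's English stop-word list so the example runs without any downloads, and summing word scores is an assumed way of combining them into an overall score.

```python
import string

# Stand-in for nltk.corpus.stopwords.words("english") (assumption)
STOP_WORDS = {"a", "an", "and", "is", "the", "this", "was"}


def tokenize(document):
    """Lowercase the text, strip punctuation, and drop stop words."""
    cleaned = document.lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    return [word for word in cleaned.split() if word not in STOP_WORDS]


def lookup_scores(document, lexicon):
    """Map each token that appears in the lexicon to its score."""
    return {
        token: lexicon[token]
        for token in tokenize(document)
        if token in lexicon
    }


def message_score(message, lexicon):
    """Sum the lexicon scores of all tokens (assumed aggregation)."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(message))


# Toy lexicon with made-up scores
lexicon = {"great": 2.0, "boring": -1.5}
review = "This movie was great, and the ending was great!"
print(tokenize(review))
print(lookup_scores(review, lexicon))
print(message_score(review, lexicon))
```

Words missing from the lexicon contribute nothing to the total here; averaging over matched words would be an equally plausible choice when messages vary a lot in length.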
## Contributing
Contributions are welcome! If you have suggestions for improvements or bug fixes, please create a pull request or open an issue.
## License
This project is licensed under the MIT License; see the LICENSE file for details.