In this project, an LSTM model performs sentiment analysis on sentences and a Naive Bayes model classifies SMS messages as spam or ham. Both models are then served through a web application built with Flask.

Sentiment Analysis and Spam Classification

🌟 About the Project

This is a Natural Language Processing (NLP) project, one of the application areas of artificial intelligence. It was carried out to analyze the sentiment of Twitter comments and to determine whether a text message (SMS) received on a phone is unsolicited (spam). The models were then integrated into a web application with a simple, easy-to-understand graphical interface for users.

📷 Screenshots

Screenshots

👾 Tech Stack

Client
  • HTML
  • CSS
  • JavaScript
Server
  • Python - Flask
Database
  • MySQL

🎯 Features

  • Prediction of the sentiment of the given sentences
  • Classification of SMS as spam or ham
  • Creation of a new dataset (via User Sentences)
  • Recording of messages sent by users in the database
  • Language switcher written in vanilla JavaScript
  • Searching for a specific word in datasets

💿 Datasets

Two datasets were used in the project. The first is Sentiment140, which is used for sentiment analysis. Sentiment140 consists of 1.6 million tweets labelled as "positive" or "negative". The second is the SMS Spam Collection Dataset, which is used for SMS classification. It contains almost 5.6k English SMS messages and is also labelled with two classes (Spam - Ham). The ham class contains about 4.8k messages; the remainder are spam.

⚠️ If you want to examine the datasets, do not forget to add them to the dataset folder first.

🤖 Deep Learning

This section covers preprocessing and model training. The Sentiment dataset was cleaned of special tokens such as "@" mentions, "http" links and the digits 0-9, and the stop words were removed. A Word2vec model was then trained on the resulting tokens. Next, the texts were converted to sequences and padded (pad_sequences) to a maximum length of 300. Once the embedding layer was created, a vanilla LSTM model was built. The final accuracy of the model is 79.10%. The model architecture can be seen in the figure below.
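The cleaning and padding steps described above can be sketched in plain Python. This is only an illustration under stated assumptions: the regular expressions, the tiny stop-word list and the helper names (`clean_tweet`, `build_vocab`, `pad_sequence`) are hypothetical and are not taken from the project's notebook, and the Word2vec training step is left out. The padding helper pads at the front, which is also what Keras `pad_sequences` does by default.

```python
import re

# Hypothetical, deliberately tiny stop-word list; the project's real list is not given.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "it"}

# Mentions, links and digits, as named in the text; the exact patterns are assumptions.
NOISE = re.compile(r"@\S+|http\S+|\d+")
NON_ALPHA = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase, strip noise and drop stop words; returns a list of tokens."""
    text = NOISE.sub(" ", text.lower())
    text = NON_ALPHA.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def build_vocab(token_lists):
    """Map each token to a positive integer id; 0 is reserved for padding."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def pad_sequence(ids, maxlen=300):
    """Left-pad with zeros, or keep only the last `maxlen` ids if too long."""
    ids = ids[-maxlen:]
    return [0] * (maxlen - len(ids)) + ids


tweets = ["@user I love this phone http://t.co/abc 100%", "The battery is awful"]
tokens = [clean_tweet(t) for t in tweets]
vocab = build_vocab(tokens)
padded = [pad_sequence([vocab[t] for t in toks]) for toks in tokens]
```

Every row of `padded` has length 300, so the rows can be stacked into a single array and passed to an embedding layer.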


Model Architecture (Image by Author)

The spam classifier was trained on the Spam dataset with the Multinomial Naive Bayes algorithm, a Bayesian learning approach that is popular in Natural Language Processing (NLP).
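To make the idea concrete, here is a minimal pure-Python sketch of Multinomial Naive Bayes with add-one (Laplace) smoothing. It is not the project's code, which presumably relies on a library implementation. The class name `TinyMultinomialNB` and the toy messages are invented for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, documents, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.class_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for doc, label in zip(documents, labels):
            words = doc.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def log_scores(self, document):
        """Return the unnormalised log posterior of each class."""
        # Words never seen during training are ignored.
        words = [w for w in document.lower().split() if w in self.vocab]
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for w in words:
                score += math.log((counts[w] + 1) / denom)  # smoothed likelihood
            scores[label] = score
        return scores

    def predict(self, document):
        scores = self.log_scores(document)
        return max(scores, key=scores.get)


# Made-up toy messages, not taken from the SMS Spam Collection.
train = [
    ("win a free prize now", "spam"),
    ("free cash call now", "spam"),
    ("are we meeting for lunch", "ham"),
    ("see you at home tonight", "ham"),
]
model = TinyMultinomialNB().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("free prize call"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.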

💻 Flask

The web application consists of 5 pages, which can be seen in the GIF above: Home, Project, About, Contact and Dataset.

πŸ—‚οΈ Database

Users can submit their opinions, suggestions or problems about the project by filling out the form on the Contact page. Some of the information in the form is recorded in the database.

SQL statement that creates the MySQL table in which the form data is stored:

CREATE TABLE contact (
	id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
	name VARCHAR(30) NOT NULL,
	email VARCHAR(30) NOT NULL,
	company_name VARCHAR(50),
	message VARCHAR(200),
	reg_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
	);
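As a hedged sketch of how a form submission could be prepared for this table, the helper below checks the values against the column limits from the CREATE TABLE statement and returns a parameterized INSERT. The function name `build_contact_insert` is hypothetical and not part of the project. The `%s` placeholder style is the one used by common Python MySQL drivers such as mysql-connector-python and PyMySQL. The `id` and `reg_date` columns are left for MySQL to fill in.

```python
# Column limits copied from the CREATE TABLE statement above.
MAX_LENGTHS = {"name": 30, "email": 30, "company_name": 50, "message": 200}

INSERT_SQL = (
    "INSERT INTO contact (name, email, company_name, message) "
    "VALUES (%s, %s, %s, %s)"
)


def build_contact_insert(name, email, company_name=None, message=None):
    """Validate a form submission and return (sql, params) for cursor.execute."""
    fields = {
        "name": name,
        "email": email,
        "company_name": company_name,
        "message": message,
    }
    for column in ("name", "email"):  # these columns are declared NOT NULL
        if not fields[column]:
            raise ValueError(f"{column} is required")
    for column, value in fields.items():
        if value is not None and len(value) > MAX_LENGTHS[column]:
            raise ValueError(f"{column} exceeds {MAX_LENGTHS[column]} characters")
    return INSERT_SQL, (name, email, company_name, message)


sql, params = build_contact_insert("Ada", "ada@example.com", message="Nice project!")
```

With an open connection, the result would typically be passed on as `cursor.execute(sql, params)` followed by a commit. Passing the values separately from the SQL string lets the driver escape them, which guards against SQL injection.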

πŸ—ΊοΈ Multi-Language

The web app supports two languages: English and Turkish. The language switcher is written in vanilla JavaScript and is open for further development.

πŸƒ How to Run

1. Clone this repository.

git clone https://github.com/MelihGulum/Sentiment-Analysis-and-Spam-Classification.git

2. Install the project dependencies.

pip install -r requirements.txt

3. Now you can run the project.

flask --app app.py --debug run