To run the application on your computer, you need the following packages installed:
pip install flask
You also need NLTK for the text preprocessing, along with its stopwords corpus:
pip install nltk
Then, from a Python shell:
import nltk
nltk.download('stopwords')
The dataset contains 4 columns. The text column holds the user messages, and the category column holds the corresponding class of each message.
- Deleted missing values
- Deleted rows whose category values did not make sense.
- Created some visualizations.
- Checked the distribution of values in the category column.
- Removed stopwords
- Lowercased all words
- Changed data type to string
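The cleaning and preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's code: the stopword set, the list of valid categories and the sample rows are all hypothetical stand-ins (the project itself uses NLTK's English stopwords and the labels in its dataset). The sketch casts to string and lowercases before removing stopwords, because NLTK's stopword list is lowercase; the README does not state the order actually used.

```python
from collections import Counter

# Hypothetical stand-ins: the project would use
# nltk.corpus.stopwords.words("english") and the labels found in its dataset.
STOP_WORDS = {"a", "an", "the", "i", "was", "how", "do", "my", "to"}
VALID_CATEGORIES = {"billing", "technical", "general"}


def clean(rows):
    """Drop (text, category) rows with missing values or a nonsensical category."""
    cleaned = []
    for text, category in rows:
        if text is None or category is None:  # missing values
            continue
        if category not in VALID_CATEGORIES:  # category that makes no sense
            continue
        cleaned.append((text, category))
    return cleaned


def preprocess(text):
    """Cast to string, lowercase, then remove stopwords."""
    words = str(text).lower().split()
    return " ".join(w for w in words if w not in STOP_WORDS)


# Hypothetical sample rows of (text, category).
rows = [
    ("I was charged twice", "billing"),
    (None, "general"),
    ("App crashes on start", "technical"),
    ("random text", "???"),
    ("How do I reset my password", "technical"),
]

data = clean(rows)
# Distribution of values in the category column.
distribution = Counter(category for _, category in data)
processed = [(preprocess(text), category) for text, category in data]
print(distribution)
print(processed)
```

With real data, the same two functions could be applied column-wise, for example with pandas' `Series.apply`.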
I tried three approaches to this problem:
- Multinomial Naive Bayes: Naive Bayes assumes that the features (here, word counts) are conditionally independent given the class. This rarely holds exactly for text, but the classifier still works well in practice. Multinomial Naive Bayes is the variant that models the word counts of each class with a multinomial distribution.
- Linear Support Vector Machine: linear SVMs are widely regarded as one of the strongest algorithms for text classification.
- Logistic Regression: a simple, easy-to-understand classification algorithm that generalizes easily to multiple classes.
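One way to compare the three approaches is to put each classifier behind the same feature extractor with scikit-learn. The sketch below assumes TF-IDF features and uses a tiny hypothetical training set; the README does not say which vectorizer or library was used, so treat this as an illustration rather than the project's pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data; the real project uses the cleaned text/category columns.
train_texts = [
    "charged twice card", "refund my payment", "invoice wrong amount",
    "app crashes start", "error when login", "screen freezes update",
    "opening hours", "where is office",
]
train_labels = [
    "billing", "billing", "billing",
    "technical", "technical", "technical",
    "general", "general",
]
test_texts = ["payment charged wrong", "app error after update"]

models = {
    "Multinomial Naive Bayes": MultinomialNB(),
    "Linear SVM": LinearSVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

predictions = {}
for name, clf in models.items():
    # Same TF-IDF features for every model so the comparison is like-for-like.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    pipeline.fit(train_texts, train_labels)
    predictions[name] = list(pipeline.predict(test_texts))
    print(name, predictions[name])
```

On the real dataset, a held-out split (for example `sklearn.model_selection.train_test_split` followed by `sklearn.metrics.accuracy_score`) would give a fair basis for choosing between the models.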
In the end, I selected the logistic regression model.
To test the app, run it, go to the classify-text route and submit your message. You will then see a JSON response of the form {"text": your message, "category": predicted category}.
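A minimal Flask sketch of such a route is shown below. It is an assumption about how the endpoint could look, not the project's actual code: the `/classify-text` path, the POST method and the `text` field name are guesses, and `classify` is a hypothetical placeholder standing in for the trained logistic regression model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify(text):
    """Hypothetical placeholder for the trained model.

    In the real app this would be something like model.predict([text])[0].
    """
    return "billing" if "charged" in text.lower() else "general"


@app.route("/classify-text", methods=["POST"])
def classify_text():
    # Accept the message either as a JSON body or as a submitted form field.
    data = request.get_json(silent=True) or request.form
    text = data.get("text", "")
    return jsonify({"text": text, "category": classify(text)})
```

Saved as `app.py`, a recent Flask can start it with `flask --app app run`, after which a POST to `/classify-text` with a `text` field returns the JSON response described above.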