This repository contains the files for the full pipeline, from preprocessing to deployment (as an Azure HTTP trigger function).
Text classification is a fundamental machine learning technique behind natural language processing applications such as sentiment analysis, spam detection, intent detection, and more. It is also used for language detection, and it helps organizations and individuals make sense of data such as customer feedback in ways that can inform future decisions. Training an ML model for text classification comes with its own challenges, so I have compiled a list of commonly used datasets to make the process smoother.
Dataset Category | Datasets |
---|---|
Sentiment Analysis and Review Datasets | Twitter US Airline Sentiment, Paper Reviews, Amazon Product Data, Multi-Domain Sentiment Analysis, Large Movie Review, OpinRank Review |
Online Content Evaluation Datasets | Spambase, Stop Clickbait, Hate Speech and Offensive Language |
News Datasets | The 20 Newsgroups, AG's News Topic Classification, Reuters Text Categorization |
If you use a different dataset, preprocess it into the following format:
Sentence | Label |
---|---|
Sample 1 | label1 |
Sample 2 | label2 |
Sample 3 | label3 |
Sample 4 | label4 |
Sample 5 | label1 |
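The README does not say how the two-column data is stored on disk. The sketch below assumes a CSV file with a `Sentence,Label` header; `write_sentence_label_csv` is a hypothetical helper, not part of this repository. It collapses stray whitespace and newlines inside each sentence and drops rows whose sentence or label is empty.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_sentence_label_csv(rows: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write (sentence, label) pairs to `out` as CSV with a Sentence,Label header.

    Hypothetical helper: assumes the target format is a two-column CSV.
    Returns the number of data rows written (the header is not counted).
    """
    writer = csv.writer(out)
    writer.writerow(["Sentence", "Label"])
    written = 0
    for sentence, label in rows:
        # Collapse newlines and repeated spaces so each sample stays on one line.
        sentence = " ".join(sentence.split())
        label = label.strip()
        if not sentence or not label:
            continue  # skip incomplete samples
        writer.writerow([sentence, label])
        written += 1
    return written


if __name__ == "__main__":
    raw = [
        ("Great flight,\n on time!", "positive"),
        ("Lost my bag again.", " negative "),
        ("", "neutral"),  # dropped: empty sentence
    ]
    buf = io.StringIO()
    count = write_sentence_label_csv(raw, buf)
    print(count)
    print(buf.getvalue())
```

Using the `csv` module rather than joining strings by hand means sentences that contain commas or quotes are escaped correctly.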
In this repository, we use the following models for sentence classification:
Tokenizer/Embedder | Model | Rank (by precision/recall/F1-score) |
---|---|---|
ELMo | Bidirectional LSTM | 2 |
word2vec | LSTM/RNN | 3 |
BERT | CNN | 1 |
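The ranking above is based on precision, recall and F1-score, but the README does not state how per-class scores were averaged. The sketch below assumes macro averaging (an unweighted mean over classes); `macro_prf` is a hypothetical helper, not code from this repository. scikit-learn's `precision_recall_fscore_support` with `average="macro"` computes the same quantities.

```python
from collections import Counter
from typing import Dict, Sequence


def macro_prf(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Macro-averaged precision, recall and F1 (assumed averaging scheme)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("need at least one sample")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    # Macro F1 is the mean of per-class F1, not the F1 of the averaged P and R.
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    y_true = ["pos", "neg", "pos", "neg"]
    y_pred = ["pos", "pos", "pos", "neg"]
    print(macro_prf(y_true, y_pred))
```

Macro averaging weights every class equally, so a model cannot score well just by doing well on the majority class; if the original ranking used weighted or micro averaging, the numbers would differ.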