MultiClass-Sentence-Classification

This repository contains files covering the full pipeline, from preprocessing to deployment as an Azure HTTP-triggered function.
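The Azure Function code itself is not reproduced in this README. The following is a minimal sketch of the request-handling core such a function might use. It assumes the function receives a JSON body like `{"sentence": "..."}` and returns the predicted label as JSON. `handle_request` and `predict` are hypothetical names, and the helper is deliberately framework-free so it can be tested without the Azure runtime.

```python
import json
from typing import Callable, Tuple


def handle_request(body: bytes, predict: Callable[[str], str]) -> Tuple[int, str]:
    """Parse a JSON request body, classify its sentence, and return (status code, JSON body).

    `predict` is a hypothetical callable wrapping the trained model.
    """
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return 400, json.dumps({"error": "Request body must be valid UTF-8 JSON."})

    # Only accept a JSON object with a non-empty string under "sentence".
    sentence = payload.get("sentence") if isinstance(payload, dict) else None
    if not isinstance(sentence, str) or not sentence.strip():
        return 400, json.dumps({"error": "Provide a non-empty 'sentence' field."})

    label = predict(sentence)
    return 200, json.dumps({"sentence": sentence, "label": label})


if __name__ == "__main__":
    # Stand-in model for illustration only: flags sentences that mention a refund.
    dummy = lambda s: "complaint" if "refund" in s.lower() else "other"
    print(handle_request(b'{"sentence": "I want a refund"}', dummy))
```

In an Azure Functions Python app, the HTTP-triggered function would pass `req.get_body()` to a helper like this and wrap the returned status and body in an `HttpResponse`. Keeping the parsing and validation outside that wrapper makes it easy to unit-test.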

Datasets used for model training

Text classification is a fundamental machine learning technique behind many natural language processing applications, including sentiment analysis, spam detection, and intent detection. It is also useful for language detection and for making sense of material such as customer feedback, which can inform future decisions. Training an ML model for text classification comes with its own challenges. To make the process smoother, I have compiled the following list of public datasets.

Dataset Category	Datasets
Sentiment Analysis and Review Datasets	Twitter US Airline Sentiment, Paper Reviews, Amazon Product Data, Multi-Domain Sentiment Analysis, Large Movie Review, OpinRank Review
Online Content Evaluation Datasets	Spambase, Stop Clickbait, Hate Speech and Offensive Language
News Datasets	The 20 Newsgroups, AG’s News Topic Classification, Reuters Text Categorization

If you use a different dataset, preprocess it into the following two-column format, with one sentence and its label per row:

Sentence	Label
Sample 1	`label1`
Sample 2	`label2`
Sample 3	`label3`
Sample 4	`label4`
Sample 5	`label1`
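The following is a minimal sketch of that preprocessing step. It assumes the data is stored as a tab-separated file with a `Sentence`/`Label` header row. The file layout and the helper names `write_dataset`, `read_dataset`, and `encode_labels` are illustrative and are not taken from the repository. The sketch writes records in the format above, reads them back, and maps each label to an integer id for training.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO, Tuple


def write_dataset(rows: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write (sentence, label) pairs as a tab-separated file with a Sentence/Label header."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sentence", "Label"])
    for sentence, label in rows:
        # Collapse runs of whitespace (including tabs/newlines) so each sentence stays on one row.
        writer.writerow([" ".join(sentence.split()), label.strip()])


def read_dataset(src: TextIO) -> List[Tuple[str, str]]:
    """Read (sentence, label) pairs, skipping rows where either field is missing or empty."""
    reader = csv.DictReader(src, delimiter="\t")
    return [(r["Sentence"], r["Label"]) for r in reader if r["Sentence"] and r["Label"]]


def encode_labels(labels: Iterable[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct label to an integer id; sorting keeps the mapping reproducible."""
    labels = list(labels)
    label_to_id = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [label_to_id[label] for label in labels], label_to_id


if __name__ == "__main__":
    buf = io.StringIO()
    write_dataset([("Great flight!", "positive"), ("Lost my bag", "negative")], buf)
    buf.seek(0)
    rows = read_dataset(buf)
    ids, mapping = encode_labels(label for _, label in rows)
    print(ids, mapping)
```

When reading or writing real files instead of an in-memory buffer, open them with `newline=""`, as the `csv` module documentation recommends.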

Model

In this repository, we used the following tokenizer/embedder and model combinations for sentence classification:

Tokenizer/Embedder	Model	Rank (by precision/recall/F1-score)
ELMo	`Bi-Directional LSTM`	`2`
word2vec	`LSTM/RNN`	`3`
BERT	`CNN`	`1`
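The ranking above is based on precision, recall, and F1-score. The README does not say how these were averaged across classes. The sketch below assumes macro averaging: each metric is computed per label and then averaged with equal weight. It uses plain Python on gold and predicted labels. `macro_scores` is an illustrative name. scikit-learn's `precision_recall_fscore_support` with `average="macro"` is a maintained implementation of the same metrics.

```python
from typing import Sequence, Tuple


def macro_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float, float]:
    """Return macro-averaged (precision, recall, F1) over all labels seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("inputs must not be empty")

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # Undefined ratios (no predictions or no gold examples for a label) count as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Per-label F1 is the harmonic mean of that label's precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    print(macro_scores(["a", "a", "b", "c"], ["a", "b", "b", "c"]))
```

Computing the same triple for every model on a shared held-out split gives a like-for-like basis for a ranking such as the one in the table. Macro averaging treats rare and frequent labels equally, which matters when the label distribution is skewed.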

abhishek2f24/MultiClass-Sentence-Classification