MultiClass-Sentence-Classification

This repository contains the files for every step from preprocessing to deployment (as an Azure HTTP trigger function).

Primary language: PowerShell

Dataset used for model training

Text classification is a fundamental machine learning technique behind natural language processing applications such as sentiment analysis, spam detection, and intent detection. It is especially useful for language detection and for helping organizations and individuals make sense of things like customer feedback in ways that inform future decisions. Training an ML model for text classification comes with its own challenges, so I have compiled a list of datasets to help you get started.

| Dataset Category | Datasets |
| --- | --- |
| Sentiment Analysis and Review Datasets | Twitter US Airline Sentiment, Paper Reviews, Amazon Product Data, Multi-Domain Sentiment Analysis, Large Movie Review, Opin-Rank Review |
| Online Content Evaluation Datasets | Spambase, Stop Clickbait, Hate Speech and Offensive Language |
| News Datasets | The 20 Newsgroups, AG’s News Topic Classification, Reuters Text Categorization |

If you have any other dataset, preprocess it to follow the format below.

| Sentence | Label |
| --- | --- |
| Sample 1 | label1 |
| Sample 2 | label2 |
| Sample 3 | label3 |
| Sample 4 | label4 |
| Sample 5 | label1 |
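
As a minimal sketch of reshaping another dataset into this two-column format, the snippet below assumes the source is a CSV file with a text column and a label column. The function name `to_sentence_label` and the column names `text` and `airline_sentiment` are hypothetical and not taken from this repository; adjust them to your data. It uses only the Python standard library.

```python
import csv
import io


def to_sentence_label(src, dst, text_col, label_col):
    """Copy rows from src to dst as (Sentence, Label) pairs.

    Whitespace inside each sentence is collapsed, and rows with an
    empty sentence or an empty label are skipped.
    Returns the number of rows written (header excluded).
    """
    reader = csv.DictReader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(["Sentence", "Label"])
    kept = 0
    for row in reader:
        sentence = " ".join((row.get(text_col) or "").split())
        label = (row.get(label_col) or "").strip()
        if sentence and label:
            writer.writerow([sentence, label])
            kept += 1
    return kept


# In-memory example; the column names are assumptions for illustration.
raw = io.StringIO(
    "tweet_id,airline_sentiment,text\n"
    "1,positive,  Great   flight!\n"
    "2,negative,Lost my bag\n"
    "3,,No label here\n"
)
out = io.StringIO()
n = to_sentence_label(raw, out, text_col="text", label_col="airline_sentiment")
print(out.getvalue())
```

To work with real files, pass handles opened with `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` objects.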

Model

In this repository, we have used the following models for sentence classification:

| Tokenizer/Embedder | Model | Rank (based on Precision/Recall/F1-score) |
| --- | --- | --- |
| ELMo | Bi-Directional LSTM | 2 |
| word2vec | LSTM/RNN | 3 |
| BERT | CNN | 1 |
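
The ranking is based on precision, recall, and F1-score. As a rough illustration of how these metrics can be computed for a multi-class task, the sketch below macro-averages them over all labels. The macro averaging, the function name `macro_prf`, and the toy labels are assumptions; this README does not say how the scores were aggregated.

```python
def macro_prf(y_true, y_pred):
    """Return macro-averaged (precision, recall, f1) over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0, 0.0
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        # Per-label counts, treating the current label as the positive class.
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels for illustration only; these are not results from this repository.
y_true = ["label1", "label2", "label3", "label1"]
y_pred = ["label1", "label2", "label1", "label1"]
precision, recall, f1 = macro_prf(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Macro averaging weights every class equally, which can matter when labels are imbalanced; a weighted or micro average would be an equally reasonable choice.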