This notebook is for the 🤗Transformers doc.
Learn how to use powerful transformer models for text classification tasks!
Notebook example from Hugging Face: a notebook on how to fine-tune DistilBERT for text classification in TensorFlow. 🌎 https://huggingface.co/docs/transformers/model_doc/distilbert
Text classification using tokenization involves representing each document as a set of tokens or words. These tokens are often processed to remove stopwords and punctuation and to apply stemming or lemmatization.
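As a quick illustration, here is a minimal sketch of how a DistilBERT tokenizer splits raw text into subword tokens and the integer IDs the model expects (the example sentence is my own placeholder):

```python
from transformers import AutoTokenizer

# Load the tokenizer that matches the DistilBERT checkpoint used later in this notebook
tokenizer = AutoTokenizer.from_pretrained(
    "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
)

text = "The movie was surprisingly good!"

# Break the text into subword tokens, then into input IDs as PyTorch tensors
tokens = tokenizer.tokenize(text)
encoding = tokenizer(text, truncation=True, return_tensors="pt")

print(tokens)                    # e.g. ['the', 'movie', 'was', 'surprisingly', 'good', '!']
print(encoding["input_ids"])     # tensor of vocabulary IDs, with [CLS] and [SEP] added
```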
For classifying opinions (positive, negative, neutral), I tested the following sentiment analysis models (see the sketch after the list):
- distilbert/distilbert-base-uncased-finetuned-sst-2-english
- michellejieli/inappropriate_text_classifier
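Below is a minimal sketch of how either checkpoint can be tried with the `text-classification` pipeline; the DistilBERT SST-2 checkpoint is shown here, and the example sentences are mine:

```python
from transformers import pipeline

# Load one of the checkpoints listed above into a text-classification pipeline
classifier = pipeline(
    "text-classification",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)

examples = [
    "I absolutely loved this product!",
    "This was a complete waste of money.",
]

# Each result is a dict with the predicted label and its confidence score
for result in classifier(examples):
    print(result)   # e.g. {'label': 'POSITIVE', 'score': 0.99...}
```

Swapping the `model` argument for `michellejieli/inappropriate_text_classifier` runs the second checkpoint through the same pipeline.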
Later I will try to fine-tune my own version of BERT (instead of DistilBERT) on the same dataset used for MichelleJieli's model, and then compare the test error across the different sentiment analysis models fine-tuned on that dataset.
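A rough sketch of that fine-tuning setup with the Trainer API might look like the following. The dataset is a placeholder (IMDB stands in here, since I have not yet pinned down the exact data behind MichelleJieli's model), and the checkpoint name and output directory are my own assumptions:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Placeholder dataset: swap in the actual dataset used for MichelleJieli's model
dataset = load_dataset("imdb")

checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize(batch):
    # Truncate long reviews; dynamic padding is handled by the Trainer's collator
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

# Two labels: positive / negative
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

training_args = TrainingArguments(
    output_dir="bert-finetuned-sentiment",
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
)

trainer.train()
print(trainer.evaluate())   # test-set metrics, for comparing models later
```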
After the model is fine-tuned, it can be saved with `save_pretrained()` from the 🤗 Transformers library (with either a PyTorch or TensorFlow backend) and then loaded locally with `from_pretrained()` instead of referencing a model hosted on the Hugging Face Hub.
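For example (the checkpoint here stands in for a fine-tuned model, and the directory name is arbitrary):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
save_dir = "./my-finetuned-model"

# Download (or reuse a fine-tuned) model and tokenizer, then write them to a local folder
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)

# Later, load both back from disk instead of referencing the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained(save_dir)
tokenizer = AutoTokenizer.from_pretrained(save_dir)
```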
Next, I will extend the study to models from Google's Model Garden.