Natural Language Processing

This repository houses the Natural Language Processing (NLP) projects I have completed, aside from those built using Spark & Databricks.

HuggingFace Profile

To view and use the models, head over to my HuggingFace portfolio: huggingface.co/DunnBC22
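
Any of these models can be pulled straight from the Hub. Below is a minimal sketch using the transformers pipeline API; the model ID is a placeholder, so substitute any checkpoint listed on the profile:

```python
from transformers import pipeline

# Placeholder model ID; replace with any checkpoint from huggingface.co/DunnBC22.
model_id = "DunnBC22/<model-name>"

# Build a text-classification pipeline backed by the chosen checkpoint.
classifier = pipeline("text-classification", model=model_id)

print(classifier("The battery life on this phone is fantastic."))
```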

Text Classification

Multiclass Classification

| Project Name | Model Checkpoint | Accuracy | Macro F1 Score | Macro Precision | Macro Recall |
|---|---|---|---|---|---|
| Apple iPhone SE Reviews* | bert-base-uncased | 0.9712 | 0.9561 | 0.9538 | 0.9598 |
| Apple iPhone SE Reviews* | microsoft/mpnet-base | 0.9460 | 0.7242 | 0.7007 | 0.7594 |
| CNN News Articles | distilbert-base-uncased | 0.9643 | 0.9640 | - | - |
| Hate & Offensive Speech* | bert-base-uncased | 0.9213 | 0.9161 | 0.9241 | 0.9144 |
| Hate & Offensive Speech* | bert-large-uncased | 0.9869 | 0.9863 | 0.987 | 0.9857 |
| Hate & Offensive Speech* | distilbert-base-uncased | 0.9607 | 0.9592 | 0.9613 | 0.9579 |
| Hate & Offensive Speech* | diptanu/fBERT | 0.9607 | 0.9581 | 0.9596 | 0.9571 |
| Hate & Offensive Speech* | GroNLP/hateBERT | 0.941 | 0.9351 | 0.951 | 0.9273 |
| Password Strength | microsoft/codebert-base | 0.9975 | 0.9963 | 0.9948 | 0.9978 |
| Malicious URLs | microsoft/codebert-base | 0.7279 | 0.4611 | 0.5436 | 0.4422 |
| Malicious URLs | microsoft/codebert-base-mlm | 0.7322 | 0.4303 | 0.6034 | 0.4233 |
| Malicious URLs | microsoft/deberta-base-mnli | 0.7353 | 0.4533 | 0.5684 | 0.4315 |
| Malicious URLs (Using PEFT) | roberta-large | 0.7160 | 0.4374 | 0.5237 | 0.4190 |
| Malicious URLs | albert-base-v2 | 0.7267 | 0.4521 | 0.5508 | 0.4294 |

Binary Classification

| Project Name | Transformer Checkpoint | Accuracy | F1 Score | Precision | Recall |
|---|---|---|---|---|---|
| Malignant Comments - BERT-Base* | bert-base-uncased | 0.972 | 0.759 | 0.6918 | 0.8406 |
| Malignant Comments - I-BERT* | kssteven/ibert-roberta-base | 0.9741 | 0.7773 | 0.7084 | 0.861 |
| Mental Health Classification | google/canine-c | 0.9226 | 0.9096 | 0.9113 | 0.9079 |
| OnionOrNot | distilbert-base-uncased | 0.9224 | 0.9218 | - | - |
| Spam Filter (Larger Dataset) | distilbert-base-uncased | 0.9845 | 0.9848 | - | - |
| Spam Filter (Smaller Dataset) | distilbert-base-uncased | 0.9907 | 0.9906 | - | - |
| Tweet About a Disaster or Not? - ALBERT* | albert-base-v2 | 0.9138 | 0.7752 | 0.8204 | 0.7348 |
| Tweet About a Disaster or Not? - DeBERTa* | microsoft/deberta-v3-small | 0.9050 | 0.7453 | 0.7453 | 0.7453 |
| Tweet About a Disaster or Not? - DistilBERT* | distilbert-base-uncased | 0.9138 | 0.7752 | 0.8204 | 0.7348 |
| Tweet About a Disaster or Not? - ERNIE* | nghuyong/ernie-2.0-base-en | 0.9156 | 0.7876 | 0.8436 | 0.7386 |
| Tweet About a Disaster or Not? - ELECTRA* | bhadresh-savani/electra-base-emotion | 0.8857 | 0.7246 | 0.7991 | 0.6628 |
| Tweet About a Disaster or Not? - RoBERTa* | roberta-base | 0.8989 | 0.7569 | 0.8211 | 0.7020 |

Multilabel Classification

| Project Name | Model Checkpoint | Subset Accuracy | F1 Score | ROC-AUC |
|---|---|---|---|---|
| Go Emotions | distilbert-base-uncased | 0.2184 | 0.3328 | 0.6102 |
| Research Articles | distilbert-base-uncased | 0.6977 | 0.8395 | 0.8909 |
| Review Sentiments (with DistilBert) | distilbert-base-uncased | 0.5787 | 0.8697 | 0.9107 |
| Review Sentiments (with Bert) | bert-base-uncased | 0.5967 | 0.8737 | 0.9146 |

Token Classification

| Project Name | Overall Accuracy | Overall F1 Score | Overall Precision | Overall Recall | Multilingual? |
|---|---|---|---|---|---|
| Babelscape WikiNeural Joined Dataset | 0.994704 | 0.995886 | 0.995711 | 0.996060 | Yes |
| BC2GM-IOB (EMBO-BLURB) | 0.9736 | 0.7765 | 0.7521 | 0.8025 | No |
| EMBO-BLURB with LoRA | 0.9584 | 0.8136 | 0.7999 | 0.8278 | No |
| DFKI-SLT/few-nerd | 0.9498 | 0.8041 | 0.8203 | 0.7886 | No |
| NCBI Disease | 0.9825 | 0.8359 | 0.8064 | 0.8677 | No |
| TNER Bio NLP 2004 | 0.9367 | 0.7169 | 0.6628 | 0.7805 | No |
| Stromberg NLP - Twitter (SeqEval) | 0.9860 | 0.9824 | 0.9828 | 0.9820 | No |
| Stromberg NLP - Twitter PoS_v2 | 0.9853 | 0.8931 | 0.9296 | 0.8931 | No |
| Stromberg NLP - Twitter PoS (SqueezeBERT Transformer) | 0.9771 | 0.7765 | 0.8046 | 0.7785 | No |
| WikiNeural - BERT-Base | 0.9912 | 0.9145 | 0.9380 | 0.9261 | No |
| WikiNeural - Amazon's BORT | 0.9709 | 0.7050 | 0.7868 | 0.7437 | No |
| WikiNeural - FNet-Base | 0.8521 | 0.8934 | 0.8722 | 0.9853 | No |
| WikiNeural - Funnel Transformer | 0.9856 | 0.8722 | 0.9102 | 0.8908 | No |
| WikiNeural - I-BERT-Base | 0.9909 | 0.9107 | 0.9360 | 0.9232 | No |
| WikiNeural - MEGA-Base | 0.9619 | 0.6312 | 0.7324 | 0.6781 | No |
| WikiNeural - RoBERTa-Base | 0.9910 | 0.9124 | 0.9352 | 0.9237 | No |
| WikiNeural - SqueezeBERT | 0.9803 | 0.8278 | 0.8866 | 0.8562 | No |
| WikiNeural - XLNet-Base | 0.9904 | 0.9068 | 0.9324 | 0.9194 | No |

Sentiment Analysis

| Project Name | Model Checkpoint | Accuracy | Macro F1 Score | Macro Precision | Macro Recall |
|---|---|---|---|---|---|
| Emotions Sentiment Analysis | distilbert-base-uncased | 0.935 | 0.935 | - | - |
| Financial Sentiment Analysis - Original | distilbert-base-uncased | 0.8425 | 0.8470 | - | - |
| Financial Sentiment Analysis - Updated (v1.5) | distilbert-base-uncased | 0.8529 | 0.8564 | - | - |
| Financial Sentiment Analysis_v2 | google/fnet-base | 0.8117 | 0.7472 | 0.7588 | 0.7394 |
| Financial Sentiment Analysis_v3 | google/fnet-large | 0.8618 | 0.8209 | 0.8084 | 0.8401 |
| News About Gold - BORT* | amazon/bort | 0.8770 | 0.7791 | 0.8463 | 0.7539 |
| News About Gold - BERT-Base* | bert-base-uncased | 0.9139 | 0.8758 | 0.8885 | 0.8647 |
| News About Gold - Funnel* | funnel-transformer/medium-base | 0.9172 | 0.8854 | 0.8853 | 0.8859 |
| News About Gold - MEGA* | mnaylor/mega-base-wikitext | 0.5014 | 0.3283 | 0.4548 | 0.3835 |
| News About Gold - MPNet-Base* | microsoft/mpnet-base | 0.9068 | 0.8351 | 0.831 | 0.8406 |
| News About Gold - SqueezeBERT* | squeezebert/squeezebert-uncased | 0.9168 | 0.8749 | 0.8822 | 0.8684 |
| News About Gold - YOSO* | uw-madison/yoso-4096 | 0.4456 | 0.2272 | 0.3240 | 0.2912 |
| Twitter Sentiment Analysis | distilbert-base-uncased | 0.8466 | 0.8471 | - | - |
| Twitter Sentiment Analysis_v2 | bert-base-uncased | 0.8474 | 0.788 | 0.8132 | 0.7747 |
| Twitter Sentiment Analysis_v3 | vinai/bertweet-base | 0.8588 | 0.8151 | 0.8463 | 0.7961 |

  • Metrics are the macro-averaged versions only when all four metric values (accuracy, F1 score, precision, and recall) are displayed.
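
For reference, a rough sketch of how these macro-averaged figures can be computed with scikit-learn (the label arrays below are made up for illustration):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical gold labels and model predictions for a 3-class task.
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

accuracy = accuracy_score(y_true, y_pred)
# average="macro" weights every class equally, regardless of class frequency.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro"
)

print(f"Accuracy: {accuracy:.4f} | Macro F1: {f1:.4f} | "
      f"Macro Precision: {precision:.4f} | Macro Recall: {recall:.4f}")
```
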
Language Detection

| Project Name | Accuracy | Macro F1 Score | Macro Precision | Macro Recall |
|---|---|---|---|---|
| Language Detection of Tweets | 0.9992 | 0.9992 | 0.9992 | 0.9992 |
| Language Detection - 10k | 0.9971 | 0.9977 | 0.9981 | 0.9974 |
| Language Detection - 20k | 0.9883 | 0.9882 | 0.9887 | 0.9879 |

Semantic Similarity

| Project Name | Accuracy | F1 Score | Precision | Recall | Average Precision |
|---|---|---|---|---|---|
| Semantic Similarity of Quora Pairs Dataset - Base | 85.93 | 82.89 | 77.43 | 89.18 | 87.13 |
| Semantic Similarity of Quora Pairs Dataset - Large | 88.72 | 85.22 | 80.72 | 90.25 | 89.75 |

  • Metrics shown for Semantic Similarity are measured using Cosine-Similarity.
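
A minimal sketch of scoring that cosine similarity with the sentence-transformers library (the checkpoint and sentence pair are illustrative, not the exact setup used for these projects):

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative checkpoint; the Quora-pairs projects may use a different base model.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How do I learn machine learning?",
    "What is the best way to study machine learning?",
]

embeddings = model.encode(sentences, convert_to_tensor=True)

# A cosine similarity close to 1.0 suggests the two questions are duplicates.
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"Cosine similarity: {score:.4f}")
```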

Text Generation / Multiple Choice / Question & Answer

Text Summarization

| Project Name | Rouge1 | Rouge2 | RougeL | RougeLsum |
|---|---|---|---|---|
| Flan-T5 - Text Summarization-Data Dataset (1 Epoch) | 43.6615 | 20.349 | 40.1032 | 40.1589 |
| Flan-T5 - Text Summarization-Data Dataset (6 Epochs) | 43.5994 | 0.4446 | 40.132 | 40.1692 |
| LED - Text Summarization-Data Dataset (4 Epochs) | 43.3689 | 19.9885 | 39.9887 | 40.0679 |
| CNN News Text Summarization | 0.834343 | 0.793822 | 0.823824 | 0.823778 |
| Text Summarization BBC News (with Pegasus Transformer) | 0.584474 | 0.463574 | 0.408729 | 0.408431 |

Machine Translation

| Project Name | Transformer Checkpoint | Bleu | Rouge1 | Rouge2 | RougeL | RougeLsum | Meteor |
|---|---|---|---|---|---|---|---|
| English to French | facebook/mbart-large-50 | 35.1914 | 0.6420 | 0.4573 | 0.6070 | 0.6069 | 0.5917 |
| English to German | facebook/mbart-large-50 | 35.5931 | 0.5803 | 0.3939 | 0.5439 | 0.5442 | 0.55 |
| English to Spanish | facebook/mbart-large-50 | 41.4437 | 0.6751 | 0.4977 | 0.6372 | 0.6376 | 0.6479 |
| BioMedical EN to IT Translation | facebook/mbart-large-50 | 38.9893 | 0.6826 | 0.4737 | 0.6586 | 0.6585 | 0.6270 |
| Chinese to English Translation | Helsinki-NLP/opus-mt-zh-en | 45.2808 | 0.6201 | 0.4198 | 0.5927 | 0.5927 | - |
| Korean to English | Helsinki-NLP/opus-mt-ko-en | 14.3395 | 0.4391 | 0.2022 | 0.3671 | 0.3671 | - |
| Medical - German to English | Helsinki-NLP/opus-mt-de-en | 53.8812 | 0.7664 | 0.6284 | 0.7370 | 0.7370 | - |

Question & Answer

| Project Name | Exact Match | F1 Score |
|---|---|---|
| ML QA | 59.6146 | 73.3002 |
| Answer Prediction Dataset | 65.7357 | 79.2835 |

Generate Docstrings

| Project Name | Model Checkpoint | Rouge1 | Rouge2 | RougeL | RougeLsum |
|---|---|---|---|---|---|
| CodeSearchNet Dataset to Generate Docstrings (Code T5 Project) | Salesforce/codet5-small | 0.3381 | 0.1541 | 0.3045 | 0.3214 |
| Smol Dataset to Generate Docstrings | Salesforce/codet5-base | 0.4947 | 0.3661 | 0.4794 | 0.4791 |
| Smol Dataset to Generate Docstrings | Salesforce/codet5-small | 0.38 | 0.2176 | 0.3554 | 0.3635 |

Multiple Choice

| Project Name | Accuracy |
|---|---|
| CosmosQA | 0.6000 |
| Social IQa | 0.6128 |
| Discourse Marker QA | 0.6207 |
| Figurative Language | 0.8124 |
| Strategy QA | 0.625 |
| e-CARE | 0.7212 |
| Vitamin C Fact Verification | 0.7240 |
| Winowhy | 0.7118 |

NLP Regression

NLP Regression

| Project Name | Mean Squared Error (MSE) | Root Mean Squared Error (RMSE) | Mean Absolute Error (MAE) |
|---|---|---|---|
| Edmunds Car Reviews - All Brands (with Bert-Base) | 0.2324 | 0.4820 | 0.3089 |
| Edmunds Car Reviews - All Brands | 0.2232 | 0.4724 | 0.3150 |
| Edmunds Car Reviews - Brands Headquartered in America | 0.2486 | 0.4986 | 0.3469 |
| Edmunds Car Reviews - Brands Headquartered in Europe | 0.1999 | 0.4471 | 0.2824 |
| Edmunds Car Reviews - Brands Not Headquartered in America or Europe | 0.2240 | 0.4733 | 0.3140 |
| Episode Reviews/Rating - The Simpsons | 0.7632 | 0.8736 | 0.6622 |
| Episode Reviews/Rating - The Simpsons & Other TV Shows | 0.3754 | 0.6127 | 0.4651 |
| TMDB 5000 Movie Dataset | 0.7613 | 0.8725 | 0.6848 |

Language Modeling

Causal Language Modeling

| Project Name | Perplexity |
|---|---|
| 2000 Clean Medical Articles | 18.67 |
| AG News (DistilGPT2 Version) | 31.53 |
| AG News (GPT2 Version) | 22.92 |
| US Economic News Articles | 31.41 |

Causal Language Modeling for Chatbot

| Project Name | Perplexity |
|---|---|
| Large Company's FAQs (Medium) v1 | 8.67 |
| Large Company's FAQs (Large) v1 | 2.79 |
| Large Company's FAQs v2 | 1.70 |

Masked Language Modeling

| Project Name | Perplexity |
|---|---|
| AG News | 5.95 |
| Reddit Comments | 12.70 |
| US Economic News Articles | 6.25 |
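
Perplexity throughout this section is the standard language-modeling metric; a common way to obtain it is to exponentiate the mean cross-entropy loss on the evaluation set (a sketch, assuming `trainer` is a transformers Trainer already configured with one of these models and its evaluation dataset):

```python
import math

# `trainer` is assumed to be a transformers.Trainer already set up with the
# fine-tuned causal or masked language model and its evaluation dataset.
eval_results = trainer.evaluate()

# Perplexity is the exponential of the average cross-entropy loss.
perplexity = math.exp(eval_results["eval_loss"])
print(f"Perplexity: {perplexity:.2f}")
```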

Footnotes:

  • The output format of the ROUGE metric changed partway through these projects. As a result, ROUGE values under 1 should be multiplied by 100 before comparing them with the values over 1 (see the sketch after these footnotes).
  • PoS stands for Part of Speech.
  • Projects that are part of transformer comparisons using the same dataset are denoted with an asterisk (*) at the end of their project name.
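
As noted in the first footnote, a small sketch for putting the ROUGE scores on a common 0-100 scale before comparing rows:

```python
def normalize_rouge(score: float) -> float:
    """Scale ROUGE scores reported as fractions (0-1) up to the 0-100 range."""
    return score * 100 if score <= 1 else score

# Example using two Rouge1 values from the Text Summarization table above.
print(normalize_rouge(0.834343))  # CNN News Text Summarization -> 83.4343
print(normalize_rouge(43.6615))   # Flan-T5 (1 Epoch) -> 43.6615
```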