You can download the source code using `git clone` or as a zip file.
We present a novel annotated dataset comprising 5 levels of politeness: (1) Highly Impolite, (2) Impolite, (3) Neutral, (4) Polite, and (5) Highly Polite. The dataset consists of 2500 review sentences, covering the full range of politeness intensity in peer reviews collected from various multi-disciplinary venues such as NIPS, ICLR, Publons, and ShitMyReviews. While the paper is under internal review, we have released only 500 review sentences from the dataset, for reference purposes.
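Assuming the released sentences carry integer labels 1-5 matching the scheme above, the label mapping can be sketched as follows (the function name is illustrative, not part of the released code):

```python
# Politeness levels as defined in the annotation scheme above (1-5).
POLITENESS_LEVELS = {
    1: "Highly Impolite",
    2: "Impolite",
    3: "Neutral",
    4: "Polite",
    5: "Highly Polite",
}

def label_name(level: int) -> str:
    """Map a numeric politeness label to its human-readable name."""
    return POLITENESS_LEVELS[level]
```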
NOTE: We have also stored the precomputed embeddings to expedite training. Since the full set amounts to 68 MB, we have uploaded it HERE
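A minimal sketch for loading the stored embeddings, assuming they are pickled as a NumPy-compatible array (the function name and file layout are assumptions):

```python
import pickle

import numpy as np

def load_embeddings(path: str) -> np.ndarray:
    """Load a pickled embedding matrix from disk and return it as a NumPy array."""
    with open(path, "rb") as f:
        return np.asarray(pickle.load(f))
```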
This notebook contains all the EDA code: data cleaning, resolving class imbalance by upsampling, one-hot encoding the y-labels, and finally storing the embeddings.
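The upsampling and label-encoding steps can be sketched with pandas and scikit-learn (the column name `label` and the random seed are assumptions, not taken from the notebook):

```python
import pandas as pd
from sklearn.utils import resample

def upsample_to_majority(df: pd.DataFrame, label_col: str = "label") -> pd.DataFrame:
    """Upsample every class (with replacement) to the size of the largest class."""
    max_n = df[label_col].value_counts().max()
    parts = [
        resample(group, replace=True, n_samples=max_n, random_state=42)
        for _, group in df.groupby(label_col)
    ]
    return pd.concat(parts).reset_index(drop=True)

def one_hot_labels(df: pd.DataFrame, label_col: str = "label") -> pd.DataFrame:
    """One-hot encode the y-labels, one indicator column per class."""
    return pd.get_dummies(df[label_col], prefix=label_col)
```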
This notebook contains the different variants of our competitive baseline analysis, in which we feed HateBERT/SciBERT/ToxicBert embeddings (one at a time). In the notebook, uncomment the `name` and `embed_model_name` pair corresponding to the variant you want to reproduce, and leave the others commented.
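The uncomment-one pattern amounts to a configuration block like the following; the Hugging Face identifiers shown are the commonly used hub names and may differ from those defined in the notebook, so prefer the notebook's own values:

```python
# Uncomment exactly one pair, matching the baseline variant to reproduce.
# The model identifiers below are illustrative assumptions.
name, embed_model_name = "HateBERT", "GroNLP/hateBERT"
# name, embed_model_name = "SciBERT", "allenai/scibert_scivocab_uncased"
# name, embed_model_name = "ToxicBert", "unitary/toxic-bert"
```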
This notebook contains our custom Embedding layer built on word2vec. The layer returns a 300-dimensional embedding vector, which is passed on to a BiLSTM. NOTE: Make sure `is_BiLSTM = True` while running the notebook. Also, load `Embedding-Matrix.pickle` from the link to obtain the weights for our custom embeddings.
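Assuming `Embedding-Matrix.pickle` holds a `(vocab_size, 300)` array, the lookup the Embedding layer performs can be sketched in plain NumPy (the function name is illustrative):

```python
import numpy as np

def embed(token_ids: np.ndarray, embedding_matrix: np.ndarray) -> np.ndarray:
    """Row lookup done by the Embedding layer:
    (batch, seq_len) token ids -> (batch, seq_len, 300) vectors,
    which are then fed to the BiLSTM."""
    return embedding_matrix[token_ids]
```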
This notebook reports Fleiss' Kappa, Krippendorff's Alpha, and Cohen's Kappa scores, indicating how consistently multiple annotators annotated the reviews following the proposed annotation guidelines.
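As an example of one of these metrics, Cohen's Kappa between two annotators can be computed with scikit-learn (the labels below are toy data over the 5 politeness levels; Fleiss' Kappa and Krippendorff's Alpha require other packages, e.g. `statsmodels` and `krippendorff`):

```python
from sklearn.metrics import cohen_kappa_score

# Toy labels from two annotators over the 5 politeness levels (1-5).
annotator_a = [1, 2, 3, 4, 5, 3, 2]
annotator_b = [1, 2, 3, 4, 5, 3, 3]

# Chance-corrected agreement between the two annotators.
kappa = cohen_kappa_score(annotator_a, annotator_b)
```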
a) Change the URL PATH accordingly before loading the dataset (pickle files).
b) The first tab in the notebook contains all the additional dependencies required; they will be downloaded on running the cell.
c) For SAVE_PATH, set the path where you want to save the trained model.
Once all the setup is complete, execute Run All.
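Steps (a) and (c) amount to setting two path variables near the top of the notebook; the values below are placeholders, not the actual paths used in our runs:

```python
import os

URL_PATH = "./data"          # directory holding the dataset pickle files (placeholder)
SAVE_PATH = "./checkpoints"  # where the trained model will be saved (placeholder)

# Ensure the save directory exists before training starts.
os.makedirs(SAVE_PATH, exist_ok=True)
```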