
[Table 4 SST2 Task] BERT+LOVE Reproduction Issue


I've been trying to reproduce the SST2 results reported in the paper using the 'BERT+LOVE' embedding you provided.
I tried changing various hyper-parameters of the model and modifying the code.
However, I could not reproduce the performance reported in the paper.

My reproduction performance is below.

Could you provide the code that was used for the SST2 task?

Thank you!


Thanks for asking! A quick question: the repo offers two versions of LOVE. Did you use the 768-dimensional model?

Yes, I did the SST2 task using the 768-dimensional model you provided.

I will run the reproduction myself; it may take some time.

I really appreciate your support!


I added some files for the reproduction of Table 4. Have a look here: LOVE/extrinsic/bert_text_classification.
The data directory has all the data used in this experiment, including the samples with typos.
data/vocab.txt contains all words in this experiment, including typos. You need to download the prepared word embeddings generated by LOVE (love.emb) and put the file in the data directory.

To reproduce scores for the original BERT:

python bert_plus_love.py --use_love False

To reproduce scores for BERT + LOVE:

python bert_plus_love.py --use_love True

We train the model with five different learning rates and record the results on the corresponding test sets.

This is the average accuracy (%) over the five runs:

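| Model / typo rate | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 |
|---|---|---|---|---|---|---|---|---|---|---|
| BERT | 91.3 | 90.4 | 87.3 | 84.3 | 81.5 | 77.0 | 73.7 | 69.8 | 64.8 | 58.7 |
| BERT+LOVE | 91.0 | 89.6 | 87.4 | 85.1 | 83.0 | 79.4 | 75.5 | 71.6 | 68.0 | 61.3 |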

This is the maximum accuracy (%) among the five runs:

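| Model / typo rate | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 |
|---|---|---|---|---|---|---|---|---|---|---|
| BERT | 91.6 | 90.9 | 87.8 | 85.0 | 82.0 | 77.5 | 74.6 | 70.6 | 66.0 | 59.3 |
| BERT+LOVE | 92.1 | 90.5 | 87.7 | 86.1 | 84.0 | 80.7 | 76.9 | 73.2 | 70.7 | 63.1 |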

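We can observe that adding LOVE makes BERT more robust to typos: at typo rates of 30 and above, BERT+LOVE scores higher than BERT on both the average and the maximum accuracy. The scores might be slightly different from the scores reported in our paper for the following reasons: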

  1. I have left my university and the original code is no longer available, so I had to rewrite all of it from memory.
  2. There is randomness when adding typos.
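
To illustrate the second point, here is a minimal sketch of seeded typo injection. It is not the procedure used for the paper or for the provided data: the function name `add_typos`, the adjacent-character swap as the typo type, and treating the typo rate as a per-word probability are all assumptions made for illustration. It only shows that the corrupted text depends on the random seed, so a dataset you construct yourself will generally differ from the provided one.

```python
import random

def add_typos(sentence, typo_rate, rng):
    """Hypothetical typo model: with probability typo_rate, swap two
    adjacent characters in a word. Words of length 1 are left unchanged."""
    words = []
    for word in sentence.split():
        if len(word) > 1 and rng.random() < typo_rate:
            i = rng.randrange(len(word) - 1)  # position of the first swapped character
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        words.append(word)
    return " ".join(words)

text = "a gorgeous and deeply moving film"

# The same seed yields the same corrupted sentence.
same_seed_1 = add_typos(text, 0.5, random.Random(0))
same_seed_2 = add_typos(text, 0.5, random.Random(0))

# A different seed is free to corrupt different words.
other_seed = add_typos(text, 0.5, random.Random(1))

print(same_seed_1)
print(other_seed)
```

Fixing the seed makes a constructed dataset repeatable, but it will still not match data generated with a different seed or a different typo model.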

You can first run the code on the datasets I provided to reproduce the scores. If that works, you can then run it again on the dataset you constructed.

Details of BERT results = [[0.91044887039239, 0.9023669738406659, 0.8746655766944115, 0.8439543697978598, 0.8151753864447087, 0.7754161712247325, 0.745671076099881, 0.6986846016646848, 0.6482424197384067, 0.5930625743162901], [0.9097986028537455, 0.9006391200951248, 0.8775824910820451, 0.8497696195005946, 0.8200245243757431, 0.769266498216409, 0.7405060939357908, 0.7054659631391201, 0.6593527051129607, 0.5932669441141498], [0.916375594530321, 0.9023669738406659, 0.8710054994054697, 0.8481532401902497, 0.8114038347205708, 0.7716446195005946, 0.7386667657550535, 0.7058003864447087, 0.6533145065398335, 0.5892910225921522], [0.9136816290130796, 0.9077549048751486, 0.8742382580261593, 0.8373773781212842, 0.8163644470868014, 0.7677615933412604, 0.733817627824019, 0.6937239892984541, 0.6403834720570749, 0.5845533590963139], [0.9147592152199762, 0.9088324910820451, 0.86777274078478, 0.8336058263971463, 0.8114038347205708, 0.7661452140309156, 0.7273521105826397, 0.6872584720570749, 0.6382282996432818, 0.5764714625445898]]

Details of BERT + LOVE results = [[0.9213362068965517, 0.9050609393579072, 0.8692776456599287, 0.8562351367419738, 0.8256354042806183, 0.7909296967895363, 0.735118162901308, 0.7092560939357908, 0.6522369203329369, 0.5920964625445898], [0.9087210166468489, 0.8947123959571938, 0.8731606718192627, 0.8607684304399524, 0.8391052318668253, 0.8035448870392391, 0.7661452140309156, 0.7256242568370986, 0.6916802913198573, 0.6305737217598097], [0.9109876634958383, 0.8969790428061831, 0.8743497324613555, 0.8455707491082045, 0.8230529131985731, 0.7829592746730083, 0.7528983353151011, 0.7090331450653984, 0.6736771700356718, 0.6126820749108205], [0.9005276456599287, 0.8883583531510106, 0.8742382580261593, 0.8439543697978598, 0.8205633174791914, 0.7866193519619501, 0.7506316884661118, 0.7023446789536266, 0.6736771700356718, 0.6034111177170035], [0.9065658442330559, 0.8969790428061831, 0.8774710166468489, 0.8493423008323425, 0.8404057669441142, 0.8065546967895363, 0.768839179548157, 0.7319782996432818, 0.706766498216409, 0.6283070749108205]]
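
As a quick check, the per-run details above can be collapsed into the average and maximum rows with a short script. This is a sketch, not part of the repo: the helper name `summarize` is hypothetical, and it assumes each inner list is one run and each position corresponds to one typo rate (0, 10, ..., 90).

```python
# Per-run accuracies copied from the "Details" lists above.
# Assumed layout: one inner list per run, one column per typo rate.
bert = [
    [0.91044887039239, 0.9023669738406659, 0.8746655766944115, 0.8439543697978598, 0.8151753864447087, 0.7754161712247325, 0.745671076099881, 0.6986846016646848, 0.6482424197384067, 0.5930625743162901],
    [0.9097986028537455, 0.9006391200951248, 0.8775824910820451, 0.8497696195005946, 0.8200245243757431, 0.769266498216409, 0.7405060939357908, 0.7054659631391201, 0.6593527051129607, 0.5932669441141498],
    [0.916375594530321, 0.9023669738406659, 0.8710054994054697, 0.8481532401902497, 0.8114038347205708, 0.7716446195005946, 0.7386667657550535, 0.7058003864447087, 0.6533145065398335, 0.5892910225921522],
    [0.9136816290130796, 0.9077549048751486, 0.8742382580261593, 0.8373773781212842, 0.8163644470868014, 0.7677615933412604, 0.733817627824019, 0.6937239892984541, 0.6403834720570749, 0.5845533590963139],
    [0.9147592152199762, 0.9088324910820451, 0.86777274078478, 0.8336058263971463, 0.8114038347205708, 0.7661452140309156, 0.7273521105826397, 0.6872584720570749, 0.6382282996432818, 0.5764714625445898],
]
love = [
    [0.9213362068965517, 0.9050609393579072, 0.8692776456599287, 0.8562351367419738, 0.8256354042806183, 0.7909296967895363, 0.735118162901308, 0.7092560939357908, 0.6522369203329369, 0.5920964625445898],
    [0.9087210166468489, 0.8947123959571938, 0.8731606718192627, 0.8607684304399524, 0.8391052318668253, 0.8035448870392391, 0.7661452140309156, 0.7256242568370986, 0.6916802913198573, 0.6305737217598097],
    [0.9109876634958383, 0.8969790428061831, 0.8743497324613555, 0.8455707491082045, 0.8230529131985731, 0.7829592746730083, 0.7528983353151011, 0.7090331450653984, 0.6736771700356718, 0.6126820749108205],
    [0.9005276456599287, 0.8883583531510106, 0.8742382580261593, 0.8439543697978598, 0.8205633174791914, 0.7866193519619501, 0.7506316884661118, 0.7023446789536266, 0.6736771700356718, 0.6034111177170035],
    [0.9065658442330559, 0.8969790428061831, 0.8774710166468489, 0.8493423008323425, 0.8404057669441142, 0.8065546967895363, 0.768839179548157, 0.7319782996432818, 0.706766498216409, 0.6283070749108205],
]

def summarize(runs):
    """Return (column-wise mean, column-wise max) as percentages rounded to one decimal."""
    columns = list(zip(*runs))  # transpose: one tuple per typo rate
    avg = [round(100 * sum(col) / len(col), 1) for col in columns]
    best = [round(100 * max(col), 1) for col in columns]
    return avg, best

bert_avg, bert_max = summarize(bert)
love_avg, love_max = summarize(love)

print("BERT avg:     ", bert_avg)
print("BERT+LOVE avg:", love_avg)
print("BERT max:     ", bert_max)
print("BERT+LOVE max:", love_max)
```

The printed rows should line up with the average and maximum scores listed earlier in this comment, up to rounding.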

Thanks for your help, really appreciate :)