r-three/t-few

Validation score on WSC decreases with training


Thank you for the amazing work on t-few! I've noticed strange behavior when running SuperGLUE's WSC. I've been logging the validation score every 40 epochs by setting self.eval_epoch_interval = 40, and when I run the command

python -m src.pl_train -c ia3.json+wsc.json -k save_model=False exp_name=first_exp

the output is as follows:

{"accuracy": 0.6730769230769231, "score_gt": 0.5068197436630726, "score_cand": 0.7191649047801127}
{"accuracy": 0.49038461538461536, "score_gt": 1.4563168384707892, "score_cand": 1.505529030584372}
{"accuracy": 0.47115384615384615, "score_gt": 3.4743554890155792, "score_cand": 2.727144861450562}
{"accuracy": 0.46153846153846156, "score_gt": 4.202766236777489, "score_cand": 3.5702959763316007}
{"accuracy": 0.40384615384615385, "score_gt": 5.157541000499175, "score_cand": 3.5657502871293287}
{"accuracy": 0.3942307692307692, "score_gt": 5.397989429533482, "score_cand": 3.975659689651086}
{"accuracy": 0.40384615384615385, "score_gt": 5.073869264469697, "score_cand": 3.995581218542961}

The last accuracy score is reported at epoch 240 out of a total of 250 epochs.
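For reference, the eval interval could presumably also be set from the command line instead of editing the source, assuming eval_epoch_interval is a recognized -k override key in the config (I haven't verified this):

python -m src.pl_train -c ia3.json+wsc.json -k save_model=False eval_epoch_interval=40 exp_name=first_exp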

Any ideas on what is going on here? Thanks!

I can try running this experiment in the latter half of this week. In the meantime, I remember WSC being a tricky dataset that often produces unstable results. Would you mind running it with a few other seeds and seeing whether this behavior persists?
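For example, a minimal sweep along these lines (assuming seed is a valid -k override in the config, which you'd want to double-check):

for seed in 0 1 32 42; do
    # hypothetical seed override; exp_name chosen here just to keep runs separate
    python -m src.pl_train -c ia3.json+wsc.json -k save_model=False seed=${seed} exp_name=wsc_seed_${seed}
done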

And btw, is this just WSC? Do other datasets have this problem?

Hi Haokun, thank you for the response. Indeed, after changing the seed, the results are more in line with what I expected. I've been seeing similar problems with WiC, but again they appear to be caused by seed variability. RTE seems more stable.