jasonwei20/eda_nlp

We can't get the 3% improvement rate

figsyn opened this issue · 5 comments

Hi,
We tried the PC and SUBJ datasets with 500 training examples, running e_2_rnn_baseline.py and aug.py from experiment 'e', with num_aug=16. However, the results are not stable (sometimes lower than the baseline), and we didn't get the 3% improvement rate. Could you tell us what parameters you used in your experiments? Thanks a lot!

How many random seeds did you run?
Did you shuffle the training data to get 500 examples? Are your classes balanced? Did you use the preprocessing script that I used?
How did you get your test sets?
Are you sure that you have formatted your data correctly?
For the 3 percent improvement, I used the default params.
If you look at Figure 1 in the paper, you'll see that SUBJ improvement can be marginal.
If your results are unstable, you can try n_aug=8 or 10, and average over more seeds (see the sketch below).
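
For instance, a minimal sketch of averaging over seeds (train_and_eval here is a hypothetical stand-in for one run of the experiment scripts, not code from this repo):

```python
import random

def train_and_eval(seed):
    """Hypothetical stand-in for one run of e_2_rnn_aug.py:
    seed everything, train the model, and return test accuracy."""
    random.seed(seed)
    return 0.83 + random.uniform(-0.02, 0.02)  # placeholder accuracy

# Average over several seeds so single-run noise washes out.
accs = [train_and_eval(seed) for seed in range(5)]
print(f"mean accuracy over {len(accs)} seeds: {sum(accs) / len(accs):.4f}")
```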

Thanks for your answer!
First, we used random seeds 0 to 4.
We got 500 examples with a sample function (roughly as in the sketch below), our classes are balanced, and we used e_2_rnn_baseline.py and e_2_rnn_aug.py.
Our test sets are the full test sets, and we got our datasets from your reply in #9 and your preprocessing code. However, the sizes of our datasets differ slightly from those reported in your paper.
One more question: for the CR dataset we couldn't find a test set. Did you get the train and test sets by splitting the full dataset?
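
Our sampling looked something like this minimal sketch (balanced_sample is an illustrative name, and we assume the tab-separated label/sentence format the eda_nlp scripts read):

```python
import random
from collections import defaultdict

def balanced_sample(lines, n_total, seed=0):
    """Take an equal number of examples per class. Each line is
    assumed to be tab-separated: label, then sentence (the format
    the eda_nlp scripts read)."""
    random.seed(seed)
    by_class = defaultdict(list)
    for line in lines:
        by_class[line.split('\t')[0]].append(line)
    per_class = n_total // len(by_class)
    sample = []
    for items in by_class.values():
        random.shuffle(items)
        sample.extend(items[:per_class])
    random.shuffle(sample)
    return sample

# e.g. 250 examples from each of the two PC classes:
# train_500 = balanced_sample(open('pc/train.txt').read().splitlines(), 500)
```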

I randomly took 10% of CR for the test set (see the sketch at the end of this reply).
Those settings seem right to me, so I don't know why you're getting those results. What are your actual numbers?
When I ran my experiments, I would often see variation in the results, which could fluctuate by up to 2-3% in some cases. But this variation averaged out over many experiments with many random seeds.
I checked all the results I ran previously for SUBJ and the improvement was indeed not that great (~1%).
For PC, however, the results are around 3%.
You can see from Figure 1 in the paper that the most improvement was on the TREC dataset.
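
For the CR split, something along these lines (a minimal sketch of a random 90/10 split; split_train_test is illustrative, and since the split depends on the seed, your counts will differ slightly from the paper's):

```python
import random

def split_train_test(lines, test_frac=0.1, seed=0):
    """Shuffle once, then hold out a random 10% as the test set."""
    random.seed(seed)
    lines = list(lines)
    random.shuffle(lines)
    n_test = int(len(lines) * test_frac)
    return lines[n_test:], lines[:n_test]

# train, test = split_train_test(open('cr/full.txt').read().splitlines())
```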

Thanks for your reply!

Closing this issue. If you can't get it to work, just let me know and we can discuss further.