taoshen58/BiBloSA

Why is there a performance difference between the paper and the code?

janguck opened this issue · 21 comments

I am wondering if there is a set of hyper-parameters that reproduces the TREC performance reported in the paper!

I remember that I used the Adadelta optimizer with an initial learning rate of 0.5; the dropout keep probability was set to 0.5.
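(For reference, a minimal sketch of that setup in TF 1.x; the toy loss and variable names below are my own placeholders, not the project's actual code.)

import tensorflow as tf

# toy loss just to make the snippet runnable; substitute your model's loss
w = tf.get_variable('w', shape=[10])
loss = tf.reduce_mean(tf.square(w))

# Adadelta with an initial learning rate of 0.5, as mentioned above
train_op = tf.train.AdadeltaOptimizer(learning_rate=0.5).minimize(loss)

# dropout keep probability: fed as 0.5 during training, 1.0 at evaluation
keep_prob = tf.placeholder_with_default(1.0, shape=[])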

Recently, my new model achieved a new SoTA result on TREC, with a dev accuracy of 95.2%, using the hyper-parameters above. The code is released at https://github.com/taoshen58/DiSAN/tree/master/Fast-DiSA

I just verified that the model with a dropout keep probability of 0.55 can achieve a dev accuracy of 95.0% on TREC within 40K steps. You can reduce the value of eval_period for more frequent validation.

Thank you for the answer. However, I have tried eval periods of 100 and 200 with the given parameters, but the accuracy does not go above 93.6%.
Excuse me, could you tell me the parameters in more detail?

The only hyper-param I tuned is the dropout keep probability.

Try running the code with:

python3 qc_main.py --network_type context_fusion --context_fusion_method block --model_dir_suffix run_qc --gpu 0 --dropout 0.55 --num_steps 50000 --fine_grained False --eval_period 50

--network_type context_fusion causes an error.
Should it be --network_type exp_context_fusion instead?

Yes, there are some small differences in the run parameters between my local code and the code on GitHub.
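Assuming the only difference is that flag name, the equivalent command for the GitHub code would presumably be:

python3 qc_main.py --network_type exp_context_fusion --context_fusion_method block --model_dir_suffix run_qc --gpu 0 --dropout 0.55 --num_steps 50000 --fine_grained False --eval_period 50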

Thank you. If so, is it possible that other parts of the code have changed as well?

No, the model files in the local and GitHub versions are identical. By the way, when I tested the code this morning, I used TF 1.4.1 with CUDA 8 and cuDNN 6 on a single GTX 1080 Ti GPU, and I achieved 94.8% around 20K steps and 95.0% around 35K steps.

Thank you. I got the results I wanted.

I thought GloVe.840B was always better than GloVe.6B, but it was not. Thank you for your kind reply.

You're welcome. I'm very glad to help you with the code. I have found empirically that GloVe.6B is adequate for most low-level NLP tasks and that fine-tuning it is efficient.
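(As a minimal sketch of what fine-tuning the pretrained vectors means here, assuming a TF 1.x setup; the glove_matrix below is a random placeholder standing in for the real matrix loaded from glove.6B.300d.txt, and the variable names are mine, not the project's.)

import numpy as np
import tensorflow as tf

# placeholder standing in for the real [vocab_size, 300] matrix read from glove.6B.300d.txt
glove_matrix = np.random.randn(10000, 300).astype(np.float32)

# trainable=True lets the pretrained vectors be fine-tuned together with the model
embedding = tf.get_variable(
    'word_embedding',
    shape=glove_matrix.shape,
    initializer=tf.constant_initializer(glove_matrix),
    trainable=True)
token_ids = tf.placeholder(tf.int32, [None, None])        # [batch, seq_len]
embedded = tf.nn.embedding_lookup(embedding, token_ids)   # [batch, seq_len, 300]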

I think you could first try and improve the Fast-DiSA I mentioned to get new SoTA results on a wide range of NLP tasks. So far, I have tested it on language inference, semantic role labeling, and sentence classification tasks, where it shows better performance and higher efficiency.

You can apply the fast-disa code to the corresponding NLP task projects I have released if you are familiar with the project structure. I will also make the project code that uses fast-disa public by the end of May~

Good luck ~
Tao

I tried to experiment with datasets like 20news and MR, but I get a GPU memory error. Is it possible to experiment with long datasets? I will only ask you to experiment with a relatively short dataset.

What is the meaning of "I will only ask you to experiment with a relatively short dataset"?

I mean: is the model difficult to experiment with on lengthy datasets?

Got you.
Multi-dim self-attention intrinsically suffers from this memory problem. Alternatively, you can read the paper in which I solved this problem.
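(To make the memory issue concrete, here is a rough illustration of my own, not a number from the paper: multi-dim token2token attention keeps an alignment tensor of shape [batch, seq_len, seq_len, hidden], so its memory grows quadratically with sequence length; batch size 64 and hidden size 300 below are example values.)

# rough activation-memory estimate for a multi-dim token2token alignment tensor
# of shape [batch, seq_len, seq_len, hidden] stored as float32 (4 bytes per value)
def multi_dim_attn_bytes(batch, seq_len, hidden, bytes_per_value=4):
    return batch * seq_len * seq_len * hidden * bytes_per_value

print(multi_dim_attn_bytes(64, 20, 300) / 2**20, "MiB")   # ~29 MiB for a TREC-length question
print(multi_dim_attn_bytes(64, 500, 300) / 2**30, "GiB")  # ~18 GiB for a long 20news document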

Hello, I have a question. How did you get the 'GPU Memory' numbers in your paper? It has confused me for a long time. I hope that you can help me. Thank you very much!

I used a GeForce 1080 Ti.

Thank you for your answer. In fact, I do not know how to get the value of GPU memory. For example, in the paper, the training GPU memory of multi-CNN is 529 MB and that of Bi-SRU is 731 MB. I have no idea how to get these values.
Could you give me some advice?
Thank you!

@waguoxiaohai There are two approaches:

  1. During the training phase, set the GPU memory config to soft placement and record the GPU memory usage shown by nvidia-smi.
  2. Use the metadata in TensorFlow and analyze the tensors and variables recorded in run_metadata. The following is a simple example; a sketch covering both approaches follows it.
import tensorflow as tf

sess = tf.Session()
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
# replace `fetches` and `feed_dict` with your own training op and input dict
sess.run(fetches, feed_dict=feed_dict, options=run_options, run_metadata=run_metadata)
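
For reference, a minimal sketch of both approaches, assuming TF 1.x; the allow_growth option and the tf.profiler call are my own additions, not part of the released code:

import tensorflow as tf

# approach 1: allocate GPU memory on demand instead of grabbing the whole card,
# then watch this process in nvidia-smi while training
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

# approach 2: after a sess.run(...) that recorded run_metadata (as in the example above),
# let the profiler report per-op time and memory
opts = tf.profiler.ProfileOptionBuilder.time_and_memory()
tf.profiler.profile(sess.graph, run_meta=run_metadata, cmd='op', options=opts)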

Thank you for your answer. I chose to use the first approach. I used the multi-CNN model on SNLI (train_batch_size 64),
but nvidia-smi shows 2431 MiB, which is not the same as the 529 MB in the paper.
It has confused me for several days...

It seems that the minimum GPU memory footprint is 2000+ MB after some updates to TensorFlow and CUDA. Try downgrading your versions to those mentioned in the paper to reproduce the result.
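(If it helps, a quick way I would check the installed build before downgrading; the target of TF 1.4.1 with CUDA 8 and cuDNN 6 comes from the earlier comment in this thread.)

import tensorflow as tf

print(tf.__version__)               # compare against the 1.4.1 mentioned above
print(tf.test.is_built_with_cuda()) # True if this TensorFlow build was compiled with CUDA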

OK, I will do as you suggest. Your advice is very important to me.
Thank you very much!