taoshen58/BiBloSA

Why is there a performance difference between the paper and the code?

janguck opened this issue · 21 comments

I am wondering if there is a set of hyper-parameters that reproduces the TREC performance reported in the paper!

I remember that I used the Adadelta optimizer with an initial learning rate of 0.5; the dropout keep probability was set to 0.5.
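(For reference, a minimal sketch of that setup in TF 1.x; the toy loss and variable names below are my own placeholders, not the project's actual code.)

import tensorflow as tf

# toy loss just to make the snippet runnable; substitute your model's loss
w = tf.get_variable('w', shape=[10])
loss = tf.reduce_mean(tf.square(w))

# Adadelta with an initial learning rate of 0.5, as mentioned above
train_op = tf.train.AdadeltaOptimizer(learning_rate=0.5).minimize(loss)

# dropout keep probability: fed as 0.5 during training, 1.0 at evaluation
keep_prob = tf.placeholder_with_default(1.0, shape=[])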

Recently, my new model achieved a new SoTA result on TREC, with a dev accuracy of 95.2%, using the hyper-parameters above. The code is released at https://github.com/taoshen58/DiSAN/tree/master/Fast-DiSA

I just verified that the model with a dropout keep probability of 0.55 can achieve a dev accuracy of 95.0% on TREC within 40K steps. You can reduce the value of eval_period for more frequent validation.

Thank you for the answer. However, I have tried eval periods of 100 and 200 with the given parameters, but the accuracy does not go above 93.6%.
Excuse me, could you tell me the parameters in more detail?

The only hyper-param I tuned is the dropout keep probability.

Try running the code with:

python3 qc_main.py --network_type context_fusion --context_fusion_method block --model_dir_suffix run_qc --gpu 0 --dropout 0.55 --num_steps 50000 --fine_grained False --eval_period 50

--network_type context_fusion causes an error.
Should it be --network_type exp_context_fusion instead?

Yes, there are some small differences in the run parameters between my local code and the code on GitHub.
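Assuming the only difference is that flag name, the equivalent command for the GitHub code would presumably be:

python3 qc_main.py --network_type exp_context_fusion --context_fusion_method block --model_dir_suffix run_qc --gpu 0 --dropout 0.55 --num_steps 50000 --fine_grained False --eval_period 50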

Thank you. If so, is it possible that other parts of the code have changed as well?

No, the model files in the local and GitHub versions are identical. By the way, when I tested the code this morning, I used TF 1.4.1 with CUDA 8 and cuDNN 6 on a single GTX 1080 Ti GPU, and I achieved 94.8% around 20K steps and 95.0% around 35K steps.

Thank you. I got the results I wanted.

I thought GloVe.840B was always better than GloVe.6B, but it was not. Thank you for your kind reply.

You're welcome. I'm very glad to help you with the code. I have found empirically that GloVe.6B is adequate for most low-level NLP tasks and that fine-tuning it is efficient.
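(As a minimal sketch of what fine-tuning the pretrained vectors means here, assuming a TF 1.x setup; the glove_matrix below is a random placeholder standing in for the real matrix loaded from glove.6B.300d.txt, and the variable names are mine, not the project's.)

import numpy as np
import tensorflow as tf

# placeholder standing in for the real [vocab_size, 300] matrix read from glove.6B.300d.txt
glove_matrix = np.random.randn(10000, 300).astype(np.float32)

# trainable=True lets the pretrained vectors be fine-tuned together with the model
embedding = tf.get_variable(
    'word_embedding',
    shape=glove_matrix.shape,
    initializer=tf.constant_initializer(glove_matrix),
    trainable=True)
token_ids = tf.placeholder(tf.int32, [None, None])        # [batch, seq_len]
embedded = tf.nn.embedding_lookup(embedding, token_ids)   # [batch, seq_len, 300]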

I think you could first try and improve the Fast-DiSA I mentioned to get new SoTA results on a wide range of NLP tasks. So far, I have tested it on language inference, semantic role labeling, and sentence classification tasks, where it shows better performance and higher efficiency.

You can apply the fast-disa code to the corresponding NLP task projects I have released if you are familiar with the project structure. I will also make the project code that uses fast-disa public by the end of May~

Good luck ~
Tao

I tried to experiment with datasets like 20news and MR, but I get a GPU memory error. Is it possible to experiment with long datasets? I will only ask you to experiment with a relatively short dataset.

What is the meaning of "I will only ask you to experiment with a relatively short dataset"?

I mean: is the model difficult to experiment with on lengthy datasets?

Got you.
Multi-dim self-attention intrinsically suffers from this memory problem. Alternatively, you can read the paper in which I solved this problem.
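(To make the memory issue concrete, here is a rough illustration of my own, not a number from the paper: multi-dim token2token attention keeps an alignment tensor of shape [batch, seq_len, seq_len, hidden], so its memory grows quadratically with sequence length; batch size 64 and hidden size 300 below are example values.)

# rough activation-memory estimate for a multi-dim token2token alignment tensor
# of shape [batch, seq_len, seq_len, hidden] stored as float32 (4 bytes per value)
def multi_dim_attn_bytes(batch, seq_len, hidden, bytes_per_value=4):
    return batch * seq_len * seq_len * hidden * bytes_per_value

print(multi_dim_attn_bytes(64, 20, 300) / 2**20, "MiB")   # ~29 MiB for a TREC-length question
print(multi_dim_attn_bytes(64, 500, 300) / 2**30, "GiB")  # ~18 GiB for a long 20news document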

Hello, I have a question. How did you get the 'GPU Memory' numbers in your paper? It has confused me for a long time. I hope that you can help me. Thank you very much!

I used a GeForce 1080 Ti.

Thank you for your answer. In fact, I do not know how to get the value of GPU memory. For example, in the paper, the training GPU memory of multi-CNN is 529 MB and that of Bi-SRU is 731 MB. I have no idea how to get these values.
Could you give me some advice?
Thank you!

@waguoxiaohai There are two approaches:

  1. During the training phase, set the GPU memory config to soft placement and record the GPU memory usage shown by nvidia-smi.
  2. Use the metadata in TensorFlow and analyze the tensors and variables recorded in run_metadata. The following is a simple example; a sketch covering both approaches follows it.
import tensorflow as tf

sess = tf.Session()
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
# replace `fetches` and `feed_dict` with your own training op and input dict
sess.run(fetches, feed_dict=feed_dict, options=run_options, run_metadata=run_metadata)
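
For reference, a minimal sketch of both approaches, assuming TF 1.x; the allow_growth option and the tf.profiler call are my own additions, not part of the released code:

import tensorflow as tf

# approach 1: allocate GPU memory on demand instead of grabbing the whole card,
# then watch this process in nvidia-smi while training
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

# approach 2: after a sess.run(...) that recorded run_metadata (as in the example above),
# let the profiler report per-op time and memory
opts = tf.profiler.ProfileOptionBuilder.time_and_memory()
tf.profiler.profile(sess.graph, run_meta=run_metadata, cmd='op', options=opts)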

Thank you for your answer. I chose to use the first approach. I used the multi-CNN model on SNLI (train_batch_size 64),
but nvidia-smi shows 2431 MiB, which is not the same as the 529 MB in the paper.
It has confused me for several days...

It seems that the minimum GPU memory footprint is 2000+ MB after some updates to TensorFlow and CUDA. Try downgrading your versions to those mentioned in the paper to reproduce the result.
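(If it helps, a quick way I would check the installed build before downgrading; the target of TF 1.4.1 with CUDA 8 and cuDNN 6 comes from the earlier comment in this thread.)

import tensorflow as tf

print(tf.__version__)               # compare against the 1.4.1 mentioned above
print(tf.test.is_built_with_cuda()) # True if this TensorFlow build was compiled with CUDA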

OK, I will do as you suggest. Your advice is very important to me.
Thank you very much!