prdwb/bert_hae

Error: cannot find train_summary

WenTingTseng opened this issue · 5 comments

[screenshot]

When I run hae.py, why does the code at lines 263 to 270 always fall into the except block? Because of that, line 272, `train_summary_writer.add_summary(train_summary, step)`, cannot find `train_summary`.

How can I fix it? Thanks a lot.
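For reference, the `NameError` follows directly from the try/except structure: if the training step inside the `try` block fails, the variables it was supposed to assign never come into existence, so any use of them afterwards fails too. A minimal stand-alone sketch of this, where `run_train_step` is a hypothetical stand-in for the real `session.run` call in hae.py:

```python
# Minimal sketch of the failure mode (not the actual hae.py code;
# run_train_step is a hypothetical stand-in for the session.run call
# inside the try block at lines 263-270).

def run_train_step():
    # Simulate the training step failing, e.g. with an out-of-memory error.
    raise MemoryError("resource exhausted (out of memory)")

try:
    # If this raises, `train_summary` is never assigned.
    train_summary = run_train_step()
except Exception as e:
    print("training step failed:", e)

# Line 272 in hae.py then uses `train_summary`, which does not exist:
try:
    print(train_summary)
except NameError as e:
    print("same error as in the issue:", e)  # name 'train_summary' is not defined
```

So the fix is not to define `train_summary` somewhere else, but to make the training step itself succeed (here, by resolving the out-of-memory error discussed below in the thread).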

prdwb commented

It seems that the training step fails, probably due to an empty feed dict. What is the length printed in the exception? Could you attach the running log? Thanks.

The features length printed in the exception is 6.
I run hae.py like this:
[screenshot]
And the running log looks like this:
[screenshot]

You mentioned an empty feed dict. Which dict do you mean? Is it cache/quac?
My cache/quac directory looks like this:
[screenshot]

Thanks for your response.

prdwb commented

Hi, thank you for your snapshots. It seems that you have a "resource exhausted (out of memory)" error, probably because you are running on a CPU. This should be the reason that a training step fails; please refer to the following snapshot. Using a smaller batch size should resolve the issue. Note that a smaller batch size could hurt the performance.

[screenshot]

I can see cache/quac is correctly generated. There is no issue with that. Thanks.
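For readers with the same problem: the batch size is set via a command-line flag. A hypothetical invocation (the flag name `train_batch_size` follows the BERT-style FLAGS this repo uses; check hae.py for the exact flag names and the other required arguments such as data and checkpoint paths):

```shell
# Hypothetical invocation; verify flag names against hae.py before running.
python hae.py --train_batch_size=2   # reduce further (e.g. to 1) if OOM persists
```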

Hi, thank you for your response. I changed train_batch_size to 1, but there is still a problem, as shown in this log message:
[screenshot]
It seems that the "resource exhausted (out of memory)" error causes the problem (not sure).
How can I resolve it? Should I run the code on a GPU instead? If so, how can I change it to run on the GPU?
Thanks a lot.

prdwb commented

It seems that the process was killed by the system without an error. So I agree that it could be due to memory issues. Using a smaller max_seq_length should further alleviate memory consumption but it will hurt the performance.
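Combining the two memory-reduction suggestions from this thread, a hypothetical invocation might look like this (flag names assumed from the BERT-style FLAGS; check hae.py for the exact names and defaults, and remember both flags trade model quality for lower memory use):

```shell
# Hypothetical invocation; verify flag names and defaults against hae.py.
python hae.py --train_batch_size=1 --max_seq_length=128
```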

GPU usage will be handled by TensorFlow automatically, assuming you have set up the CUDA environment and installed the GPU version of TensorFlow. There is no need to modify the code. Thank you.
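A quick way to check that the CUDA environment and the GPU build of TensorFlow are set up correctly (these commands are a sketch: `tf.test.is_gpu_available` is the TF 1.x API, and the exact `tensorflow-gpu` version should match the repo's requirements):

```shell
# Confirm the NVIDIA driver can see the GPU
nvidia-smi

# Install the GPU build of TensorFlow 1.x (pick the version the repo requires)
pip install tensorflow-gpu

# Verify that TensorFlow detects the GPU (TF 1.x API)
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
```

If the last command prints `True`, the unmodified code will place ops on the GPU automatically, as noted above.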