idiap/HAN_NMT

Training HAN-joint using the sentence_level model


The usage description shows how to train HAN-joint from the HAN-encoder model, but can HAN-joint also be trained from the sentence-level model? I tried this, but it doesn't seem to work.

In addition, I have another question. While reading the source code, I came across comments saying that the query and the context are the same tensor during training. How, then, is information from the previous sentences used? Could you point me to where this happens in your source code?

Hi,
It should be possible to use HAN-joint with the sentence-level model. I ran experiments this way but didn't get better results.
About the context: the tensor in question is the whole batch (tensor dimensions: batch_size x sentence_length x hidden_state_dim). The batch contains whole documents, and the sentences are in order (no shuffling of sentences). So to get the context of the sentence at position i, the algorithm refers to positions i-1, i-2, ..., i-n along the batch_size dimension, as sketched below.

See HAN_NMT/source/HierarchicalContext.py, lines 118-121.
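Here is a minimal sketch of that idea (not the repository's exact code; the function and variable names are illustrative, not taken from HierarchicalContext.py): because the batch holds consecutive sentences of a document, the context of sentence i is just the previous n entries along the batch dimension.

```python
import torch

def gather_context(states, i, n):
    """states: (batch_size, sentence_length, hidden_dim), sentences in document order.
    Returns the hidden states of sentences i-1, ..., i-n as context for sentence i."""
    start = max(i - n, 0)  # clamp at the document/batch start
    return states[start:i]  # (n_context, sentence_length, hidden_dim)

batch = torch.randn(8, 20, 512)        # 8 consecutive sentences, length 20, dim 512
query = batch[5]                       # the current sentence (also used as the query)
context = gather_context(batch, 5, 3)  # sentences at positions 2, 3, 4 serve as context
```

This is why the comments can say the query and the context come from the same tensor: they are different slices of one batch that preserves document order.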

When I try to use HAN in the encoder or decoder, it always fails with 'RuntimeError: CUDA error: out of memory', even though the GPU should have enough memory. I don't know how to solve this problem; can you give me some advice?

You can try reducing -batch_size and -max_batch_generator.
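For example, something like the following (the script name, other options, and the specific values are placeholders; use whatever you already pass for training and lower the values until the model fits in memory):

```
python train.py [your existing options] -batch_size 512 -max_batch_generator 16
```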