harvardnlp/seq2seq-attn

multi-GPU

wanghechong opened this issue · 0 comments

I read your code carefully, but I don't fully understand lines like `decoder_clones = clone_many_times(decoder, opt.max_sent_l_targ)`. Why do we need to clone the decoder `opt.max_sent_l_targ` times, sharing the parameters but not sharing `gradInput`, `gradOutput`, and the other buffers? I don't clearly see the logical relation between them, so which things exactly should we clone? My model is also `{encoder (an MLP), decoder}`; if I want to train it, what should I watch out for? Because my model increases the computation, I have to train it on two GPUs to compare against other models under the same hyperparameters.
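
To make my question concrete, here is a minimal sketch of how I currently understand the cloning. The simplified `clone_many_times` and the `nn.Linear` stand-in for the decoder below are mine, not the repo's actual code; I'm assuming Torch's standard `nn.Module:clone(...)` sharing semantics:

```lua
require 'nn'

-- simplified version of the idea (as I understand it): T deep copies
-- that SHARE weights/gradWeights but keep their own output/gradInput
local function clone_many_times(net, T)
  local clones = {}
  for t = 1, T do
    -- nn.Module:clone(...) deep-copies the module, then share()s the
    -- named tensors, so every clone points at the same parameter storage
    clones[t] = net:clone('weight', 'bias', 'gradWeight', 'gradBias')
  end
  return clones
end

local decoder = nn.Linear(4, 4)              -- stand-in for the real decoder
local clones  = clone_many_times(decoder, 3) -- think opt.max_sent_l_targ = 3

-- parameters are shared: updating one clone updates them all
clones[1].weight:fill(0.5)
print(clones[2].weight[1][1])                -- 0.5

-- activations are NOT shared: clone t keeps the step-t output alive,
-- which backpropagation-through-time needs when it later reaches step t
local x = torch.randn(3, 4)
for t = 1, 3 do clones[t]:forward(x[t]) end
print(torch.pointer(clones[1].output) ~= torch.pointer(clones[2].output)) -- true
```

If this sketch is right, then we clone exactly the modules that are applied once per timestep, because their intermediate buffers must survive until the backward pass reaches that step. Is that the correct way to think about it?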