Setting a seq-to-seq model as our pretrained model
b3ade opened this issue · 6 comments
Is it possible to load a seq-to-seq model to produce word alignments with this work?
I'm stuck on getting the proper out_src and out_tgt layers to work with for the next step.
I know this implementation is for mBERT only, but I'm trying to see whether it can work the same way with seq-to-seq models.
If you have any hints about which direction to go in, or code to share, please do.
This isn't for a paper; I'm just curious.
Hi, I haven't tried this before, but I think our method can be directly applied to the encoder of a seq2seq model. Also, for MT models, we can use both the encoder and the decoder to obtain word embeddings on the source and target sides and extract word alignments using our method. We can also train both src2tgt and tgt2src models, obtain alignments in both directions, and combine them in some way (e.g. by taking their intersection).
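On the last point, here is a minimal sketch of combining the two directions by intersection, assuming alignments are represented as sets of (source index, target index) pairs as in awesome-align's i-j output format; the function name is mine, not part of the repo:

```python
def combine_bidirectional(src2tgt_aligns, tgt2src_aligns):
    """Combine alignments from a src2tgt model and a tgt2src model.

    src2tgt_aligns: set of (i, j) pairs aligning source token i to target token j.
    tgt2src_aligns: set of (j, i) pairs from the reverse-direction model.
    Returns the intersection of the two directions (higher precision).
    """
    reversed_tgt2src = {(i, j) for (j, i) in tgt2src_aligns}
    return src2tgt_aligns & reversed_tgt2src

# Example: {(0, 0), (1, 2)} combined with reverse-direction {(0, 0), (1, 3)}
# yields {(0, 0)}.
```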
I see, thanks for the reply. If I understood correctly, the problem is how to apply it to the encoder of the seq2seq model.
More precisely, I'm trying to load the nllb-200-distilled-600M model, but I can't get the right layers out for the further calculations.
with torch.no_grad():
print(ids_src.unsqueeze(0))
test_src=model.generate(ids_src.unsqueeze(0), forced_bos_token_id=tokenizer.lang_code_to_id["eng_Latn"], max_length=30)
print(test_src)
out_src = model(ids_src.unsqueeze(0), output_hidden_states=True)[2][align_layer][0, 1:-1]
Output:
tensor([[ 81955, 248105, 739, 7819, 248, 81955, 835, 2, 256047]])
tensor([[ 2, 256047, 119167, 248105, 739, 7819, 248, 81955, 835, 2]])
Traceback (most recent call last):
File "insertWordAlignServer.py", line 39, in <module>
out_src = model(ids_src.unsqueeze(0), output_hidden_states=True)[2][align_layer][0, 1:-1]
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 1315, in forward
outputs = self.model(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 1206, in forward
decoder_outputs = self.decoder(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 985, in forward
raise ValueError("You have to specify either decoder_input_ids or decoder_inputs_embeds")
ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds
This is probably not the correct way to call the model, but I'm not sure how to handle it.
Hi, if you are using an MT model like nllb, given a sentence pair (x, y), you can obtain contextualized word embeddings for x and y by:
- feeding <x, y> to nllb and using its encoder to get embeddings for x and its decoder to get embeddings for y, or
- feeding <y, x> to nllb and using its encoder to get embeddings for y and its decoder to get embeddings for x, or
- feeding <x, whatever> and <y, whatever> to nllb and using its encoder to get embeddings for x and y.
Then, you can extract alignments from the embeddings using our methods.
I haven't used nllb before, but the error seems to be that you didn't feed any inputs to the decoder.
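For the first option, here is a minimal sketch (not tested with nllb; the checkpoint name, sentences, and align_layer value are placeholders, and tokenizing the target side with text_target assumes a recent transformers version). The key point is passing decoder_input_ids explicitly, which avoids the error above, and reading per-layer hidden states from both the encoder and the decoder:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder checkpoint and language codes.
tokenizer = AutoTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-600M", src_lang="deu_Latn", tgt_lang="eng_Latn"
)
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
model.eval()

src_sent = "ein Beispielsatz"     # hypothetical source sentence
tgt_sent = "an example sentence"  # hypothetical target sentence
ids_src = tokenizer(src_sent, return_tensors="pt").input_ids
ids_tgt = tokenizer(text_target=tgt_sent, return_tensors="pt").input_ids

align_layer = 8  # placeholder; tune on a dev set (see below)

with torch.no_grad():
    # Run encoder and decoder together; decoder_input_ids must be provided,
    # otherwise the decoder raises the ValueError seen in the traceback.
    outputs = model(
        input_ids=ids_src,
        decoder_input_ids=ids_tgt,
        output_hidden_states=True,
    )
    # encoder_hidden_states / decoder_hidden_states are tuples with
    # num_layers + 1 entries; index 0 is the embedding output.
    out_src = outputs.encoder_hidden_states[align_layer][0]  # source-side embeddings
    out_tgt = outputs.decoder_hidden_states[align_layer][0]  # target-side embeddings
    # Note: nllb adds its own special tokens (language code and </s>), so the
    # [1:-1] slicing used for mBERT may need adjusting.
```

For the third option (encoder only), you can run just the encoder on each sentence, e.g. `model.get_encoder()(input_ids=ids_src, output_hidden_states=True).hidden_states[align_layer]`, and do the same for the target sentence.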
I have also been trying to do word alignment with seq-2-seq models. @zdou0830 How do you pick the appropriate alignment layer when changing out models?
I think you can do zero-shot evaluation on a dev set (e.g. examples in https://github.com/neulab/awesome-align/tree/master/examples) and see which layer performs the best.
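A rough sketch of that sweep; extract_alignments and alignment_error_rate are hypothetical stand-ins for your extraction code and an AER scorer over the gold alignments in the dev set:

```python
def pick_align_layer(dev_pairs, gold_alignments, num_layers,
                     extract_alignments, alignment_error_rate):
    """Pick the layer with the lowest alignment error rate on a dev set.

    dev_pairs: list of (src_sentence, tgt_sentence) pairs.
    gold_alignments: list of sets of (i, j) gold alignment pairs.
    extract_alignments(src, tgt, layer) -> set of predicted (i, j) pairs.
    alignment_error_rate(predicted, gold) -> float (lower is better).
    """
    best_layer, best_aer = None, float("inf")
    for layer in range(1, num_layers + 1):
        predicted = [extract_alignments(src, tgt, layer) for src, tgt in dev_pairs]
        aer = alignment_error_rate(predicted, gold_alignments)
        if aer < best_aer:
            best_layer, best_aer = layer, aer
    return best_layer
```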
Interesting, thanks.