cisnlp/simalign

ValueError when using NLLB model

I tried using Meta's facebook/nllb-200-distilled-600M model, but it seems that hidden_states is not being set on the self.emb_model output (line 65). I'm getting:

ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds

Any suggestions for how to use NLLB?

Hi @jcuenod, simalign mainly supports encoder-only models (like mBERT and XLM-R). NLLB is an encoder-decoder model, so its forward pass also expects decoder inputs, e.g., decoder_input_ids, which is why you see this error. A quick solution could be to feed sentence A to the encoder and sentence B to the decoder, build a similarity matrix from the two sets of hidden states, and then apply simalign's matching to that matrix. Feel free to create a PR to add this capability.
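To make the last step of that suggestion concrete, here is a rough sketch of building a similarity matrix from two sets of token vectors and reading alignments off it. It assumes you have already pulled one vector per token for each sentence out of the model (that part is not shown). The function names `cosine_similarity_matrix` and `mutual_argmax_alignment` are made up for illustration and are not part of simalign's API. The mutual-argmax rule is a simplified stand-in for simalign's own matching methods, and whether encoder and decoder states of NLLB are comparable enough for this to work well is untested.

```python
import math


def cosine_similarity_matrix(src_vecs, tgt_vecs):
    """Return sim where sim[i][j] is the cosine similarity of src_vecs[i] and tgt_vecs[j].

    src_vecs and tgt_vecs are lists of equal-length float lists, one per token.
    (Hypothetical helper for illustration; not a simalign function.)
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    sim = []
    for s in src_vecs:
        row = []
        for t in tgt_vecs:
            denom = norm(s) * norm(t)
            dot = sum(a * b for a, b in zip(s, t))
            # Guard against zero vectors to avoid division by zero.
            row.append(dot / denom if denom > 0 else 0.0)
        sim.append(row)
    return sim


def mutual_argmax_alignment(sim):
    """Keep (i, j) only if j is the best target for i AND i is the best source for j.

    Ties are broken in favor of the lowest index.
    (Hypothetical helper for illustration; not a simalign function.)
    """
    if not sim or not sim[0]:
        return []
    n_tgt = len(sim[0])
    # Best target index for every source token (row-wise argmax).
    best_tgt = [max(range(n_tgt), key=row.__getitem__) for row in sim]
    # Best source index for every target token (column-wise argmax).
    best_src = [max(range(len(sim)), key=lambda i: sim[i][j]) for j in range(n_tgt)]
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]


# Toy vectors standing in for real hidden states.
src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
alignment = mutual_argmax_alignment(cosine_similarity_matrix(src, tgt))
```

With real hidden states you would pass the per-token vectors for sentence A as `src` and for sentence B as `tgt`. The third target token in the toy data stays unaligned because neither source token picks it as its best match.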

Thanks, I'll take a look at submitting a PR.