cisnlp/simalign

Incorporate LaBSE as a model option

Closed this issue · 3 comments

I modified simalign to use LaBSE (or "pvl/labse_bert") as the underlying multilingual model for calculating embeddings. It showed better precision and recall on the alignments than either mBERT or XLM-RoBERTa, and I think it would be a useful additional option for simalign.

Which language pairs (and directions) did you test calculating embeddings on?

That's a great suggestion - thanks for the pointer. It seems you have already modified the simalign code? If so, it would be great if you could create a PR. @masoudjs could review it and/or potentially help integrate it.

After looking through the library, I realized I didn't need to modify any code. All I needed to do to use LaBSE was:

from simalign import SentenceAligner
myaligner = SentenceAligner(model="pvl/labse_bert", token_type="bpe", matching_methods="a", device="cuda")
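
For anyone who wants to try this end to end, here is a minimal sketch of how the aligner might then be used on a sentence pair. It assumes simalign's `get_word_aligns` method, which returns a dict of index pairs keyed by matching method (with `matching_methods="a"` the key should be `"inter"`). The helper names `build_labse_aligner` and `aligned_word_pairs`, the example sentences, and the `"cpu"` default are illustrative assumptions, not part of simalign.

```python
def build_labse_aligner(device="cpu"):
    """Create a SentenceAligner backed by LaBSE (hypothetical helper).

    The import lives inside the function so this file can be loaded
    even where simalign is not installed. Pass device="cuda" for GPU.
    """
    from simalign import SentenceAligner

    return SentenceAligner(
        model="pvl/labse_bert",   # model name taken from this thread
        token_type="bpe",
        matching_methods="a",     # "a" = argmax, reported under "inter"
        device=device,
    )


def aligned_word_pairs(src_tokens, trg_tokens, alignment):
    """Map (source index, target index) pairs to the actual words."""
    return [(src_tokens[i], trg_tokens[j]) for i, j in alignment]


if __name__ == "__main__":
    # Example sentences are made up for illustration.
    src = ["This", "is", "a", "test", "."]
    trg = ["Das", "ist", "ein", "Test", "."]

    aligner = build_labse_aligner()
    result = aligner.get_word_aligns(src, trg)

    # Print the word pairs chosen by the argmax method.
    print(aligned_word_pairs(src, trg, result["inter"]))
```

The first run downloads the model weights from the Hugging Face hub, so it needs network access.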