Incorporate LaBSE as a model option

Question


Closed this issue 2 years ago · 3 comments

I modified simalign to use LaBSE (the "pvl/labse_bert" checkpoint) as the underlying multilingual model for calculating embeddings. It gave better precision and recall on the alignments than either mBERT or XLM-RoBERTa, and I think it would be a useful additional option for simalign.
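For context on the comparison above, word-alignment precision and recall are usually computed against gold links split into "sure" (S) and "possible" (P) sets, with S contained in P. A minimal sketch of the standard definitions (precision = |A∩P|/|A|, recall = |A∩S|/|S|, AER = 1 - (|A∩S| + |A∩P|)/(|A| + |S|)) follows; the function name `alignment_scores` and the toy links are hypothetical and not part of simalign.

```python
from typing import Iterable, Optional, Tuple


def alignment_scores(
    predicted: Iterable[Tuple[int, int]],
    sure: Iterable[Tuple[int, int]],
    possible: Optional[Iterable[Tuple[int, int]]] = None,
) -> dict:
    """Precision, recall, F1 and AER for predicted (src_idx, tgt_idx) links.

    Hypothetical helper: gold links are split into sure and possible sets;
    every sure link also counts as possible.
    """
    a = set(predicted)
    s = set(sure)
    p = set(possible) if possible is not None else set()
    p |= s  # sure links are a subset of possible links

    a_and_s = len(a & s)
    a_and_p = len(a & p)

    precision = a_and_p / len(a) if a else 0.0
    recall = a_and_s / len(s) if s else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # AER is undefined when both sets are empty; return 0.0 in that case.
    aer = (1.0 - (a_and_s + a_and_p) / (len(a) + len(s))) if (a or s) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "aer": aer}


scores = alignment_scores(
    predicted={(0, 0), (1, 1), (2, 3)},
    sure={(0, 0), (1, 1), (2, 2)},
    possible={(2, 3)},
)
print(scores)
```

Here every predicted link is at least possible, so precision is 1.0, while only two of the three sure links are found, so recall is 2/3 and AER is 1/6.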

Answer 1 · 2022-05-17T10:13:21.000Z

Which language pairs (and directions) did you test calculating embeddings on?

Answer 2 · 2022-06-18T10:42:25.000Z

That's a great suggestion, thanks for the pointer. It seems you already modified the simalign code? If so, it would be great if you could create a PR. @masoudjs could review it and/or help integrate it.

Answer 3 · 2022-06-20T01:59:44.000Z

After looking through the library, I realized I didn't need to modify any code. All I needed to do to use LaBSE was:

```python
from simalign import SentenceAligner

myaligner = SentenceAligner(model="pvl/labse_bert", token_type="bpe", matching_methods="a", device="cuda")
```
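In simalign, `matching_methods="a"` selects the ArgMax method, which links source word i and target word j only when each is the other's most similar word. The sketch below illustrates that mutual-argmax idea on toy word vectors with cosine similarity; it is not simalign's own code (simalign works on subword embeddings from the chosen model and maps them back to words), and the helper names `cosine_matrix` and `argmax_align` are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def cosine_matrix(src: Sequence[Sequence[float]],
                  tgt: Sequence[Sequence[float]]) -> List[List[float]]:
    """Cosine similarity between every source and every target vector."""
    def norm(v: Sequence[float]) -> float:
        # Fall back to 1.0 for zero vectors to avoid division by zero.
        return math.sqrt(sum(x * x for x in v)) or 1.0

    return [[sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
             for v in tgt] for u in src]


def argmax_align(sim: List[List[float]]) -> Set[Tuple[int, int]]:
    """Keep (i, j) only if j is i's best target AND i is j's best source."""
    n_src, n_tgt = len(sim), len(sim[0])
    # Forward direction: best target for each source word (first max on ties).
    fwd = {(i, max(range(n_tgt), key=lambda j: sim[i][j])) for i in range(n_src)}
    # Reverse direction: best source for each target word.
    rev = {(max(range(n_src), key=lambda i: sim[i][j]), j) for j in range(n_tgt)}
    return fwd & rev


src_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt_vecs = [[0.0, 1.0], [1.0, 0.0]]
print(argmax_align(cosine_matrix(src_vecs, tgt_vecs)))
```

The third source vector is equally similar to both targets but is neither target's best match, so it stays unaligned. Requiring agreement in both directions is why mutual-argmax matching tends to favor precision over recall.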