SigmaWe/DocAsRef

Vectorizing MNLI sentence segmentation and similarity computation

forrestbao opened this issue · 1 comment

The code linked below segments one sentence at a time. Please switch to a vectorized version that segments multiple sentences per call.

Also, we should have two versions of the MNLI-based approach: a word-level one, like the original BERT-score, and a sentence-level one, like sentence-level BERT-score.
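To make the word-level versus sentence-level distinction concrete, here is a minimal sketch of BERT-score-style greedy matching. It is not the code linked in this issue. The function name `greedy_match_prf` is hypothetical, and the sketch leaves out the optional IDF weighting and baseline rescaling of the original BERT-score. The similarity matrix is assumed to be computed elsewhere: its entries could be cosine similarities between token embeddings (word-level), or similarities or MNLI entailment probabilities between sentences (sentence-level). The matching step is the same in both cases.

```python
from typing import List, Tuple


def greedy_match_prf(sim: List[List[float]]) -> Tuple[float, float, float]:
    """Greedy-matching precision, recall and F1 over a similarity matrix.

    sim[i][j] is the similarity between candidate unit i and reference
    unit j. A "unit" is a token for the word-level version and a sentence
    for the sentence-level version (hypothetical helper, not from the repo).
    """
    if not sim or not sim[0]:
        return 0.0, 0.0, 0.0
    # Precision: each candidate unit is matched to its best reference unit.
    precision = sum(max(row) for row in sim) / len(sim)
    # Recall: each reference unit is matched to its best candidate unit.
    recall = sum(max(col) for col in zip(*sim)) / len(sim[0])
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Three candidate units (rows) against two reference units (columns).
example = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.4]]
p, r, f = greedy_match_prf(example)
```

With this layout, the two requested versions would differ only in what a row and a column stand for and in how `sim` is filled, which keeps the scoring code shared.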

https://github.com/SigmaWe/DocAsRef_0/blob/de4de4b4275e661621bebf3b2f92d8676e2f81c2/mnli/eval.py#L14-L26
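For the segmentation request at the top of this issue, the following is a rough sketch of a batch-oriented interface, not a rewrite of the linked code. It takes several inputs in one call and returns a flat sentence list plus offsets, so that later stages can process all sentences together. The regex splitter is only a stand-in and an assumption. If the linked code uses spaCy (this issue does not say), the batched `nlp.pipe(texts)` call would be the natural replacement for the loop body. The names `segment_batch` and `sentences_of` are hypothetical.

```python
import re
from typing import List, Tuple

# Stand-in splitter (assumption): split after '.', '!' or '?' followed by
# whitespace. A real segmenter would replace this.
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def segment_batch(docs: List[str]) -> Tuple[List[str], List[int]]:
    """Segment all documents in one call.

    Returns (flat, offsets) where the sentences of docs[i] are
    flat[offsets[i]:offsets[i + 1]].
    """
    flat: List[str] = []
    offsets: List[int] = [0]
    for doc in docs:
        sentences = [s for s in _SENT_BOUNDARY.split(doc.strip()) if s]
        flat.extend(sentences)
        offsets.append(len(flat))
    return flat, offsets


def sentences_of(flat: List[str], offsets: List[int], i: int) -> List[str]:
    """Recover the sentences that belong to document i."""
    return flat[offsets[i]:offsets[i + 1]]


flat, offsets = segment_batch(["A b. C d! E?", "", "One sentence only"])
```

Keeping the sentences flat with offsets, rather than as a list of lists, means a downstream model can be fed one large batch while the per-document grouping stays recoverable.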

Please spend up to 1 hour figuring out why computing sentence similarity with an MNLI-based model is so slow.
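One thing that may be worth checking during that hour, offered as a hypothesis rather than a diagnosis: an MNLI model scores a (premise, hypothesis) pair jointly, so comparing every candidate sentence with every reference sentence needs one forward pass per pair, and the pair count grows as the product of the two sentence counts. If the pairs are sent to the model one by one, batching them might help. The sketch below shows only the batching logic. `score_batch` is a hypothetical callable standing in for the model, and `score_all_pairs` is likewise an illustrative name.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def score_all_pairs(
    cand: Sequence[str],
    ref: Sequence[str],
    score_batch: Callable[[List[Pair]], List[float]],
    batch_size: int = 32,
) -> List[List[float]]:
    """Score every (candidate, reference) pair in batches.

    score_batch is a placeholder for a model call that takes a list of
    pairs and returns one score per pair (assumption, not the repo's API).
    Returns a matrix with one row per candidate sentence.
    """
    pairs = list(product(cand, ref))  # len(cand) * len(ref) pairs
    scores: List[float] = []
    for start in range(0, len(pairs), batch_size):
        scores.extend(score_batch(pairs[start:start + batch_size]))
    n_ref = len(ref)
    return [scores[i * n_ref:(i + 1) * n_ref] for i in range(len(cand))]


# Stub scorer for illustration: counts calls and scores exact matches as 1.0.
calls = []


def stub_scorer(batch: List[Pair]) -> List[float]:
    calls.append(len(batch))
    return [1.0 if a == b else 0.0 for a, b in batch]


matrix = score_all_pairs(["x", "y", "z"], ["x", "y", "z", "w"], stub_scorer, batch_size=5)
```

Here 3 candidate and 4 reference sentences give 12 pairs, which the stub receives in three calls instead of twelve. Timing the real model with and without batching would show whether this is the actual bottleneck.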