This metric uses the contextual embeddings obtained from BERT to assess the similarity between a candidate text and a reference text via cosine similarity. Its main application is assessing the similarity between a candidate summary and a reference summary.
The scoring algorithm consists of two central steps (see the sketch after this list):

1. Obtaining embedding vectors from a pretrained BERT-based model.
2. Calculating the score using cosine similarity.
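A minimal sketch of these two steps in Python, assuming a recent (4.x) version of huggingface/transformers; the checkpoint name and the mean-pooling strategy are illustrative assumptions, not necessarily what this repo uses:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative checkpoint; substitute e.g. the converted Danish BERT
# from the guide linked under the dependencies below.
MODEL_NAME = "bert-base-multilingual-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()


def embed(text):
    """Step 1: obtain a contextual embedding vector for `text` from BERT.

    Mean-pools the last hidden layer over tokens (the pooling strategy
    here is an illustrative assumption)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        output = model(**inputs)
    # output.last_hidden_state has shape (1, seq_len, hidden_size).
    return output.last_hidden_state.mean(dim=1).squeeze(0)


def score(candidate, reference):
    """Step 2: cosine similarity between candidate and reference embeddings."""
    return torch.nn.functional.cosine_similarity(
        embed(candidate), embed(reference), dim=0
    ).item()


print(score("En kat sad på måtten.", "Katten sad på måtten."))
```

Mean pooling is only one reasonable way to collapse token embeddings into a single sentence vector; the score ranges in [-1, 1], with higher values indicating greater semantic similarity.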
Dependencies:

- Python version >= 3.6
- huggingface/transformers (https://github.com/huggingface/transformers)
- nltk (tokenization support for different languages)
- For the Danish BERT model setup, follow the guide created by Daniel Varab here: https://github.com/danielvarab/convert_da_bert
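The library dependencies can typically be installed with pip (no pinned versions are specified here, so this is an assumption):

```
pip install transformers nltk
```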