relatio-nlp/relatio

Use phrase-BERT for entity encoding

Closed this issue · 3 comments

Add phrase-BERT as an option for encoding the entities before clustering:
https://aclanthology.org/2021.emnlp-main.846.pdf

Should we use the implementation mentioned in the paper (https://github.com/sf-wa-326/phrase-bert-topic-model)?
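For reference, a minimal sketch of how the proposed flow could look end to end, assuming the checkpoint is sentence-transformers compatible (using the Hugging Face Hub model mentioned later in this thread) and with scikit-learn's k-means standing in for relatio's actual clustering step:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Hypothetical sketch, not relatio's actual API: encode entity phrases with a
# phrase-BERT checkpoint, then cluster the resulting embeddings.
model = SentenceTransformer("whaleloops/phrase-bert")

entities = ["central bank", "the federal reserve", "interest rates", "inflation"]
embeddings = model.encode(entities)                    # (n_entities, embedding_dim)
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
print(dict(zip(entities, labels)))
```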

spaCy has a RoBERTa implementation: https://spacy.io/models/en (see en_core_web_trf).
relatio-v0.3 has support for spaCy embeddings, so this could be the easiest approach.

spaCy's en_core_web_trf model could be used with an SBERT-style mean-pooling operation to approximate phrase-BERT embeddings, but this wouldn't benefit from the contrastive SBERT pre-training or the phrase-BERT training.
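A rough sketch of that spaCy route (assumes spacy-transformers is installed; the exact layout of doc._.trf_data.tensors is version-dependent, so treat the indexing below as an assumption):

```python
import numpy as np
import spacy

nlp = spacy.load("en_core_web_trf")

def phrase_vector(phrase: str) -> np.ndarray:
    doc = nlp(phrase)
    # Assumption: tensors[0] is the last hidden state, shape (spans, wordpieces, width);
    # this layout can differ across spacy-transformers versions.
    hidden = doc._.trf_data.tensors[0]
    # Mean-pool over all wordpieces to approximate an SBERT-style phrase embedding.
    return hidden.reshape(-1, hidden.shape[-1]).mean(axis=0)

vec = phrase_vector("the central bank")  # e.g. a 768-dim vector for roberta-base
```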

It would add dependencies to the project, but a straightforward solution may be to use the checkpoint uploaded to the Hugging Face Hub:

https://huggingface.co/whaleloops/phrase-bert

I'm experimenting with this implementation anyway, as transformers is already a dependency in my project.
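A hedged sketch of that route, loading the Hub checkpoint with transformers only (no sentence-transformers dependency) and mean-pooling over non-padding tokens, which mirrors the SBERT-style pooling phrase-BERT builds on; this assumes the checkpoint is a standard BERT encoder:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("whaleloops/phrase-bert")
model = AutoModel.from_pretrained("whaleloops/phrase-bert")

def encode_phrases(phrases):
    batch = tokenizer(phrases, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)         # zero out padding positions
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean-pooled phrase vectors

embeddings = encode_phrases(["central bank", "monetary policy"])
```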