relatio-nlp/relatio

Use phrase-BERT for entity encoding

Closed this issue · 3 comments

Add phrase-BERT as an option for encoding the entities before clustering:
https://aclanthology.org/2021.emnlp-main.846.pdf

Should we use the implementation mentioned in the paper (https://github.com/sf-wa-326/phrase-bert-topic-model)?
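For reference, a minimal sketch of how the proposed flow could look end to end, assuming the checkpoint is sentence-transformers compatible (using the Hugging Face Hub model mentioned later in this thread) and with scikit-learn's k-means standing in for relatio's actual clustering step:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Hypothetical sketch, not relatio's actual API: encode entity phrases with a
# phrase-BERT checkpoint, then cluster the resulting embeddings.
model = SentenceTransformer("whaleloops/phrase-bert")

entities = ["central bank", "the federal reserve", "interest rates", "inflation"]
embeddings = model.encode(entities)                    # (n_entities, embedding_dim)
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
print(dict(zip(entities, labels)))
```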

spaCy has a RoBERTa implementation: https://spacy.io/models/en (see en_core_web_trf).
relatio-v0.3 has support for spaCy embeddings, so this could be the easiest approach.

spaCy's en_core_web_trf model could be used with an SBERT-style mean-pooling operation to approximate phrase-BERT embeddings, but this wouldn't benefit from the contrastive SBERT pre-training or the phrase-BERT training.
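A rough sketch of that spaCy route (assumes spacy-transformers is installed; the exact layout of doc._.trf_data.tensors is version-dependent, so treat the indexing below as an assumption):

```python
import numpy as np
import spacy

nlp = spacy.load("en_core_web_trf")

def phrase_vector(phrase: str) -> np.ndarray:
    doc = nlp(phrase)
    # Assumption: tensors[0] is the last hidden state, shape (spans, wordpieces, width);
    # this layout can differ across spacy-transformers versions.
    hidden = doc._.trf_data.tensors[0]
    # Mean-pool over all wordpieces to approximate an SBERT-style phrase embedding.
    return hidden.reshape(-1, hidden.shape[-1]).mean(axis=0)

vec = phrase_vector("the central bank")  # e.g. a 768-dim vector for roberta-base
```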

It would add dependencies to the project, but a straightforward solution may be to use the checkpoint uploaded to the Hugging Face Hub:

https://huggingface.co/whaleloops/phrase-bert

I'm experimenting with this implementation anyway, as transformers is already a dependency in my project.
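A hedged sketch of that route, loading the Hub checkpoint with transformers only (no sentence-transformers dependency) and mean-pooling over non-padding tokens, which mirrors the SBERT-style pooling phrase-BERT builds on; this assumes the checkpoint is a standard BERT encoder:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("whaleloops/phrase-bert")
model = AutoModel.from_pretrained("whaleloops/phrase-bert")

def encode_phrases(phrases):
    batch = tokenizer(phrases, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)         # zero out padding positions
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean-pooled phrase vectors

embeddings = encode_phrases(["central bank", "monetary policy"])
```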