arcee-ai/DALM

Incorrect pooling for BGE models

Serega6678 opened this issue · 1 comment

BGE models (the default option) require CLS pooling, not mean pooling:
https://huggingface.co/BAAI/bge-large-en#frequently-asked-questions

However, in the actual code, mean pooling is the default:
https://github.com/arcee-ai/DALM/blob/main/dalm/models/retriever_only_base_model.py#L60

Is this a bug or expected behaviour?

Hi @Serega6678, we use mean pooling, following other sentence-transformer use cases. For example, we saw better results with mean pooling for the E5 family. But feel free to send us a PR that adds the option to select whether mean pooling is used.
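
For reference, a selectable pooling step could look roughly like the sketch below. It is only an illustration, not the DALM implementation: the function name `pool` and the `method` parameter are hypothetical, and it uses NumPy arrays in place of the PyTorch tensors the real model would return. CLS pooling takes the hidden state of the first token, while mean pooling averages the hidden states of non-padding tokens using the attention mask.

```python
import numpy as np


def pool(last_hidden_state, attention_mask, method="mean"):
    """Reduce token embeddings (batch, seq, dim) to sentence embeddings (batch, dim).

    `pool` and `method` are hypothetical names used for illustration only.
    """
    if method == "cls":
        # CLS pooling: use the hidden state of the first token of each sequence.
        return last_hidden_state[:, 0]
    if method == "mean":
        # Mean pooling: average over real tokens, ignoring padded positions.
        mask = attention_mask[..., None].astype(last_hidden_state.dtype)
        summed = (last_hidden_state * mask).sum(axis=1)
        counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
        return summed / counts
    raise ValueError(f"Unknown pooling method: {method!r}")


# One sequence of three tokens; the last token is padding and must be ignored.
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])

cls_vec = pool(hidden, mask, method="cls")    # first token only
mean_vec = pool(hidden, mask, method="mean")  # average of the two real tokens
```

With a switch like this, BGE checkpoints could be run with method="cls" while the E5 family keeps method="mean".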