amzn/trans-encoder

Need to fine-tune pretrained models?

Closed this issue · 2 comments

Hi there,
I find this work very interesting and have been trying to replicate your results using the models you've shared on Huggingface. The bi-encoder models behave as expected; the cross-encoders, however, score much lower than expected on STS (in the 30s-40s rather than the 70s-80s), which makes me think I'm missing a step.

Should the Huggingface pretrained models for STS work out of the box, or do I need to fine-tune them on the train set for each STS dataset?

The models at issue are:

  • trans-encoder-cross-simcse-roberta-base
  • trans-encoder-cross-simcse-roberta-large
  • trans-encoder-cross-simcse-bert-large
  • trans-encoder-cross-simcse-bert-base

Thanks for any advice you can give!

Hi, thanks for your interest.

May I ask how you were evaluating the cross-encoders? The cross-encoder uses a different formulation: you need to concatenate each sentence pair into a single string and feed that to the model. Specifically, you can use our script:

>> python src/eval.py \
    --model_name_or_path "cambridgeltl/trans-encoder-cross-simcse-roberta-large" \
    --mode cross \
    --task sts_sickr

as mentioned in the readme (the --mode flag specifies whether to evaluate in the bi-encoder or cross-encoder formulation).
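
In case it helps, below is a rough sketch of the cross-encoder formulation using plain transformers, outside our script. It assumes the checkpoint loads as a single-logit sequence-classification model; eval.py remains the authoritative way to reproduce the paper's numbers.

# Rough sketch (not the repo's eval path): score one sentence pair with a
# cross-encoder by feeding the concatenated pair as a single input.
# Assumption: the checkpoint exposes a single-logit classification head; if
# transformers warns that a head was newly initialized, use eval.py instead.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cambridgeltl/trans-encoder-cross-simcse-roberta-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Passing the sentences as a pair makes the tokenizer join them with the
# model's separator tokens, i.e. the "one string" input a cross-encoder expects.
inputs = tokenizer(
    "A man is playing a guitar.",
    "Someone is playing an instrument.",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
print(f"predicted similarity: {score:.4f}")

This is exactly what evaluating with a bi-encoder misses: encoding the two sentences separately and comparing embeddings gives near-random scores for a model trained on concatenated pairs.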

Hope this is helpful.

That was absolutely the problem; thank you!