Need to fine-tune pretrained models?
Closed this issue · 2 comments
Hi there,
I find this work very interesting, and I was trying to replicate your results using the models you've shared on Huggingface. The bi-encoder models behave as expected; however, the cross-encoders score much lower than I expect on STS (results in the 30s-40s rather than the 70s-80s), which makes me think I'm missing a step.
Should the Huggingface pretrained models for STS work out of the box, or do I need to fine-tune them on the train set for each STS dataset?
The models in question are:
- trans-encoder-cross-simcse-roberta-base
- trans-encoder-cross-simcse-roberta-large
- trans-encoder-cross-simcse-bert-large
- trans-encoder-cross-simcse-bert-base
Thanks for any advice you can give!
Hi, thanks for your interest.
I wonder how you were evaluating the cross-encoders? Since the cross-encoder uses a different formulation, you need to concatenate each sentence pair into a single string and feed that to the model. Specifically, you can use our script:
>> python src/eval.py \
--model_name_or_path "cambridgeltl/trans-encoder-cross-simcse-roberta-large" \
--mode cross \
--task sts_sickr
as mentioned in the README (where mode specifies whether to evaluate in the bi-encoder or cross-encoder formulation).
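To illustrate the difference in input formulation, here is a minimal sketch of the pair-concatenation step. The `</s></s>` separator shown is an assumption based on RoBERTa-style conventions; in practice the Huggingface tokenizer handles this automatically when called with a sentence pair, e.g. `tokenizer(sent_a, sent_b, return_tensors="pt")`, and `src/eval.py` is the authoritative path.

```python
def concat_pair(sent_a: str, sent_b: str, sep: str = " </s></s> ") -> str:
    """Join a sentence pair into the single string a cross-encoder expects.

    A bi-encoder encodes each sentence separately and compares the two
    embeddings; a cross-encoder sees both sentences in one forward pass,
    so the pair must be merged before tokenization. The RoBERTa-style
    separator used here is an assumption for illustration only.
    """
    return sent_a + sep + sent_b

# Hypothetical STS-style pair:
print(concat_pair("A man is playing a guitar.", "Someone plays an instrument."))
```

Feeding each sentence of the pair to the cross-encoder on its own (bi-encoder style) would explain scores dropping from the 70s-80s into the 30s-40s.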
Hope this is helpful.
That was absolutely the problem; thank you!