Alternatively use Claude for annotation?
mensch72 opened this issue · 4 comments
Hi, thank you for your amazing work here!
I wonder if one could also use Anthropic/Claude for the annotators instead of OpenAI?
Hi,
We currently do not have support for Claude in this repo for the AlpacaFarm evaluation standard.
However, we do have a separate evaluation library based on AlpacaFarm that does support Claude: https://github.com/tatsu-lab/alpaca_eval
There are some differences in the evaluation procedure between the two, which are covered here: https://github.com/tatsu-lab/alpaca_eval#differences-with-alpacafarm
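For reference, running an evaluation there with Claude as the judge is mostly a config switch. A minimal sketch, assuming alpaca_eval's Python `evaluate` entry point and its `claude` annotators config (both are documented in that repo's README, but double-check the exact names there, as they may have changed):

```python
# Sketch only: check the alpaca_eval README for current entry points and
# config names before relying on this.
import json

from alpaca_eval import evaluate  # pip install alpaca-eval

# model_outputs: a list of dicts with "instruction" and "output" keys,
# i.e. your model's generations on the evaluation set.
with open("outputs.json") as f:
    model_outputs = json.load(f)

# Run the pairwise evaluation with Claude as the annotator instead of GPT-4.
evaluate(
    model_outputs=model_outputs,
    annotators_config="claude",  # assumed config name; see evaluators_configs/
)
```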
Thank you for your fast response!
But I meant annotation, not evaluation. Do you think I would be able to adapt the annotators in my fork on my own, maybe by reusing the Claude-related code from AlpacaEval?
Or are annotators and evaluators more or less the same thing here?
Yup that could definitely work :)
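If it helps, the adaptation is essentially swapping the OpenAI completion call for an Anthropic one. Here's a rough sketch; the function name `claude_annotate` and the prompt handling are made up for illustration and are not AlpacaFarm's actual API, but the `anthropic` SDK calls are real:

```python
# Hypothetical Claude-backed annotator call, not AlpacaFarm's real interface.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def claude_annotate(prompt: str, model: str = "claude-3-5-sonnet-20241022") -> str:
    """Send one pairwise-comparison prompt to Claude and return its reply."""
    message = client.messages.create(
        model=model,
        max_tokens=100,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text
```

You'd then plug something like this in wherever the existing annotators issue their OpenAI requests, keeping the prompt templates and output parsing the same.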
Annotators and evaluators are almost the same thing here, if you're referring to the split between training-time annotation and test-time evaluation.
For collecting the preference data for training, we use the same set of pairwise evaluators, with the added step of flipping a random 25% of the labels (as detailed in the paper).
For the pairwise evaluation itself, we use the same annotators/evaluators but don't inject that extra noise.
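In code terms, the noise injection is just this (a sketch of the idea, not the repo's actual implementation):

```python
# Sketch of the 25%-label-flip noise injection described above; labels are
# 1 or 2 indicating which of the two outputs the annotator preferred.
import random

def inject_label_noise(preferences: list[int], p_flip: float = 0.25,
                       seed: int = 0) -> list[int]:
    """Flip each pairwise preference label with probability p_flip."""
    rng = random.Random(seed)
    noisy = []
    for label in preferences:
        if rng.random() < p_flip:
            noisy.append(2 if label == 1 else 1)  # swap the preferred output
        else:
            noisy.append(label)
    return noisy

# Training preference data gets the noise; evaluation uses the raw labels.
train_labels = inject_label_noise([1, 2, 2, 1, 1])
```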