Evaluating the ranking performance of GPT-based models (text-davinci-003, gpt-4 and gpt-3.5-turbo) and a local model (sentence-transformers)
Metric used: Spearman rank correlation
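For reference, Spearman's rank correlation is the Pearson correlation computed on the ranks of the two variables, with tied values receiving the average of the positions they occupy. The pure-Python sketch below illustrates that definition. The function names `average_ranks` and `spearman` are hypothetical and not part of the original setup. In practice, `scipy.stats.spearmanr` computes the same quantity.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors.

    Undefined (division by zero) if either input is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


print(spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
```

A value of 1 means identical orderings, -1 means fully reversed orderings, and values near 0 mean no monotonic relationship.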
Each model's ranking is evaluated against a gold ranking provided by the HiWis (student research assistants).
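The note does not say how the rankings are stored. The sketch below assumes that each model returns a complete ordering of the same item IDs as the gold ranking. Under that assumption there are no ties, and rho reduces to 1 - 6·Σd² / (n(n² - 1)), where d is the difference in an item's position between the two rankings. The item IDs, model names (`model_x`, `model_y`) and rankings are hypothetical toy data, not actual results.

```python
# Hypothetical item IDs and toy rankings, for illustration only (not real results).
def rank_positions(ranking: list[str]) -> dict[str, int]:
    """Map each item ID to its 1-based position in a ranked list."""
    return {item: pos for pos, item in enumerate(ranking, start=1)}


def spearman_vs_gold(gold: list[str], predicted: list[str]) -> float:
    """Spearman's rho between two complete rankings of the same items (no ties)."""
    n = len(gold)
    if n < 2 or len(predicted) != n or set(predicted) != set(gold):
        raise ValueError("rankings must be permutations of the same >= 2 items")
    gold_pos = rank_positions(gold)
    pred_pos = rank_positions(predicted)
    # Sum of squared per-item position differences.
    d_squared = sum((gold_pos[item] - pred_pos[item]) ** 2 for item in gold)
    return 1 - 6 * d_squared / (n * (n * n - 1))


gold_ranking = ["doc_a", "doc_b", "doc_c", "doc_d"]
model_rankings = {
    "model_x": ["doc_a", "doc_b", "doc_d", "doc_c"],  # last two items swapped
    "model_y": ["doc_d", "doc_c", "doc_b", "doc_a"],  # fully reversed
}
scores = {name: spearman_vs_gold(gold_ranking, ranked)
          for name, ranked in model_rankings.items()}
for name, rho in scores.items():
    print(f"{name}: {rho:.3f}")  # model_x: 0.800, model_y: -1.000
```

If several gold rankings exist (for example one per query), one possible summary is to average rho across them per model. This is an assumption; the note does not state how scores were aggregated.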
More information is available on my local machine.