booydar/babilong

Clarify GPT-4 model

rodion-m opened this issue · 1 comments

Please clarify which GPT-4 model was used in the benchmark (4o or Turbo). Also, please add both of them to the benchmark.

Thanks for pointing out this lack of clarity. I'll update the README and the paper text. For the main experiments we used gpt-4-0125-preview, and for retrieval-augmented generation we used the GPT-4-turbo version. Evaluation results for each model can be found in the predictions_06_2024 branch.