Primary language: Jupyter Notebook

Aggregation Data: LLM Inference Run

  • llm_judge_{task}.py: the script that runs inference for a given task. I call it llm_judge because I set up the evaluation prompts following the LLM-as-a-judge approach (except for MMLU), but technically it can be used for any inference task.
  • run*.sh: examples of the actual commands used to call the Python scripts.
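To illustrate what "an evaluation prompt following LLM-as-a-judge" typically involves, here is a minimal sketch. It fills a single-answer grading template and parses the judge's score. For the multiple-choice case (such as MMLU), it extracts an answer letter instead. It assumes a template loosely modeled on the `[[rating]]` output format used in MT-Bench-style judging. `JUDGE_TEMPLATE`, `build_judge_prompt`, `parse_rating`, and `parse_choice` are hypothetical names for illustration only. They are not the prompts or functions in this repo's `llm_judge_{task}.py` scripts.

```python
import re

# Hypothetical single-answer grading template (NOT the repo's actual prompt),
# loosely following the "[[rating]]" convention of MT-Bench-style judges.
JUDGE_TEMPLATE = (
    "Please act as an impartial judge and evaluate the quality of the response "
    "provided by an AI assistant to the user question below. "
    "Rate the response on a scale of 1 to 10 and output your rating strictly "
    'in the format "[[rating]]", for example "Rating: [[5]]".\n\n'
    "[Question]\n{question}\n\n[Assistant's Answer]\n{answer}\n"
)


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the hypothetical judge template with one question/answer pair."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)


def parse_rating(judge_output: str):
    """Return the integer inside the first '[[n]]', or None if no score is found."""
    match = re.search(r"\[\[(\d+)\]\]", judge_output)
    return int(match.group(1)) if match else None


def parse_choice(model_output: str):
    """Multiple-choice case (e.g. MMLU): pull the letter from 'Answer: X' or 'Answer: (X)'.

    Returns None if the output does not contain a recognizable A-D answer.
    """
    match = re.search(r"[Aa]nswer\s*:\s*\(?([ABCD])\)?", model_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(build_judge_prompt("What is 2 + 2?", "4"))
    print(parse_rating("The answer is correct and concise. Rating: [[9]]"))
    print(parse_choice("Answer: (B)"))
```

Separating prompt construction from output parsing keeps the judge model call itself swappable. That matches the note above that the same script can serve any inference task.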