Aggregation Data: LLM Inference Run

llm_judge_{task}.py: the script that runs inference. I call it llm_judge because the evaluation prompts follow the LLM-as-a-judge setup (except for MMLU), but technically it can run any inference task.
run*.sh: examples of the actual commands used to call the Python scripts.
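
For readers unfamiliar with the LLM-as-a-judge setup mentioned above, here is a minimal sketch of what such an evaluation prompt and its output parsing might look like. Everything in it is an assumption for illustration: the template wording, the pairwise A/B format, and the names JUDGE_TEMPLATE, build_judge_prompt, and parse_verdict are hypothetical and are not taken from llm_judge_{task}.py.

```python
import re

# Hypothetical pairwise judge template (an assumption, not the prompt
# actually used by the llm_judge_{task}.py scripts).
JUDGE_TEMPLATE = (
    "You are an impartial judge. Given the question and two responses, "
    "decide which response is better.\n\n"
    "Question: {question}\n\n"
    "Response A: {response_a}\n\n"
    "Response B: {response_b}\n\n"
    "Answer with 'Verdict: A' or 'Verdict: B'."
)


def build_judge_prompt(question, response_a, response_b):
    """Fill the template with one evaluation example."""
    return JUDGE_TEMPLATE.format(
        question=question, response_a=response_a, response_b=response_b
    )


def parse_verdict(generation):
    """Extract 'A' or 'B' from the judge's generated text, or None if absent."""
    match = re.search(r"Verdict:\s*([AB])\b", generation)
    return match.group(1) if match else None
```

The prompt string would be sent to whichever model acts as the judge, and parse_verdict would be applied to the generated text only. Returning None for an unparseable generation lets the caller decide whether to retry or discard that example.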

potsawee/aggregation-data