This repo contains code and instructions for updating the VisIT-Bench leaderboard. Link to the leaderboard: https://huggingface.co/spaces/mlfoundations/VisIT-Bench-Leaderboard
1. Prepare a model predictions CSV file with 5 required columns: `instruction`, `instruction_category`, `image`, `model_name`, and `prediction`, where `model_name` is the model's name. Look at the existing files in the `full_model_predictions` folder.
2. Add this model predictions file to the `full_model_predictions` folder. Make sure the key used does not overlap with any of the existing keys.
3. Run `make_pairs.py`. This will generate `all_pairs_with_references_original_set.jsonl` (all possible pairs, which we will not be running) and `[model_name]_new_pairs_with_references.jsonl` (the queries that we will actually run in step 5).
4. Unzip `gpt-4_cache.jsonl.zip` to get the cached `gpt-4` judgments using `unzip gpt-4_cache.jsonl.zip`.
5. Run `OPENAI_API_KEY=[YOUR OPENAI KEY] python run_all_queries.py --leaderboard_jsonl leaderboard_submission_model_queries/[model_name]_predictions_new_pairs_with_references.jsonl` (using the file from step 3) to add gpt-4 judgments for the new model to `gpt-4_cache.jsonl`.
6. Run `python evaluate_all_queries.py`, which will produce the judgments file `gpt-4_head2head.json`.
7. Run `python elo_analysis.py --head2head_file gpt-4_head2head.json`, which will output the leaderboard.
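
Before running the pipeline, it can help to sanity-check the predictions file from steps 1 and 2. The sketch below is not part of this repo. It assumes that the "key" in step 2 refers to the `model_name` value, and that the existing files in `full_model_predictions` are CSVs with the same columns. The helper names `read_model_names` and `check_predictions_csv` are hypothetical.

```python
import csv
from pathlib import Path

# Columns listed as required in step 1.
REQUIRED_COLUMNS = ["instruction", "instruction_category", "image", "model_name", "prediction"]


def read_model_names(csv_path):
    """Return (header, set of non-empty model_name values) for one predictions CSV."""
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = list(reader.fieldnames or [])
        names = {row["model_name"] for row in reader if row.get("model_name")}
    return header, names


def check_predictions_csv(new_csv, existing_dir="full_model_predictions"):
    """Return a list of problems found; an empty list means no problem was detected.

    Assumption: the "key" that must not overlap is the model_name value.
    """
    problems = []
    header, new_names = read_model_names(new_csv)

    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        problems.append(f"missing required columns: {missing}")

    new_path = Path(new_csv).resolve()
    for existing in sorted(Path(existing_dir).glob("*.csv")):
        if existing.resolve() == new_path:
            continue  # skip the new file if it is already in the folder
        _, existing_names = read_model_names(existing)
        overlap = new_names & existing_names
        if overlap:
            problems.append(f"model_name {sorted(overlap)} already used in {existing.name}")
    return problems
```

A possible use is `print(check_predictions_csv("full_model_predictions/my_model.csv"))`, where `my_model.csv` is a placeholder file name; an empty list suggests the file is ready for step 3.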

Please send your model predictions file and the zipped version of the updated `gpt-4_cache.jsonl`, from Step 5 above, to the authors at yonatanbitton1@gmail.com and hbansal10n@gmail.com.
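
To produce the zipped cache for submission, something like the following sketch could be used. It is not part of this repo; the helper name `zip_cache` is hypothetical, and the default output name `gpt-4_cache.jsonl.zip` is an assumption chosen to match the archive unzipped in step 4. With that default and a cache in the current directory, it overwrites the original archive.

```python
import zipfile
from pathlib import Path


def zip_cache(cache_path="gpt-4_cache.jsonl", zip_path=None):
    """Compress the updated cache file and return the path of the archive."""
    cache_path = Path(cache_path)
    if not cache_path.is_file():
        raise FileNotFoundError(f"cache file not found: {cache_path}")

    # Default: write <cache name>.zip next to the cache file.
    zip_path = Path(zip_path) if zip_path else cache_path.with_name(cache_path.name + ".zip")

    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store only the file name so the archive unpacks into the current directory.
        zf.write(cache_path, arcname=cache_path.name)

    # Re-open the archive and verify its checksums before sending it.
    with zipfile.ZipFile(zip_path) as zf:
        bad_member = zf.testzip()
        if bad_member is not None:
            raise RuntimeError(f"corrupt archive member: {bad_member}")
    return zip_path
```

Calling `zip_cache()` from the repo root after step 5 would write the archive next to the cache; pass `zip_path` explicitly to keep the original archive untouched.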