TRI-ML/vlm-evaluation

Slow model inference during evaluation

Closed this issue · 1 comment

Thanks for your wonderful work, which helps me a lot.

When using this tool, I ran into very slow model inference. Evaluation takes a long time, especially on the vqa-v2-full dataset with 214,354 samples.

I noticed that device_batch_size is fixed to 1 due to a bug in HF and cannot be increased.

# Inference Parameters
device_batch_size: int = 1 # Device Batch Size set to 1 until LLaVa/HF LLaMa fixes bugs!
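
For reference, once that upstream issue is fixed, batched generation with the HF LLaVA port would look roughly like the sketch below. This is illustrative only: it calls the llava-hf checkpoint and HF APIs directly rather than this repo's model wrappers, and the prompts and images are dummy placeholders.

import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Decoder-only LMs need left padding so every sequence ends where generation starts.
processor.tokenizer.padding_side = "left"

prompts = [
    "USER: <image>\nWhat color is the bus? ASSISTANT:",
    "USER: <image>\nHow many people are in the photo? ASSISTANT:",
]
images = [Image.new("RGB", (336, 336)) for _ in prompts]  # dummy images for illustration

# Pad the batch and move floating-point tensors to the model's dtype and device.
inputs = processor(text=prompts, images=images, padding=True, return_tensors="pt")
inputs = inputs.to(model.device, torch.float16)

with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)

answers = processor.batch_decode(output_ids, skip_special_tokens=True)
print(answers)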

I am wondering whether there is any other way to speed up inference during evaluation.
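
For context, one workaround I am considering is to keep device_batch_size at 1 but shard the examples across GPUs and run one evaluation process per shard, merging the result files afterwards. A rough sketch of the index split (run_eval_on_indices is a hypothetical stand-in, not a function from this repo):

import os

def shard_indices(num_examples: int, num_shards: int, shard_id: int) -> list[int]:
    # Round-robin split so every shard gets a near-equal number of examples.
    return list(range(shard_id, num_examples, num_shards))

if __name__ == "__main__":
    num_shards = int(os.environ.get("WORLD_SIZE", "1"))
    shard_id = int(os.environ.get("RANK", "0"))
    indices = shard_indices(214_354, num_shards, shard_id)
    print(f"shard {shard_id}/{num_shards} handles {len(indices)} examples")
    # run_eval_on_indices(indices)  # hypothetical: evaluate this shard only, then merge results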

We’re working on adding robust support for batched inference to speed up workloads like this one. Unfortunately, VQA-v2 just has a massive validation set. If you’re iterating on experiments, I’d consider trying the “subsampled” variant of the dataset (16K examples), and only running the full 200K set when you need final numbers!
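
If the built-in subsampled variant doesn't fit your workflow, drawing your own fixed-seed subset of the VQA-v2 validation questions is also straightforward. A minimal sketch, assuming the standard VQA-v2 annotation files rather than this repo's internal dataset layout:

import json
import random

random.seed(7)  # fixed seed so every run scores the same subset

with open("v2_OpenEnded_mscoco_val2014_questions.json") as f:
    data = json.load(f)

# Keep a random 16K subset of the ~214K validation questions.
data["questions"] = random.sample(data["questions"], k=16_000)

with open("vqa_v2_val_16k_questions.json", "w") as f:
    json.dump(data, f)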