nuprl/MultiPL-E

How do I run a model across multiple GPUs for inference testing?

Closed this issue · 2 comments

Llama 3 70B

python3 automodel.py --name /home/models/Meta-Llama-3-70B/ --root-dataset humaneval --lang $lang --temperature 0.2 --batch-size 40 --completion-limit 1 --output-dir-prefix $output

thanks

The following command line solved the running problem for me.

Is there any difference in results between automodel.py and automodel_vllm.py?

python3 automodel_vllm.py --name /home/models/Meta-Llama-3-70B/ --revision main --num-gpus 2 --root-dataset humaneval --lang $lang --temperature 0.2 --batch-size 40 --completion-limit 1 --output-dir-prefix $output

automodel_vllm.py uses vLLM, whereas automodel.py uses transformers. I suggest using the vLLM script if it works for you.
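For completeness, here is a sketch of how the vLLM command above could be looped over several target languages. The language list (py, js, java) and the ./results output prefix are illustrative placeholders, not values from this thread; the flags themselves are taken from the command above. The `run` helper echoes each command instead of executing it, since the model path is specific to the original poster's machine:

```shell
#!/bin/sh
# Dry-run helper: print the command instead of executing it,
# because /home/models/Meta-Llama-3-70B/ only exists on the
# original poster's machine.
run() { echo "$@"; }

# Placeholder output prefix (assumption, not from the thread).
output=./results

# Illustrative language list; MultiPL-E supports many more.
for lang in py js java; do
  run python3 automodel_vllm.py \
    --name /home/models/Meta-Llama-3-70B/ \
    --revision main \
    --num-gpus 2 \
    --root-dataset humaneval \
    --lang "$lang" \
    --temperature 0.2 \
    --batch-size 40 \
    --completion-limit 1 \
    --output-dir-prefix "$output"
done
```

Dropping the `run` wrapper executes the commands for real, one language at a time.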

Alternatively, use MultiPL-E from here: https://github.com/bigcode-project/bigcode-evaluation-harness