intel-analytics/ipex-llm

GPU hang when switching between Llama2 and Llama3 on Arc A770

Opened this issue · 1 comment

1) Configure the yaml to run llama2 with input = 1K, then launch the all-in-one benchmark:
ipex-llm/python/llm/dev/benchmark/all-in-one$ ./run-arc.sh
2) Configure the yaml to run llama3-instruct with input = 1K, then launch the all-in-one benchmark again.
The GPU hangs while converting the llama3-instruct model.

If I run 2) first and then 1), the GPU hangs as well.
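For reference, the all-in-one benchmark is driven by a config.yaml next to run-arc.sh. A minimal sketch of the two runs described above (field values and model paths are assumptions based on the repo's sample config; adapt to your setup):

```yaml
# Sketch of all-in-one config.yaml (values are illustrative, not verbatim).
repo_id:
  - 'meta-llama/Llama-2-7b-chat-hf'            # step 1: llama2
  # - 'meta-llama/Meta-Llama-3-8B-Instruct'    # step 2: llama3-instruct
local_model_hub: '/path/to/models'
warm_up: 1
num_trials: 3
in_out_pairs:
  - '1024-128'          # input = 1K
test_api:
  - 'transformer_int4_gpu'
```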

I can't reproduce your error; it works fine on my machine.
If you want to run both of them, please make sure your transformers version is 4.37.x.
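The version constraint can be checked programmatically before launching the benchmark; a small sketch (the helper name is mine, not part of ipex-llm):

```python
# Check that a transformers version string is in the recommended 4.37.x
# series. The helper name is illustrative, not an ipex-llm API.
def is_supported_transformers(version: str) -> bool:
    """Return True if `version` belongs to the 4.37.x series."""
    return version.split(".")[:2] == ["4", "37"]

print(is_supported_transformers("4.37.2"))   # True
print(is_supported_transformers("4.38.0"))   # False

# On a machine with transformers installed:
#   import transformers
#   assert is_supported_transformers(transformers.__version__)
```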
My Arc A770 is the 16GB version; how about yours?
You can also try test_api transformer_int4_fp16_gpu, which uses about 1.3GB less memory than transformer_int4_gpu.