GPU hang when switching between Llama2 and Llama3 on ARC770
Opened this issue · 1 comment
moutainriver commented
1) Configure the yaml to run Llama2 with input = 1K, then launch the all-in-one benchmark:
ipex-llm/python/llm/dev/benchmark/all-in-one$ ./run-arc.sh
2) Configure the yaml to run llama3-instruct with input = 1K, then launch the all-in-one benchmark again.
The GPU hangs while converting the llama3-instruct model.
If 2) is run first and then 1), the GPU hangs as well.
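For reference, switching between the two runs amounts to editing config.yaml in the all-in-one directory before each launch. A hedged sketch of the relevant fields (the model ids and path below are placeholders, and the field names are from my local copy of the benchmark, so double-check against your own config.yaml):

```yaml
# Sketch of the config.yaml edit between run 1) and run 2).
# repo_id / local_model_hub values are placeholders for illustration.
repo_id:
  - 'meta-llama/Llama-2-7b-chat-hf'   # run 1); swap in the llama3-instruct id for run 2)
local_model_hub: '/path/to/models'
in_out_pairs:
  - '1024-128'                        # input = 1K
test_api:
  - 'transformer_int4_gpu'
```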
qiuxin2012 commented
I can't reproduce your error; it works fine on my machine.
If you want to run both of them, please make sure your transformers version is 4.37.x.
My Arc770 is the 16GB version; how about yours?
You can try test_api transformer_int4_fp16_gpu, which saves about 1.3GB of memory compared to transformer_int4_gpu.
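If it helps, a sketch of the config.yaml change being suggested (other fields omitted; verify the exact key names against your copy of the benchmark config):

```yaml
test_api:
  - 'transformer_int4_fp16_gpu'   # instead of 'transformer_int4_gpu'; reportedly saves ~1.3GB
```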