s348268281 opened this issue a year ago · 1 comments
I'm using an M1 MacBook Pro. When I run ./run llama2_7b_chat.bin -m chat, I wait a long time without getting any response.
Is my machine's spec too low, or is inference just slow?
It seems that CPU inference is really slow. I waited an hour and got only a few responses.