
Generation runs forever when running the Llama-3 model locally

HaozhiTan opened this issue · 3 comments

Dear LLMWare Team,

I hope this message finds you well.

I'm trying to use the model 'bartowski/Meta-Llama-3-8B-Instruct-GGUF' following the example https://github.com/llmware-ai/llmware/blob/main/examples/Models/using-llama-3.py, but it takes forever to run, and the script appears to be stuck in the while loop in the generate function linked below. Could you please help look into this?
https://github.com/llmware-ai/llmware/blob/main/llmware/models.py#L5561
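For reference, here is a minimal sketch of what I am running, adapted from the linked example (the prompt string here is illustrative, not the exact one I used):

```python
from llmware.models import ModelCatalog

# load the GGUF model from the llmware model catalog, as in the linked example
model = ModelCatalog().load_model("bartowski/Meta-Llama-3-8B-Instruct-GGUF")

# this call never returns - generation appears stuck in the while loop in generate()
response = model.inference("What are the main benefits of retrieval augmented generation?")
print(response["llm_response"])
```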

Cheers

@HaozhiTan - thanks for raising this - we will look into it.

@HaozhiTan - quick update - we ran several tests with 'bartowski/Meta-Llama-3-8B-Instruct-GGUF' across different platforms (Mac M1, Windows with CUDA, Linux with CUDA) and did not experience any issues - in particular, we ran through a series of prompts in this example. Each of our test environments had 32 GB of RAM, which likely helps. What is your platform, and how much system RAM do you have? Also, could you please try a few other test inferences with the model (with shorter expected generations) to confirm whether the problem is specific to that inference or more general? A short, bounded inference like the sketch below is a quick way to check. Please also run the example script above - it will be a good basis for a side-by-side comparison. If you share more details, we will keep working to recreate the problem you are experiencing.
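Here is a sketch of the kind of short test inference we mean - the parameter names assume the load_model options in recent llmware versions, and the cap/prompt values are only illustrative:

```python
from llmware.models import ModelCatalog

# cap the generation length so a single test inference finishes quickly
# (max_output / temperature / sample are load_model options in recent
#  llmware versions - the values here are illustrative)
model = ModelCatalog().load_model(
    "bartowski/Meta-Llama-3-8B-Instruct-GGUF",
    max_output=100,
    temperature=0.0,
    sample=False,
)

# a short prompt with a short expected answer
response = model.inference("What is the capital city of France?")
print(response["llm_response"])
```

If this short, capped inference completes normally, that would suggest the hang is specific to longer generations rather than to model loading.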

Closing this issue due to inactivity - the model is working in our testing. @HaozhiTan - if you continue to have any issues with the Llama-3 GGUF model, please raise a new issue and share your environment details.