meta-llama/llama-models

How to run the model?

karpathy opened this issue · 4 comments

Hi, I believe there are docs missing on how to actually run the model once you download it? E.g. I followed the instructions and downloaded the 3.1 8B (base) model into the models/llama3_1/Meta-Llama-3.1-8B/ directory, but it's not clear what to do next. I'm guessing you'd want to load the params.json, init the ModelArgs with it, init the Transformer, load the params from consolidated.00.pth and torchrun that?

I'm guessing it would be along the lines of what exists in the llama3 repo (e.g. example_text_completion.py), which I am a bit hesitant to build on given the notice about it being deprecated.
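
A rough sketch of that flow, modeled on Llama.build() in the llama3 repo's generation.py. The import path for ModelArgs/Transformer below is a guess at where they live in this repo, and in practice you would launch the script under torchrun; this is illustrative, not official example code:

```python
# Rough sketch only -- not official example code. Assumes the ModelArgs and
# Transformer classes from this repo's model.py (exact import path is a guess)
# and that the script is launched under torchrun, e.g.
#   torchrun --nproc_per_node 1 run_sketch.py
import json
from pathlib import Path

import torch
from fairscale.nn.model_parallel.initialize import initialize_model_parallel

from models.llama3_1.api.model import ModelArgs, Transformer  # assumed path

ckpt_dir = Path("models/llama3_1/Meta-Llama-3.1-8B")

# torchrun provides the rendezvous env vars; the reference code uses NCCL plus
# fairscale model parallelism (the 8B checkpoint is one shard, consolidated.00.pth).
torch.distributed.init_process_group("nccl")
initialize_model_parallel(1)

# 1. Load the hyperparameters shipped with the checkpoint.
with open(ckpt_dir / "params.json") as f:
    params = json.load(f)

# 2. Build ModelArgs and instantiate the Transformer in fp16 on GPU.
model_args = ModelArgs(max_seq_len=2048, max_batch_size=1, **params)
torch.set_default_tensor_type(torch.cuda.HalfTensor)
model = Transformer(model_args)

# 3. Load the weights and switch to eval mode.
checkpoint = torch.load(ckpt_dir / "consolidated.00.pth", map_location="cpu")
model.load_state_dict(checkpoint, strict=False)
model.eval()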

zewpo commented

Just adding to what @karpathy said. It seems an oversight to give instructions for downloading the model locally, only to then point us to the Hugging Face downloads. This is confusing for someone unfamiliar with Llama, and something appears to be missing: we want a sample Python script that shows how to run the model we just downloaded.

When can we expect an update on this?
On this prompt format page they say, "Note that although prompts designed for Llama 3 should work unchanged in Llama 3.1 ..."

Would that mean those example scripts would work the same, or not?
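
For context, this is the Llama 3 chat prompt layout that the note refers to; Llama 3.1 keeps the same header/eot special tokens, which is why Llama 3 prompts are expected to work unchanged. Illustrative only:

```python
# Illustrative: the Llama 3 chat prompt layout, which Llama 3.1 also uses
# (same <|start_header_id|>/<|eot_id|> special tokens).
prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful assistant<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "How do I run the model?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
```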

The "example_chat_completion.py" calls "chat_completion" from "generation.py" which in turn calls "encode_dialog_prompt".
And Dialog is list of Json/dictionary with role and content.
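
For example, a minimal dialog in that format (names follow the llama3 repo's example; the `generator` object is assumed to come from Llama.build, so this is illustrative only):

```python
# Illustrative Dialog value for chat_completion(): a list of dialogs, each a
# list of {"role", "content"} messages.
dialogs = [
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How do I run the 8B model I downloaded?"},
    ],
]
# e.g. generator.chat_completion(dialogs, max_gen_len=256, temperature=0.6, top_p=0.9)
```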

No such example inference code has been provided with the 3.1 models.

Thanks @karpathy for opening this issue. We are targeting llama-stack as our preferred path for inference, so please take a look at that. We would also appreciate your feedback on the RFC.

To provide an alternative, we are also updating the instructions for running our previous style of example soon. This PR should solve the issue.