karpathy/llama2.c

Code Llama rope_theta parameter

janimo opened this issue · 2 comments

janimo commented

Base Llama models use 10000 for the RoPE theta; Code Llama models use 1000000 to handle a larger context.
With the value currently hardcoded to 10000, the code models tend to emit extra closing parentheses, whereas they behave correctly when it is changed to 1000000 in run.c
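
For context, run.c computes the RoPE rotation inline in `forward()` with the base hardcoded to `10000.0f`. A minimal sketch of the parameterized form, assuming the standard RoPE frequency formula (the `rope_theta` name and the helper function itself are illustrative, not current run.c code):

```c
#include <math.h>

// RoPE rotation for token position `pos` and dimension pair `head_dim`
// within a head of `head_size` dims. run.c currently inlines this with
// the base hardcoded to 10000.0f; `rope_theta` is the proposed knob.
static void rope_rotation(int pos, int head_dim, int head_size,
                          float rope_theta, float* fcr, float* fci) {
    float freq = 1.0f / powf(rope_theta, head_dim / (float)head_size);
    float val = pos * freq;
    *fcr = cosf(val); // real part of the complex rotation
    *fci = sinf(val); // imaginary part
}
// Base Llama: rope_theta = 10000.0f; Code Llama: rope_theta = 1000000.0f.
```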

This may need to be added to Config, but that would introduce an incompatibility in the .bin files (there are other fields that may also belong there, such as ffn_dim_multiplier, which is 1.3 for the CodeLlama-13 models).
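
For reference, the current Config in run.c is read with a single `fread` over the struct, which is exactly why any new field breaks existing .bin files:

```c
// run.c's v0 header; the checkpoint loader does
// fread(&config, sizeof(Config), 1, file), so appending a field such as
// `float rope_theta;` changes sizeof(Config) and misaligns every weight
// that follows in old .bin files.
typedef struct {
    int dim;        // transformer dimension
    int hidden_dim; // for the FFN layers
    int n_layers;   // number of layers
    int n_heads;    // number of query heads
    int n_kv_heads; // number of key/value heads
    int vocab_size; // vocabulary size
    int seq_len;    // max sequence length
} Config;
```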

Alternatively, it could be handled at inference time only, by adding a new rope_theta argument to run.c
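
A sketch of that option, following the simple `-x` flag parsing run.c already uses for `-t` and `-p` (the `-r` flag is hypothetical, not an existing option; this is a fragment of `main`, with `temperature` and `topp` declared as in run.c):

```c
#include <stdlib.h>

// Hypothetical extension of run.c's argument loop; -r is not a real flag.
float rope_theta = 10000.0f; // default matches base Llama
for (int i = 2; i < argc; i += 2) {
    if (argv[i][1] == 't') { temperature = atof(argv[i + 1]); }
    else if (argv[i][1] == 'p') { topp = atof(argv[i + 1]); }
    else if (argv[i][1] == 'r') { rope_theta = atof(argv[i + 1]); }
}
// usage: ./run codellama_7b.bin -r 1000000
```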

Since we are migrating to a new file format, should the RoPE theta parameter be stored there?

karpathy commented

yep exactly, the v1+ header is large enough to incorporate additional hyperparameters like this.
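
A sketch of what that could look like on the read side: detect a magic/version prefix, read rope_theta when present, and default it for legacy v0 files. The exact field layout is illustrative, not the final v1 spec:

```c
#include <stdio.h>

#define V1_MAGIC 0x616b3432u // "ak42", the magic llama2.c's export writes

// Illustrative reader: v0 files start directly with seven ints, v1+ files
// start with magic + version, so extra fields like rope_theta can follow.
typedef struct {
    int dim, hidden_dim, n_layers, n_heads, n_kv_heads, vocab_size, seq_len;
    float rope_theta; // new in v1+; defaulted for legacy files
} ConfigEx;

int read_config(FILE* f, ConfigEx* c) {
    c->rope_theta = 10000.0f; // base-Llama default for v0 checkpoints
    unsigned int magic;
    if (fread(&magic, sizeof(magic), 1, f) != 1) return -1;
    if (magic == V1_MAGIC) {
        int version;
        if (fread(&version, sizeof(version), 1, f) != 1) return -1;
        // the seven legacy int fields are contiguous at the struct start
        if (fread(&c->dim, sizeof(int), 7, f) != 7) return -1;
        if (version >= 1 &&
            fread(&c->rope_theta, sizeof(float), 1, f) != 1) return -1;
    } else {
        // legacy v0: the word we just read was actually `dim`
        c->dim = (int)magic;
        if (fread(&c->hidden_dim, sizeof(int), 6, f) != 6) return -1;
    }
    return 0;
}
```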