Configs in inference.py necessary for context length expansion in model serving?
spring1915 commented
In inference.py, there are two settings:

```python
# Derive a linear RoPE scaling factor when the requested context
# size exceeds the model's original maximum position embeddings.
orig_ctx_len = getattr(config, "max_position_embeddings", None)
if orig_ctx_len and args.context_size > orig_ctx_len:
    scaling_factor = float(math.ceil(args.context_size / orig_ctx_len))
    config.rope_scaling = {"type": "linear", "factor": scaling_factor}
```
and

```python
# Resize the embedding matrix to the tokenizer's vocabulary size
# (presumably Llama 2's 32,000 tokens plus one added special token).
model.resize_token_embeddings(32001)
```
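For reference, here is a minimal sketch of how I understand these two settings fit together at load time, assuming Hugging Face transformers; the model path and target context size below are placeholders, not values from the repo:

```python
import math
from transformers import AutoConfig, AutoModelForCausalLM

model_path = "path/to/finetuned-model"  # placeholder
context_size = 16384                    # placeholder target context

config = AutoConfig.from_pretrained(model_path)
orig_ctx_len = getattr(config, "max_position_embeddings", None)
if orig_ctx_len and context_size > orig_ctx_len:
    # e.g. ceil(16384 / 4096) = 4.0 for the original Llama 2
    scaling_factor = float(math.ceil(context_size / orig_ctx_len))
    config.rope_scaling = {"type": "linear", "factor": scaling_factor}

model = AutoModelForCausalLM.from_pretrained(model_path, config=config)
model.resize_token_embeddings(32001)
```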
Are they needed for a fine-tuned model with an extended context length to work properly? For example, if I fine-tuned the original Llama 2 model to a new context length of 16k, do I still need these settings at inference time? This matters because it would save us the hassle of writing custom inference code when using certain model-serving frameworks: we would just point the framework at the model's save location.
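If the overrides can simply be persisted into the checkpoint (an assumption on my part, not something confirmed in the repo), the serving-friendly workflow might look like the sketch below; the paths are placeholders:

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

src = "path/to/finetuned-model"       # placeholder
dst = "path/to/serving-ready-model"   # placeholder

config = AutoConfig.from_pretrained(src)
config.rope_scaling = {"type": "linear", "factor": 4.0}  # 16k target / 4k original

model = AutoModelForCausalLM.from_pretrained(src, config=config)
tokenizer = AutoTokenizer.from_pretrained(src)
model.resize_token_embeddings(len(tokenizer))  # 32001 if one pad token was added

model.save_pretrained(dst)       # config.json now carries rope_scaling
tokenizer.save_pretrained(dst)
```

If that works, any serving framework that reads rope_scaling from config.json should pick up the extended context without custom code, though whether a given framework honors that field is something I have not verified.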