dvlab-research/LongLoRA

Configs in inference.py necessary for context length expansion in model serving?


In inference.py, there are two settings:

    # Linearly scale RoPE when the requested context exceeds the
    # model's original max_position_embeddings.
    orig_ctx_len = getattr(config, "max_position_embeddings", None)
    if orig_ctx_len and args.context_size > orig_ctx_len:
        scaling_factor = float(math.ceil(args.context_size / orig_ctx_len))
        config.rope_scaling = {"type": "linear", "factor": scaling_factor}

and
    # Resize for the added pad token (Llama-2's base vocab is 32000).
    model.resize_token_embeddings(32001)
Are they needed for a fine-tuned model with an extended context length to work properly? For example, if I fine-tuned the original Llama2 model to reach a new context length of 16k, do I still need these settings at inference time? This matters because it would save us the hassle of writing custom inference code when using certain model-serving frameworks: we would just point the framework at the model's save location.
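
If the settings are still required, one workaround I'm considering is to bake them into the saved checkpoint once, so a serving framework's plain `from_pretrained` picks them up from `config.json`. Below is a minimal sketch, not code from this repo; the paths and the 16k figure are placeholders:

    import math

    from transformers import AutoTokenizer, LlamaForCausalLM

    context_size = 16384  # the extended context length used for fine-tuning
    src = "path/to/finetuned-16k"        # placeholder path
    dst = "path/to/serving-checkpoint"   # placeholder path

    model = LlamaForCausalLM.from_pretrained(src)
    tokenizer = AutoTokenizer.from_pretrained(src)

    # Apply the same linear RoPE scaling that inference.py sets on the fly.
    orig_ctx_len = getattr(model.config, "max_position_embeddings", None)
    if orig_ctx_len and context_size > orig_ctx_len:
        scaling_factor = float(math.ceil(context_size / orig_ctx_len))
        model.config.rope_scaling = {"type": "linear", "factor": scaling_factor}

    # Match the embedding matrix to the tokenizer size instead of
    # hard-coding 32001 (a no-op if the checkpoint already matches).
    model.resize_token_embeddings(len(tokenizer))

    # Persist config + weights; the framework only needs this directory.
    model.save_pretrained(dst)
    tokenizer.save_pretrained(dst)

With that, telling the framework the checkpoint location should be enough, assuming it reads `rope_scaling` from the saved config.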