dvlab-research/LongLoRA

What trainset is used to obtain "Model with context extension via improved LoRA fine-tuning" (LoRA+)?

ZackZikaiXiao opened this issue · 0 comments

Hi, thanks for the great work. I have a question about the training sets used for the different types of models (fully fine-tuned, LoRA+, and the models used for the extra experiments in the paper).

The README states, "There is no need to make supervised fine-tuning upon the fine-tuned context extended models. It is all right to directly use the base model as Llama2-chat models, as the amount of long instruction following data is enough for SFT." In the paper, however, Figure 5's caption suggests that LoRA+ is trained with RedPajama.

I'm seeking clarification on the following points:

  1. Do the released models refer to those that have undergone unsupervised fine-tuning on RedPajama and were then evaluated on PG19?
  2. Is Table 9, which reports results on the LongBench benchmark, the only experiment involving supervised fine-tuning with LongAlpaca-12k on top of models already fine-tuned with RedPajama?
  3. Where can I find the performance of using only LongAlpaca-12k to train the LoRA adapter, embedding, and normalization layers? (See the sketch at the end of this issue.)
I've drafted a table to summarize my understanding of the training configurations mentioned in both the README and the paper. Could you please confirm whether it is correct?

| Model | RedPajama (unsupervised) | LongAlpaca-12k (supervised) |
| --- | --- | --- |
| Fully fine-tuned (README) | ✓ | ✗ |
| LoRA+ (README) | ✓ | ✗ |
| Models for LongBench benchmark (paper) | ✓ | ✓ |
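For reference, here is a minimal sketch of the setup I mean by "LoRA adapter, embedding, and normalization layers" in question 3. It assumes the Hugging Face `transformers` and `peft` libraries and Llama-2-7b as the base model; the LoRA hyperparameters are placeholders for illustration, not the values from the paper:

```python
# Hypothetical sketch (not the repo's exact training script) of the LoRA+
# trainable-parameter set: LoRA adapters on the attention projections, plus
# fully trainable embedding and normalization layers.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumed base model for illustration
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=8,                # placeholder rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Beyond the standard LoRA adapters, also unfreeze the embedding and
# normalization weights (the "embeds and norm" in my question above).
for name, param in model.named_parameters():
    if "embed" in name or "norm" in name:
        param.requires_grad_(True)

model.print_trainable_parameters()
```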