How did you go about finetuning?
timothylimyl opened this issue · 9 comments
Hi there, I cannot seem to find any information on the fine-tuning process in your paper and this repository.
A snippet from your paper:
However, when finetuned with just 3,000 examples, ReAct becomes the best
method among the four, with PaLM-8B finetuned ReAct outperforming all PaLM-62B prompting
methods, and PaLM-62B finetuned ReAct outperforming all 540B prompting methods. In contrast,
finetuning Standard or CoT is significantly worse than finetuning ReAct or Act for both PaLM-
8/62B, as the former essentially teaches models to memorize (potentially hallucinated) knowledge
facts, and the latter teaches models how to (reason and) act to access information from Wikipedia, a
more generalizable skill for knowledge reasoning.
Hi, what we did was to use prompted PaLM-540B to generate trajectories with correct answers to fine-tune PaLM-8/62B. These are not shared as PaLM is a Google-internal model, but you can similarly use GPT-3 trajectories to fine-tune.
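To make that pipeline concrete, here is a minimal sketch of the data-collection step described above: sample trajectories from a prompted "teacher" model and keep only those that end in the correct answer. `teacher_generate` is a hypothetical stand-in (not from the thread) for a real prompted-LLM call plus the Wikipedia environment.

```python
# Hypothetical sketch of the distillation data collection: prompt a large
# teacher model for ReAct trajectories, keep only those whose final answer
# matches the gold answer, and use them as fine-tuning examples.

def teacher_generate(question):
    # Stand-in for a prompted LLM interacting with Wikipedia; a real
    # implementation would interleave Thought/Action/Observation steps
    # and return the full trajectory text plus the final answer.
    return ("Thought 1: ...\nAction 1: Finish[1,800 to 7,000 ft]",
            "1,800 to 7,000 ft")

def collect_finetune_data(qa_pairs, n_target):
    """Keep (question, trajectory) pairs whose final answer is correct."""
    dataset = []
    for question, gold in qa_pairs:
        trajectory, answer = teacher_generate(question)
        if answer == gold:  # rejection sampling on answer correctness
            dataset.append({"input": f"Question: {question}",
                            "target": trajectory})
        if len(dataset) >= n_target:
            break
    return dataset
```

In practice you would sample multiple trajectories per question and stop once you have enough correct ones (the paper reports 3,000).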
@ysymyth I was asking for specific details of the fine-tuning process. For example, did you fine-tune the model by separating the data individually for every question, trajectory, and answer? I guess you did this so that the data would not get mixed up during training, right?
did you fine-tune the model by separating the data individually for every question, trajectory, and answer?
Sorry, what do you mean exactly?
Example:
Input is given to model:
Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Targets that you use to fine-tune the model on:
Thought 1: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.
Thought 2: It does not mention the eastern sector. So I need to look up eastern sector.
Action 2: Lookup[eastern sector]
Observation 2: (Result 1 / 1) The eastern sector extends into the High Plains and is called the Central Plains orogeny.
Thought 3: The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.
Action 3: Search[High Plains]
Observation 3: High Plains refers to one of two distinct land regions:
Thought 4: I need to instead search High Plains (United States).
Action 4: Search[High Plains (United States)]
Observation 4: The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130 m).[3]
Thought 5: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.
Action 5: Finish[1,800 to 7,000 ft]
[ADD END TOKEN HERE]
Was it that you had 3,000 examples of these input-target pairs for fine-tuning the model? I was just curious how you went about fine-tuning the model.
Was it that you had 3,000 examples of these input-target pairs for fine-tuning the model?
Yes, that's right --- we used a prompted LLM to generate 3,000 such examples with correct answers.
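For anyone assembling such pairs, here is a small sketch of how the example trajectory above could be serialized into a single input/target training example. The `make_example` helper and the `</s>` end token are my own assumptions, not the authors' exact format.

```python
# Hypothetical helper: turn a question and its Thought/Action/Observation
# steps into one (input, target) fine-tuning example, appending an end
# token so the model learns where the trajectory stops.

def make_example(question, steps, end_token="</s>"):
    """steps: list of (thought, action, observation) tuples, in order.
    The last step's observation may be None (after Finish[...])."""
    lines = []
    for i, (thought, action, observation) in enumerate(steps, start=1):
        lines.append(f"Thought {i}: {thought}")
        lines.append(f"Action {i}: {action}")
        if observation is not None:
            lines.append(f"Observation {i}: {observation}")
    return {"input": f"Question: {question}",
            "target": "\n".join(lines) + end_token}
```

One such dict per solved question, 3,000 in total, would match the setup confirmed above.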
I see, thank you for the information!
It's very interesting to see knowledge distillation applied to reasoning and acting in language models. You are prompt-engineering LLMs that have in-context learning (an emergent property of their size) to teach smaller models how to reason and act. From your paper, I believe ReAct could be another important task to add when instruction-tuning smaller language models to perform as well as LLMs.
Yes, we believe fine-tuning is very promising --- you can also check out Toolformer (https://arxiv.org/abs/2302.04761), which is also about fine-tuning!
Thanks for the reply. All the best with your next project!
Hey, sorry for asking in a closed issue, but it's very relevant to the question discussed above.
Is it enough to just have pairs (<Question>, <Thought, Action, Observation chain>) in the dataset, or do we also need to add intermediate pairs, like (<Question, Thought, Action, Observation>, <Thought, Action, Observation>)?
I haven't tried to fine-tune the model yet, but my guess is that the model will be able to start answering the question but might be bad at ReActing, i.e., taking previous observations into consideration. Am I wrong?
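To make the intermediate-pairs idea concrete, here is a sketch (hypothetical helper, not from the thread) of splitting one trajectory into per-step examples. Worth noting: with a decoder-only LM trained with the loss over the whole target sequence, a single whole-trajectory example already supervises every intermediate step, so these pairs are usually redundant; they matter mainly for encoder-decoder setups where only the target side receives loss.

```python
# Hypothetical helper: split one trajectory into incremental
# (context, next step) pairs, in case per-step supervision is wanted.

def per_step_examples(question, steps):
    """steps: list of "Thought/Action/Observation" step strings, in order.
    Each example's input is the question plus all prior steps; the
    target is just the next step."""
    examples = []
    context = [f"Question: {question}"]
    for step in steps:
        examples.append({"input": "\n".join(context), "target": step})
        context.append(step)
    return examples
```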