Question about Prompt Tuning Process: Seeking Insights from Authors
andreasbinder opened this issue · 1 comment
Hi again! :)
I really like the RAP framework, and especially the GSM8k experiment, which I would like to extend! Basically, I want to add more actions to tackle a multi-modal dataset that includes retrieval.
However, when I change the prompt, the models (TheBloke/Llama-2-13B-GPTQ and meta-llama/Llama-2-13b-hf) fail completely.
I wanted to ask about your experience with questions like: how sensitive is the model to reducing the number of demonstrations, where should the overall question go, etc.?
Problems I encounter include, for example:
- When I give the overall question and ask the model to create a subquestion, it just rephrases the question without any decomposition.
- When I give no demonstrations but instead formulate a query combining the overall question and the task description (e.g. "Generate a textual query for finding the university that started offering courses in the community with ZIP code 29707 in August 2018.\n"), the model just continues the description (e.g. 'The goal of this project is to generate a textual query in SQL that would find the university that started offering courses in the community with ZIP code 29707 in August 2018.\n' or 'We assume that you have the data in the following column.\n').
I am aware these are more general questions and not entirely specific to your code, but I was curious how much effort it takes to guide the model in a meaningful way (I hope not to spend weeks on prompt engineering, but to actually implement some actions :D ).
Great Work btw!
Hi, thanks for your question.
The vanilla Llama model is trained to predict the next token rather than to follow instructions, which is why we need to prompt it with a few demonstrations. I believe instruction-finetuned LLMs, e.g., Alpaca, Vicuna, Llama-2-chat, etc., should fit your needs.
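To make the difference concrete, here is a minimal sketch contrasting the two prompting styles. This is not the exact RAP prompt; the demonstration, the subquestion wording, and the chat-template instruction below are illustrative placeholders you would replace with your own actions.

```python
# Minimal sketch (illustrative, not the RAP prompt) contrasting few-shot
# prompting of a base model with the Llama-2-chat instruction format.

# Style 1: few-shot prompt for a base model (e.g. meta-llama/Llama-2-13b-hf).
# The model continues the pattern, so every demonstration must already show
# the decomposition behaviour you want; without demonstrations it will just
# keep writing text that "looks like" the task description.
FEW_SHOT_PROMPT = """\
Question 1: How many apples does Mary have left after giving 3 of her 10 apples away?
Question 1.1: How many apples did Mary start with?
Answer 1.1: Mary started with 10 apples. The answer is 10.
Question 1.2: Now we can answer the question: How many apples does Mary have left?
Answer 1.2: 10 - 3 = 7. The answer is 7.

Question 2: {question}
Question 2.1:"""

# Style 2: instruction prompt for an instruction-tuned model
# (e.g. meta-llama/Llama-2-13b-chat-hf), which can follow a task
# description without demonstrations.
CHAT_PROMPT = """\
<s>[INST] Decompose the following question into the first subquestion needed to answer it.
Question: {question} [/INST]"""

question = ("Which university started offering courses in the community "
            "with ZIP code 29707 in August 2018?")

print(FEW_SHOT_PROMPT.format(question=question))
print(CHAT_PROMPT.format(question=question))

# Either string can then be passed to the model, e.g. via
# transformers.pipeline("text-generation", model=...)(prompt, max_new_tokens=64).
```

With the base model, the completion after "Question 2.1:" is steered almost entirely by the demonstrations, so adding new actions mostly means adding demonstrations that use them; with the chat model, the `[INST] ... [/INST]` template lets you state the task directly.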