RUCAIBox/TextBox

PTG using same key


In the paper, k^c_z (the cluster key) and k^p_t (the prompt key) exist separately, but the code uses the same key.

  1. Are the cluster keys and prompt keys described in the paper actually the same thing?
  2. In what part of the code are clusters taken into account?
    Currently the key does not appear to be cluster-specific, since it is updated with all of the data.


 prompt_embeds = self.lam * self.MHA(task_query, key, value) + (1 - self.lam) * self.MHA(input_query, key, value)
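(For context, below is a minimal, self-contained sketch of what the quoted line computes, using a standard torch.nn.MultiheadAttention. The shapes, names, and lam value are illustrative assumptions, not the repository's actual module.)

import torch
import torch.nn as nn

# Illustrative sizes only; not the repository's actual configuration.
embed_dim, num_heads, num_prompts = 1024, 8, 14
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
lam = 0.5  # mixing weight, called self.lam in the quoted line

task_query = torch.randn(1, 1, embed_dim)       # query built from the task
input_query = torch.randn(1, 1, embed_dim)      # query built from the input instance
key = torch.randn(1, num_prompts, embed_dim)    # the single shared key discussed above
value = torch.randn(1, num_prompts, embed_dim)  # the source prompts acting as values

# Both attention terms attend over the same key/value, as in the quoted code.
prompt_embeds = lam * mha(task_query, key, value)[0] + (1 - lam) * mha(input_query, key, value)[0]
print(prompt_embeds.shape)  # torch.Size([1, 1, 1024])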

I have replied to your questions by email. Thanks for your questions!

@StevenTang1998
Hello, I'd like to ask you some additional questions separately from the email.

What is learned in the PTG training process?

  1. only task query and keys
  2. BART fine-tuning + task query and keys

When I first read the paper, I thought it was option 1, but I'm confused because the code calls self.model.requires_grad_(True) while the paper gives a learning rate for "BART".

Could you give me an answer as to which one is correct?

Hi @minji-o-j, during prompt pre-training, we only train the query and keys. When fine-tuning on the downstream tasks, we tune both the prompts and the BART model. More details can be found on Page 6 of our paper.
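(For reference, the two regimes described above can be pictured roughly as in the sketch below; the variable names are hypothetical and this is not the repository's exact code.)

from transformers import BartForConditionalGeneration

bart = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# Prompt pre-training: BART is frozen, only the query and keys are trained.
bart.requires_grad_(False)
# ... optimize the task query and key parameters here ...

# Downstream fine-tuning: BART is unfrozen and tuned together with the prompts.
bart.requires_grad_(True)
# ... fine-tune on the target task here ...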

Then, is the process of learning a PTG for a specific task a two-stage process?

(1) In the process of obtaining tilde p, the query and keys are trained with the BART model frozen.
(2) BART is fine-tuned using the tilde p obtained in stage (1).

Yes, and step (1) is optional if you use existing trained prompts.

If so, is it correct that the paper's experimental results cannot be obtained immediately by running the following command with the current code, and that reproducing them requires minor modifications?

python run_textbox.py --model=PTG --dataset=cnndm --model_path=facebook/bart-large
(I used the command written here)


The reason I think so is that, when learning the queries and keys in PTG,
self.model.requires_grad_ is set to True (https://github.com/RUCAIBox/TextBox/blob/2.0.0/textbox/model/ptg.py#L43).
As it stands, BART training and query/key training are done simultaneously.

After changing this part (self.model.requires_grad_(True)) to False, training the query and keys, and saving tilde p,
should I then fine-tune BART on the same target-task training set in a second training run?
(i.e., set self.model.requires_grad_(True) and use the fixed tilde p values instead of the prompt_embedding matrix)

Please let me know if anything is wrong

You can obtain the paper's experimental results immediately by executing the following command:

python run_textbox.py --model=PTG --dataset=cnndm --model_path=facebook/bart-large

We have provided the pre-trained prompt source.

Then, is the process of learning a PTG for a specific task a two-stage process?

(1) In the process of obtaining tilde p, the query and keys are trained with the BART model frozen. (2) BART is fine-tuned using the tilde p obtained in stage (1).

If so, is only (2) executed when this command is used?

Then, is the provided prompt source not the source prompts for the source prompt pool,
but rather the tilde p for each of the 14 tasks, each already trained on the other 13 tasks (excluding itself)?

However, looking at the code, it appears that the provided prompts go in as the source-task prompts.

My understanding was that the source tasks are used in the process of obtaining tilde p.

Please let me know if there is anything wrong with my understanding!!

You can download it and utilize torch to load it. It contains the learned prompt for each task (i.e., 14 tensors of shape [200, 1024]).
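(For example, loading and inspecting the provided prompt source might look like the sketch below; "prompts.pt" is a placeholder file name, and whether the file holds a dict or a list is an assumption handled generically here.)

import torch

prompts = torch.load("prompts.pt", map_location="cpu")  # placeholder path for the downloaded file

# One learned prompt per task is expected: 14 tensors of shape [200, 1024].
items = prompts.items() if isinstance(prompts, dict) else enumerate(prompts)
for name, p in items:
    print(name, tuple(p.shape))  # e.g. 0 (200, 1024)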

Taking the pc dataset as an example, the source prompts for the same target task (pc) are different in cross-task and cross-dataset experiments.

In the case of the 14 prompts provided, are the tilde p that went through the procedure described in the paper over all 13 source tasks and the tilde p used in the experiments separate things?

Sorry, I may not understand your question. Maybe you can find a solution here. We have provided different options for source tasks.

Oh, if so:

The source prompts are derived using the frozen BART model (the multi-key memory network is not used).
Isn't tilde p then obtained by combining the "source prompts" with an adaptive attention mechanism?

Yes, the source prompts are derived using the frozen BART model (the multi-key memory network is not used), and tilde p is obtained by combining the "source prompts" with an adaptive attention mechanism.

And that was my mistake: the prompt source we provided is P = {p1, ..., pt, ..., pT}.

Then, is the process of learning a PTG for a specific task a two-stage process?

(1) In the process of obtaining tilde p, the query and keys are trained with the BART model frozen. (2) BART is fine-tuned using the tilde p obtained in stage (1).


If so, I guess I need to start from step (1) to train PTG, since the provided prompt source consists of the source prompts.

python run_textbox.py --model=PTG --dataset=cnndm --model_path=facebook/bart-large

However, using this command seems to train both BART and the prompts (query and keys) at the same time.

If so, is it correct that the paper's experimental results cannot be obtained immediately by running the following command with the current code, and that reproducing them requires minor modifications?

python run_textbox.py --model=PTG --dataset=cnndm --model_path=facebook/bart-large (I used the command written here)

The reason I think so is that, when learning the queries and keys in PTG, self.model.requires_grad_ is set to True (https://github.com/RUCAIBox/TextBox/blob/2.0.0/textbox/model/ptg.py#L43). As it stands, BART training and query/key training are done simultaneously.

After changing this part (self.model.requires_grad_(True)) to False, training the query and keys, and saving tilde p, should I then fine-tune BART on the same target-task training set in a second training run? (i.e., set self.model.requires_grad_(True) and use the fixed tilde p values instead of the prompt_embedding matrix)

Please let me know if anything is wrong

That is why I asked the question above; is it right to proceed with training in the way I described?

Any help would be appreciated.

If you want to conduct step (1), our provided code does not support that yet. You may need to modify the existing code to achieve your goal.
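(For anyone attempting step (1), the kind of change discussed above might look roughly like the sketch below. It assumes the PTG module keeps the BART model in self.model, as in the linked line; it is not the repository's actual code.)

import torch.nn as nn
from transformers import BartForConditionalGeneration

class PTGStageOneSketch(nn.Module):
    # Sketch only: freeze BART for stage (1) so that just the query/key (prompt)
    # parameters receive gradients; unfreeze it again for stage (2) fine-tuning.
    def __init__(self, model_path="facebook/bart-large", freeze_bart=True):
        super().__init__()
        self.model = BartForConditionalGeneration.from_pretrained(model_path)
        self.model.requires_grad_(not freeze_bart)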

Thank you for the answer.

Also, in the current code, when an instance is fed in, "task information" (e.g., summarization) is also given as part of the model input (prompt + task description + input sentence).

  1. Did you give the input like this in the experiments in the actual paper? Or is the code just part of an unreported experiment?
  2. Did you use the same method when training the source prompts?
  3. Why is the input given to the BART model in this form?

  1. We do not use the task description in the paper. It is a default setting of TextBox. You can remove it, but it has little impact on the result.
  2. We do not use the task description during prompt training.
  3. I may not understand this question.

In the paper, the "Cluster" key and "Prompt" key were used.
But in the current code, the same key is passed to the MHA function. (link)

 prompt_embeds = self.lam * self.MHA(task_query, key, value) + (1 - self.lam) * self.MHA(input_query, key, value)
  1. Using the current formula, can we run PTG's second ablation study, "PTG without prompt cluster"?
  2. Is it correct that the actual paper's experiments used (1) keys learned at the cluster level (i.e., learned with the multiple tasks in the same cluster) and (2) keys learned with a single task, rather than reusing the same key as in the current code?

Sorry for the late response; we utilize the same key in practice.