Questions about the length parameter in the get_delta function
teinhonglo opened this issue · 6 comments
Hi,
Thanks for releasing the code for this amazing paper.
I would like to ask some questions about the length parameter of the get_delta function (this link).
- Why is the default value of length 50?
- If I change the maximum sequence length of the input IDs from 32 to 256 in both training and evaluation (this link), should the default value still be 50? I encountered errors at line 170 of prompt_bert/models.py when I changed the maximum sequence length (32 -> 256) and the batch size (256 -> 32).
- Why is the argument of the repeat function 128?
https://github.com/kongds/Prompt-BERT/blob/main/prompt_bert/models.py#L292-L296
Thanks in advance.
Tien-Hong
Hello, sorry for the late reply.
The default maximum input length in get_delta does not influence the final results. However, this value must be larger than the input length to avoid out-of-bounds indexing, because we look up the corresponding delta according to the length of each sentence.
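To make the out-of-bounds condition concrete, here is a minimal pure-Python sketch (no PyTorch, so it runs anywhere) of what the d_position_ids lines in the patch appear to do: row i of the cache holds the template's position ids shifted as if an i-token sentence sat in the sentence slot. It assumes the delta for a sentence of n tokens is row n of the cache. The names build_delta_position_ids, lookup_delta_row, template_len and prefix_len are hypothetical; prefix_len stands in for len(cls.bs).

```python
from typing import List


def build_delta_position_ids(template_len: int, prefix_len: int, length: int) -> List[List[int]]:
    """Return `length` rows of position ids for a template-only input.

    Row i shifts every position after the sentence slot by i, which mimics
    d_position_ids[:, prefix_len + 1:] += arange(length)[:, None].
    `prefix_len` is a hypothetical stand-in for len(cls.bs).
    """
    rows = []
    for i in range(length):
        row = list(range(template_len))  # base positions 0..template_len-1
        for j in range(prefix_len + 1, template_len):
            row[j] += i  # shift positions after the sentence slot by the sentence length
        rows.append(row)
    return rows


def lookup_delta_row(cache: List[List[int]], sentence_len: int) -> List[int]:
    """Pick the cached row for a sentence of `sentence_len` tokens (assumed indexing)."""
    if sentence_len >= len(cache):
        # Explicit check so the failure names the real cause: the cache is too small.
        raise IndexError(
            f"sentence length {sentence_len} needs at least {sentence_len + 1} cached rows, "
            f"but only {len(cache)} exist"
        )
    return cache[sentence_len]


if __name__ == "__main__":
    cache = build_delta_position_ids(template_len=6, prefix_len=2, length=50)
    print(lookup_delta_row(cache, 3))  # [0, 1, 2, 6, 7, 8]
    try:
        lookup_delta_row(cache, 200)  # longer than the 50-row cache
    except IndexError as err:
        print(err)
```

With length=50, any sentence of 50 or more tokens falls off the end of the cache, which is why the size has to grow with the maximum sequence length.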
If you want to change the maximum sequence length to 256, you can modify the code as follows (this makes the delta cache support sentence lengths from 0 to 500, which avoids going out of bounds):
```diff
diff --git a/prompt_bert/models.py b/prompt_bert/models.py
index da8058f..32c59b8 100644
--- a/prompt_bert/models.py
+++ b/prompt_bert/models.py
@@ -75,7 +75,7 @@ def cl_forward(cls,
     labels=None,
     return_dict=None,
 ):
-    def get_delta(template_token, length=50):
+    def get_delta(template_token, length=500):
         with torch.set_grad_enabled(not cls.model_args.mask_embedding_sentence_delta_freeze):
             device = input_ids.device
             d_input_ids = torch.Tensor(template_token).repeat(length, 1).to(device).long()
@@ -289,10 +289,10 @@ def sentemb_forward(
     if cls.model_args.mask_embedding_sentence_delta and not cls.model_args.mask_embedding_sentence_delta_no_delta_eval :
         device = input_ids.device
-        d_input_ids = torch.Tensor([cls.mask_embedding_template]).repeat(128, 1).to(device).long()
-        d_position_ids = torch.arange(d_input_ids.shape[1]).to(device).unsqueeze(0).repeat(128, 1).long()
+        d_input_ids = torch.Tensor([cls.mask_embedding_template]).repeat(500, 1).to(device).long()
+        d_position_ids = torch.arange(d_input_ids.shape[1]).to(device).unsqueeze(0).repeat(500, 1).long()
         if not cls.model_args.mask_embedding_sentence_delta_no_position:
-            d_position_ids[:, len(cls.bs)+1:] += torch.arange(128).to(device).unsqueeze(-1)
+            d_position_ids[:, len(cls.bs)+1:] += torch.arange(500).to(device).unsqueeze(-1)
         m_mask = d_input_ids == cls.mask_token_id
         with torch.no_grad():
```
The default values of 50 and 128 in cl_forward and sentemb_forward, respectively, are the delta cache sizes for training and evaluation. They were chosen arbitrarily and can be changed, as long as they are larger than the maximum sequence length of the data.
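As a quick sanity check of that constraint, here is a hypothetical pre-flight helper (check_delta_sizes and DEFAULT_DELTA_SIZES are illustrative names, not part of the repository) that compares the default sizes quoted in this thread against a given maximum sequence length:

```python
from typing import Dict, List

# Default delta cache sizes quoted in this thread: get_delta's length (training)
# and the hard-coded repeat(128, ...) calls (evaluation).
DEFAULT_DELTA_SIZES: Dict[str, int] = {
    "cl_forward (training)": 50,
    "sentemb_forward (evaluation)": 128,
}


def check_delta_sizes(max_seq_len: int, sizes: Dict[str, int]) -> List[str]:
    """Return the names of delta cache sizes that are NOT larger than max_seq_len.

    Hypothetical helper: an empty list means every cache is large enough.
    """
    return [name for name, size in sizes.items() if size <= max_seq_len]


if __name__ == "__main__":
    print(check_delta_sizes(32, DEFAULT_DELTA_SIZES))   # []
    print(check_delta_sizes(256, DEFAULT_DELTA_SIZES))  # both entries are too small
    print(check_delta_sizes(256, {name: 500 for name in DEFAULT_DELTA_SIZES}))  # []
```

With the original maximum length of 32 both defaults pass, while 256 exceeds both, which fits the errors reported after changing 32 to 256; raising both sizes to 500 clears the check.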
It works in the training phase.
Thank you very much.
Do I need to change any code in evaluation.py?
In addition, which paper should I read to understand the delta cache used in the training phase?
Thanks in advance.
For evaluation.py, the default value is 256. If the maximum sequence length is less than 256, no modification is required.
(See lines 59 to 60 at commit 093e343.)
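The delta cache is just a simple trick for caching the deltas: the template-only delta is computed once for each possible sentence length (the rows created by repeat(length, 1)), and each sentence then uses the row that matches its length.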
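One way to read "caching the deltas", as a minimal pure-Python sketch: encode the template once per possible sentence length, then look up each sentence's delta by its length and subtract it. This assumes the template-denoising step described in the PromptBERT paper (subtracting the template-only representation); build_delta_cache, denoise and the toy encoder are hypothetical. In the repository, the template-only inputs appear to be run through the model as a single batch of shape (length, template_len) instead of one call per row.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def build_delta_cache(encode_template: Callable[[int], Vector], length: int) -> List[Vector]:
    """Encode the template once per possible sentence length (0..length-1).

    `encode_template` is a hypothetical stand-in for running the encoder on the
    template-only input whose positions are shifted by the given offset.
    """
    return [encode_template(offset) for offset in range(length)]


def denoise(embeddings: Sequence[Vector], lengths: Sequence[int], cache: List[Vector]) -> List[Vector]:
    """Subtract each sentence's cached template delta, looked up by its length."""
    out = []
    for emb, n in zip(embeddings, lengths):
        if n >= len(cache):
            raise IndexError(f"sentence length {n} exceeds delta cache size {len(cache)}")
        delta = cache[n]
        out.append([e - d for e, d in zip(emb, delta)])
    return out


if __name__ == "__main__":
    # Toy encoder: the "template embedding" just records its position offset.
    cache = build_delta_cache(lambda offset: [float(offset), 1.0], length=50)
    print(denoise([[10.0, 2.0], [20.0, 4.0]], [2, 5], cache))  # [[8.0, 1.0], [15.0, 3.0]]
```

The cost of encoding the template is paid once for all lengths rather than once per sentence, and every batch reuses the same rows.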
Thanks very much.
I got it.
For now, I want to prepare my own datasets like wiki1m_for_simcse.txt and nli_for_simcse.csv.
Could you give me some quick instructions?
These datasets come directly from SimCSE.
You may want to ask the SimCSE authors for instructions.
I will ask them.
Thanks.