IDEA-CCNL/Fengshenbang-LM

questions about the padding value

iridescentee opened this issue · 1 comment

pad_token_id: -100
decoder_start_token_id: 0

from torch.nn.utils.rnn import pad_sequence

for k, v in batch.items():
    if k != "labels" and k != "idx":
        # input_ids, attention_mask, etc. are padded with
        # self.pad_token_id, which the config above sets to -100
        batch[k] = pad_sequence(
            v, batch_first=True, padding_value=self.pad_token_id
        )
    elif k == "labels":
        batch[k] = pad_sequence(v, batch_first=True, padding_value=-100)

First of all, thank you for your code. It has helped me a lot.

I have a small question about how you pad the input sequences. In lines 97-98, you set the pad token id to -100. Usually, setting a token's label to -100 means its loss should be ignored, so I do not see why you also set the padding value of input_ids and attention_mask [lines 115-121] to -100. Are these lines wrong, and should I change the padding value to 0?
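
For reference, my understanding of why -100 is special for labels: PyTorch's cross-entropy loss ignores target positions equal to ignore_index, which defaults to -100, while input_ids and attention_mask are consumed by the model directly, so padding them with -100 would produce invalid token ids and mask values. A minimal sketch in plain PyTorch (the tensors and shapes here are illustrative, not taken from this repo):

import torch
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_sequence

# Two label sequences of different lengths, padded with -100.
labels = pad_sequence(
    [torch.tensor([5, 7, 2]), torch.tensor([3])],
    batch_first=True,
    padding_value=-100,
)  # tensor([[5, 7, 2], [3, -100, -100]])

vocab_size = 10
logits = torch.randn(2, 3, vocab_size)  # (batch, seq_len, vocab)

# F.cross_entropy defaults to ignore_index=-100, so the two padded
# label positions contribute nothing to the loss.
loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))

# By contrast, an input_ids tensor padded with -100 would index outside
# the embedding table, and an attention_mask is expected to contain 0/1.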