padding_idx=masking_idx in ByteNetLMTime instantiation arguments
cmarkak opened this issue · 1 comment
The following code in `train.py` assigns `padding_idx=masking_idx` in the model initialization. This conflicts with the definition just above it, where `padding_idx` is set to a value different from `masking_idx`. Is this an oversight, or is there a particular reason for this assignment?
```python
padding_idx = tokenizer.pad_id  # PROTEIN_ALPHABET.index(PAD)
masking_idx = tokenizer.mask_id
print('Using {} as padding index'.format(padding_idx))
print('Using {} as masking index'.format(masking_idx))
#if args.model_type == 'ByteNet':
model = ByteNetLMTime(n_tokens, d_embed, d_model, n_layers, kernel_size, r,
                      causal=causal, padding_idx=masking_idx, rank=weight_rank, dropout=args.dropout,
                      tie_weights=args.tie_weights, final_ln=args.final_norm, slim=slim, activation=activation,
                      timesteps=diffusion_timesteps)
```
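For context on why this looks surprising: in PyTorch's `nn.Embedding`, `padding_idx` initializes that row of the embedding matrix to zeros and excludes it from gradient updates, so passing `masking_idx` here freezes the mask token's embedding rather than the pad token's. A minimal sketch of that behavior in plain PyTorch (not code from this repo; the indices are illustrative):

```python
import torch
import torch.nn as nn

# Hypothetical indices standing in for tokenizer.pad_id / tokenizer.mask_id.
PAD_IDX, MASK_IDX = 0, 1

# padding_idx pins that embedding row to zeros and zeroes its gradient.
emb = nn.Embedding(num_embeddings=5, embedding_dim=4, padding_idx=MASK_IDX)
print(emb.weight[MASK_IDX])  # all zeros at initialization

loss = emb(torch.tensor([MASK_IDX, 2, 3])).sum()
loss.backward()
print(emb.weight.grad[MASK_IDX])  # zeros: this row is never updated
print(emb.weight.grad[2])         # ones: ordinary rows receive gradients
```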
Thank you in advance.
This is done on purpose. We follow how ESM handles mask tokens. Padding is handled with `input_mask`.
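For anyone who finds this later, the pattern is: the mask token takes the frozen zero embedding (mirroring ESM), while padding is excluded from the computation by an explicit mask tensor. Below is a minimal sketch of that pattern; it is not the repo's actual forward pass, and the names (`tokens`, `input_mask`, a pad id of 0) are illustrative assumptions:

```python
import torch

# Hypothetical batch: 0 is assumed to be the pad id for this sketch.
tokens = torch.tensor([[4, 2, 3, 0, 0]])          # (batch, length)
input_mask = (tokens != 0).unsqueeze(-1).float()  # 1 for real tokens, 0 for padding

x = torch.randn(1, 5, 8)   # stand-in for the embedded sequence, (batch, length, d_model)
x = x * input_mask         # zero out padded positions before subsequent layers

# A mask-aware mean over the sequence ignores padding entirely.
lengths = input_mask.sum(dim=1)   # (batch, 1), number of real tokens
pooled = x.sum(dim=1) / lengths   # (batch, d_model)
```

The upside of this split is that padded positions never influence activations or the loss, while the mask token keeps a stable all-zero embedding that the network can learn to treat as "unknown".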