dapowan/LIMU-BERT-Public

func_loss issue


Hi,

I have two doubts regarding the code.

  1. Is the mask for the original sequence static during pre-training?
    E.g.: I have a sequence of length 10, [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
    and a mask [0, 1, 1, 1, 0, 0, 0, 0, 0, 0] is generated using the algorithm you have provided. Will the mask vary during training, perhaps so that the model can learn more information?

  2. The loss function defined in pretrain.py does not seem to consider only the difference at the masked elements of the sequence; it appears to compute the loss between the entire predicted sequence and the original sequence, which contradicts what is written in the paper.

   def func_loss(model, batch):
       mask_seqs, masked_pos, seqs = batch  # masked sequences, masked positions, ground-truth values

       seq_recon = model(mask_seqs, masked_pos)  # predictions at the masked positions
       loss_lm = criterion(seq_recon, seqs)  # for masked LM
       return loss_lm

Hi, thanks for your interest, and sorry for the late response:

  1. Yes, it varies during training. We set the random seed in utils.py line 444, 'set_seeds(train_cfg.seed)', to maintain reproducibility: every time you run the script, the first call to the mask function gives the same result, but subsequent calls produce different masked results (a toy example follows this list).

  2. You can check the details in the LIMUBertModel4Pretrain model, model.py line 158. It only outputs the predictions at the masked positions specified by masked_pos (a minimal sketch of this idea is included at the end of this reply).
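
To illustrate the first point with a toy example (this is just standard NumPy behaviour, not our actual mask function): the seed is set once at the start of the run, so the run as a whole is reproducible, yet successive calls still draw new random numbers and therefore produce different masks for different batches.

    import numpy as np

    def toy_mask(seq_len, n_masked):
        # stand-in for the real mask function: pick positions at random
        return np.random.choice(seq_len, size=n_masked, replace=False)

    np.random.seed(42)        # set once, like set_seeds(train_cfg.seed) in utils.py
    print(toy_mask(10, 3))    # first call: identical output every time you run the script
    print(toy_mask(10, 3))    # later calls: different positions, still reproducible across runs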

For the above two questions, I highly recommend debugging our code to check how it works in detail if you want to fully grasp it.
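
To make the second point concrete, here is a minimal sketch of the idea (a toy encoder for illustration, not the actual LIMUBertModel4Pretrain): the forward pass gathers the hidden states only at masked_pos, so criterion(seq_recon, seqs) compares predictions and ground truth only at the masked positions.

    import torch
    import torch.nn as nn

    class ToyPretrainModel(nn.Module):
        # sketch of the idea: predict values only at the masked positions
        def __init__(self, feat_dim=6, hidden=32):
            super().__init__()
            self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)  # placeholder encoder
            self.decoder = nn.Linear(hidden, feat_dim)                 # reconstruct sensor readings

        def forward(self, mask_seqs, masked_pos):
            h, _ = self.encoder(mask_seqs)                             # (B, L, H)
            idx = masked_pos.unsqueeze(-1).expand(-1, -1, h.size(-1))  # (B, M, H)
            h_masked = torch.gather(h, 1, idx)                         # keep only the masked positions
            return self.decoder(h_masked)                              # (B, M, feat_dim)

    model = ToyPretrainModel()
    mask_seqs  = torch.randn(4, 120, 6)          # masked input sequences
    masked_pos = torch.randint(0, 120, (4, 18))  # indices of the masked positions
    seqs       = torch.randn(4, 18, 6)           # ground truth at those positions
    loss = nn.MSELoss()(model(mask_seqs, masked_pos), seqs)  # loss computed only over masked positions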

Hi,

Thank you very much for the reply! I found it very useful. However, another doubt has come up.

Regarding the masking method, I have another doubt related to the class Preprocess4Mask, line 276 in utils.py.

I have noticed that the number of masked positions (more precisely, the number of positions set to 0 in instance_mask) differs from the number of entries in seq (more precisely, the ground-truth values of the positions to be predicted).

This is not a big issue, but it would cause a slight problem when calculating the loss.

Let me give you an example:
In instance_mask, there are 814 positions set to 0, which means the values of 814 positions need to be predicted.
However, seq has 1012 entries, meaning that 1012 positions are picked out for prediction and for calculating the loss.
This leads to the issue that 1012 - 814 = 198 positions are not masked but are still used when calculating the loss.

I do not know if this is done on purpose or not.

As for the masking configuration, I use exactly the same settings as in the provided JSON file, shown below:

  "limu_mask": {
    "mask_ratio": 0.15,
    "mask_alpha": 6,
    "max_gram": 10,
    "mask_prob": 0.8,
    "replace_prob": 0.0
  },

I did find the cause of this issue, which is the line if np.random.rand() < self.mask_prob:
Removing this line, i.e. not considering mask_prob, makes the two counts match, as the simulation below illustrates.
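
To make the mismatch concrete, here is a simplified simulation of what I believe is happening (illustrative numbers, not the actual Preprocess4Mask code): every selected position contributes its ground truth to seq, but only the positions that pass the mask_prob check are actually set to 0 in instance_mask, so roughly 20% of the targets keep their original values yet still enter the loss.

    import numpy as np

    def simulate_masking(n_selected=1000, mask_prob=0.8, seed=0):
        # simplified: every selected position becomes a prediction target,
        # but only those passing the mask_prob check are actually zeroed
        rng = np.random.default_rng(seed)
        zeroed = rng.random(n_selected) < mask_prob
        return n_selected, int(zeroed.sum())

    targets, zeroed = simulate_masking()
    print(targets, zeroed, targets - zeroed)  # about 20% of the targets are never zeroed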

It would be wonderful if you could explain a bit why mask_prob is needed, and also whether this behaviour is intentional.