Implementation of a Masking Stage with Random Masking Options

Problem

The absence of a dedicated masking stage in our pipeline limits our ability to handle incomplete or noisy data effectively during model training.

Introduce a masking stage that randomly masks either a specified percentage of the tokens or a contiguous subsequence of the data:

Position: Place the masking stage after the input encoder and before the sequence model.
Functionality:
- Support two random masking modes: mask a random percentage of the tokens, or mask a randomly sampled contiguous subsequence.
- Add a key to the batch holding the labels that the Model stage will use to compute the masked imputation loss.
Configurability: Allow users to set the percentage of data to mask.
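
To make the proposal concrete, here is a rough sketch of what the stage could look like. It is pure Python so it stays library-agnostic; a real version would work on the pipeline's tensors. Everything named here is an assumption for illustration and not existing code: the batch keys "tokens" and "mask_labels", the MASK_TOKEN and IGNORE_INDEX sentinels, and the function names. It also assumes the stage sees integer token ids, which may not match what the input encoder actually emits.

```python
import random
from typing import Any, Dict, List, Optional

MASK_TOKEN = 0        # hypothetical id that replaces a masked token
IGNORE_INDEX = -100   # hypothetical label for positions that carry no loss


def random_token_mask(length: int, mask_ratio: float, rng: random.Random) -> List[bool]:
    """Mask round(mask_ratio * length) positions chosen uniformly at random."""
    n_masked = round(mask_ratio * length)
    chosen = set(rng.sample(range(length), n_masked))
    return [i in chosen for i in range(length)]


def random_span_mask(length: int, mask_ratio: float, rng: random.Random) -> List[bool]:
    """Mask one contiguous span of round(mask_ratio * length) positions."""
    span = round(mask_ratio * length)
    start = rng.randint(0, length - span)  # randint is inclusive on both ends
    return [start <= i < start + span for i in range(length)]


def masking_stage(
    batch: Dict[str, Any],
    mask_ratio: float = 0.15,
    mode: str = "token",
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Return a new batch with masked tokens and a labels key for the loss."""
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    make_mask = {"token": random_token_mask, "span": random_span_mask}[mode]
    rng = random.Random(seed)

    masked_tokens, labels = [], []
    for seq in batch["tokens"]:
        mask = make_mask(len(seq), mask_ratio, rng)
        # Masked positions get MASK_TOKEN as input and the original token as label.
        masked_tokens.append([MASK_TOKEN if m else t for t, m in zip(seq, mask)])
        labels.append([t if m else IGNORE_INDEX for t, m in zip(seq, mask)])

    # Copy the batch so the caller's original tokens are left untouched.
    return {**batch, "tokens": masked_tokens, "mask_labels": labels}
```

Usage would be something like `masking_stage(batch, mask_ratio=0.3, mode="span")`, with the Model stage reading "mask_labels" to decide which positions contribute to the imputation loss.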

The token loss will be the same as for forecasting, as computed by get_loss (line 138 in 52ed2fb).
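
For illustration, this is roughly how a labels key could restrict a token loss to masked positions. It is not the repository's get_loss, whose signature and body are not shown here. It is a stand-in cross-entropy, assuming per-position logits and integer labels where unmasked positions carry a hypothetical IGNORE_INDEX sentinel.

```python
import math
from typing import List

IGNORE_INDEX = -100  # hypothetical label marking positions excluded from the loss


def masked_cross_entropy(logits: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy over positions whose label is not IGNORE_INDEX."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # unmasked position: contributes nothing to the loss
        # Numerically stable log-sum-exp for the softmax normaliser.
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[label]
        count += 1
    # With no masked positions there is nothing to average; return 0.0.
    return total / count if count else 0.0
```

Averaging only over masked positions keeps the loss scale independent of the mask percentage, which seems like the safer default if that percentage is user-configurable.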

Lemme know what you think @teyaberg