Span corruption in data preprocessing does not work as expected

The returned mask consists of: non-noise + noise + non-noise + noise + ... + non-noise + noise.
This means the last tokens are always noise and the first token is always non-noise.
This has little negative effect on most long texts. However, when the input is short, for example 20 tokens, the output contains only one noise span, and it always sits in the last few tokens.
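To make the ordering concrete, here is a simplified NumPy sketch of the interleaving described above. It is not the TensorFlow code in preprocessors.py: the helper names (random_segmentation, current_noise_mask) are hypothetical, and the defaults noise_density=0.15 and mean_noise_span_length=3.0 are assumptions. With length=20 these defaults give 3 noise tokens in a single span, and because the pattern always ends with a noise span, the mask is 17 non-noise tokens followed by 3 noise tokens regardless of the seed.

```python
import numpy as np


def random_segmentation(num_items, num_segments, rng):
    """Split num_items into num_segments segments of length >= 1, uniformly at random."""
    # Pick num_segments - 1 distinct cut points among the num_items - 1 interior gaps.
    cuts = np.sort(rng.permutation(np.arange(1, num_items))[: num_segments - 1])
    bounds = np.concatenate([[0], cuts, [num_items]])
    return np.diff(bounds)


def current_noise_mask(length, noise_density=0.15, mean_noise_span_length=3.0, seed=0):
    """Hypothetical re-sketch of the current layout: (non-noise, noise) pairs, ending with noise."""
    rng = np.random.default_rng(seed)
    num_noise = int(np.clip(round(length * noise_density), 1, length - 1))
    num_spans = max(int(round(num_noise / mean_noise_span_length)), 1)
    num_nonnoise = length - num_noise

    noise_lens = random_segmentation(num_noise, num_spans, rng)
    nonnoise_lens = random_segmentation(num_nonnoise, num_spans, rng)

    # Interleave as [non-noise, noise, non-noise, noise, ...]: the first span is
    # always non-noise and the last span is always noise.
    span_lens = np.stack([nonnoise_lens, noise_lens], axis=1).reshape(-1)
    span_is_noise = np.tile([False, True], num_spans)
    return np.repeat(span_is_noise, span_lens)


if __name__ == "__main__":
    # For a 20-token input the single noise span always lands at the end.
    for s in range(3):
        print(current_noise_mask(20, seed=s).astype(int))
```

Each printed row is seventeen 0s followed by three 1s: with only one span of each kind, the random segmentation has nothing left to randomize.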

Relevant code: text-to-text-transfer-transformer/t5/data/preprocessors.py, line 2699 at commit f977126:

def random_spans_noise_mask(length,

I think the mask would be better as: non-noise (possibly empty) + noise + ... + noise + non-noise (possibly empty). That way, the noise spans would be distributed randomly across the whole text instead of always ending at the last token.
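The proposed layout can be sketched the same way. Again, this is a hypothetical NumPy illustration under the same assumed defaults, not a patch against preprocessors.py. The non-noise tokens are split into num_spans + 1 segments instead of num_spans. To let the first and last segments be empty while keeping every interior segment non-empty (so adjacent noise spans never merge), the sketch segments num_nonnoise + 2 items into positive parts and then subtracts 1 from the two end segments.

```python
import numpy as np


def random_segmentation(num_items, num_segments, rng):
    """Split num_items into num_segments segments of length >= 1, uniformly at random."""
    cuts = np.sort(rng.permutation(np.arange(1, num_items))[: num_segments - 1])
    bounds = np.concatenate([[0], cuts, [num_items]])
    return np.diff(bounds)


def proposed_noise_mask(length, noise_density=0.15, mean_noise_span_length=3.0, seed=0):
    """Hypothetical layout: non-noise(>=0) + noise + non-noise(>=1) + ... + noise + non-noise(>=0)."""
    rng = np.random.default_rng(seed)
    num_noise = int(np.clip(round(length * noise_density), 1, length - 1))
    num_spans = max(int(round(num_noise / mean_noise_span_length)), 1)
    num_nonnoise = length - num_noise

    noise_lens = random_segmentation(num_noise, num_spans, rng)
    # num_spans + 1 non-noise segments; padding by 2 and then removing 1 from each
    # end lets the leading and trailing segments be empty.
    nonnoise_lens = random_segmentation(num_nonnoise + 2, num_spans + 1, rng)
    nonnoise_lens[0] -= 1
    nonnoise_lens[-1] -= 1

    # Interleave as [non-noise, noise, ..., noise, non-noise].
    span_lens = np.empty(2 * num_spans + 1, dtype=np.int64)
    span_lens[0::2] = nonnoise_lens
    span_lens[1::2] = noise_lens
    span_is_noise = np.zeros(2 * num_spans + 1, dtype=bool)
    span_is_noise[1::2] = True
    return np.repeat(span_is_noise, span_lens)


if __name__ == "__main__":
    for s in range(3):
        print(proposed_noise_mask(20, seed=s).astype(int))
```

With a 20-token input, the single 3-token noise span can now start at any position from 0 to 17, rather than always at position 17.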

@adarob @t5-copybara @craffel @cghawthorne