why only mask the last channel?

Question


Suyuanhang opened this issue 5 years ago · 1 comment

I noticed that in the create_first/other_mask functions in probclass.py, you only apply the causality mask to the last slice along the C or D dimension. This differs from Algo 1 in the supplemental material, where causality is supposed to be enforced across all of C/D.
In other words, why not
mask[:, K // 2, K // 2:] instead of mask[-1, K // 2, K // 2:]?

Thank you

Answer 1 · 2019-11-26T14:14:33.000Z

I'm not sure I fully understand. But the idea is that we can use the information from all previous channels when encoding/decoding, since we encode/decode channel by channel. Only within a channel do we have to be careful, since we do a raster scan.
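
To make the point above concrete, here is a minimal NumPy sketch of such a mask. It is not the code from probclass.py. The function name causal_mask_3d, the (depth, K, K) layout, and the include_center flag are hypothetical, and the sketch assumes the last depth slice of the kernel lines up with the channel currently being coded. The include_center flag is only a guess at how a "first" and an "other" mask might differ, following the usual PixelCNN convention.

```python
import numpy as np


def causal_mask_3d(depth, K, include_center=False):
    """Hypothetical 3D causal mask of shape (depth, K, K).

    Assumption: slices [:-1] cover channels that are already fully
    encoded/decoded, and slice [-1] covers the current channel.
    """
    mask = np.ones((depth, K, K), dtype=np.float32)
    c = K // 2
    # Earlier depth slices stay all ones: those channels are fully known.
    # In the current channel, hide everything that comes later in raster order.
    start = c + 1 if include_center else c
    mask[-1, c, start:] = 0   # rest of the central row (center included by default)
    mask[-1, c + 1:, :] = 0   # every row below the central row
    return mask


mask = causal_mask_3d(depth=2, K=3)

# The variant from the question: also zero that row segment in earlier slices.
alt = mask.copy()
alt[:, 3 // 2, 3 // 2:] = 0

print(mask)
print("visible positions:", int(mask.sum()), "vs", int(alt.sum()))
```

Under these assumptions, using mask[:, ...] would still be causal, but it would throw away context from previous channels that the decoder already has.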