philipperemy/keras-tcn

Possible wasted computation when not returning sequences

davidxujiayang opened this issue · 3 comments

Hi, thanks for offering this great package. I'm trying to build an autoencoder using TCN. In my encoder, the TCN does not return a sequence. After looking at your code, it seems to me that if return_sequences is set to False, only the last slice of the final layer is kept for the output, yet the convolution is still computed for all the other slices. Tracing back through the dilated convolutions, many computations that only affect the discarded slices are wasted. I'm not sure if I understand this correctly; if not, please kindly point it out.

I also have a question about building the decoder. I'm not sure whether I should use the same dilation order in the decoder's TCN layers as in the encoder, or reverse it. This is a general question not really related to the implementation of TCN itself, so feel free to ignore it. Any comments/suggestions would be appreciated.

@davidxujiayang

> Hi, thanks for offering this great package. I'm trying to build an autoencoder using TCN. In my encoder, the TCN does not return a sequence. After looking at your code, it seems to me that if return_sequences is set to False, only the last slice of the final layer is kept for the output, yet the convolution is still computed for all the other slices. Tracing back through the dilated convolutions, many computations that only affect the discarded slices are wasted. I'm not sure if I understand this correctly; if not, please kindly point it out.

Yes, you understood correctly. Almost all of those computations are wasted, and the same is true for RNN models. But I agree that they could be avoided in this case.
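
For illustration, here is a minimal sketch (assuming the keras-tcn `TCN` layer and tf.keras) of what `return_sequences=False` amounts to: the full output sequence is still computed, and only the last time step is kept.

```python
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Lambda
from tcn import TCN  # pip install keras-tcn

inp = Input(shape=(12, 1))
# Full sequence output from the TCN: shape (batch, 12, nb_filters)
seq = TCN(nb_filters=16, kernel_size=3, dilations=[1, 2, 4],
          return_sequences=True)(inp)
# return_sequences=False is equivalent to keeping only the last slice;
# the convolutions for every earlier time step are still computed above.
last = Lambda(lambda x: x[:, -1, :])(seq)

model = Model(inp, last)
print(model.predict(np.random.rand(2, 12, 1)).shape)  # (2, 16)
```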

> I also have a question about building the decoder. I'm not sure whether I should use the same dilation order in the decoder's TCN layers as in the encoder, or reverse it. This is a general question not really related to the implementation of TCN itself, so feel free to ignore it. Any comments/suggestions would be appreciated.

That's more of a research question. In this case, I would say deconvolutions (transposed convolutions) are probably a better fit than causal convolutional layers, since you want to upsample, so a TCN might not be the best choice here. We should build a temporal de-convolutional network :)
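
To make that suggestion concrete, here is a hedged sketch of such a decoder using tf.keras `Conv1DTranspose` layers (available in recent TensorFlow versions) to upsample back to 12 time steps. The layer sizes and the `RepeatVector` seed are illustrative assumptions, not part of keras-tcn.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv1DTranspose, RepeatVector

latent = Input(shape=(16,))   # e.g. the output of TCN(..., return_sequences=False)
x = RepeatVector(3)(latent)   # seed a short 3-step sequence from the latent vector
x = Conv1DTranspose(32, kernel_size=3, strides=2, padding='same',
                    activation='relu')(x)                    # 3 -> 6 time steps
x = Conv1DTranspose(16, kernel_size=3, strides=2, padding='same',
                    activation='relu')(x)                    # 6 -> 12 time steps
out = Conv1DTranspose(1, kernel_size=3, padding='same')(x)   # 12 steps, 1 feature
decoder = Model(latent, out)
decoder.summary()
```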

> Hi, thanks for offering this great package. I'm trying to build an autoencoder using TCN. In my encoder, the TCN does not return a sequence. After looking at your code, it seems to me that if return_sequences is set to False, only the last slice of the final layer is kept for the output, yet the convolution is still computed for all the other slices. Tracing back through the dilated convolutions, many computations that only affect the discarded slices are wasted. I'm not sure if I understand this correctly; if not, please kindly point it out.

I think the length of the output sequence doesn't need to match the input length in a TCN; stride, dilations, and kernel_size can be designed so that a single output is produced.
In my practice, for the encoder of a Seq2Seq model, I follow the architecture in the TCN paper with stride=2, kernel_size=3, dilations = 1, 2, 4. There is only one final output when the input is a sequence of 12 items, like this:
[image: dilated/strided convolution stack collapsing a 12-step input to a single output]
But it is difficult to build multiple layers per residual block this way.
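
For reference, a hedged sketch of the general idea using plain tf.keras `Conv1D` layers (not TCN residual blocks): strided convolutions collapse the 12-step input to a single step. Keras does not allow strides > 1 together with dilation_rate > 1 in the same Conv1D, so stride-2 layers take the place of explicit dilations here; this does not reproduce the figure above exactly.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv1D

inp = Input(shape=(12, 1))
x = Conv1D(16, kernel_size=3, strides=2, padding='valid',
           activation='relu')(inp)                   # 12 -> 5 time steps
x = Conv1D(16, kernel_size=3, strides=2, padding='valid',
           activation='relu')(x)                     # 5 -> 2 time steps
out = Conv1D(16, kernel_size=2, padding='valid')(x)  # 2 -> 1 time step
encoder = Model(inp, out)
encoder.summary()  # final output shape: (None, 1, 16), i.e. a single step
```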

I'll close this issue since it's nontrivial to implement that optimization.