philipperemy/keras-tcn

How to calculate the number of trainable parameters?

rmndrs89 opened this issue · 4 comments

Hello all,

I do not intend to report a bug, but rather to ask a question as I do not understand how I can derive the number of trainable parameters for the TCN layer.
Normally, in a 1D convolutional layer, if I am not mistaken, it would simply be the kernel size * number of filters + number of filters (i.e., the weights plus biases).
However, for example, for an input and model architecture like:

  • batch_size, time_steps, input_dim = None, 20, 3
  • nb_filters = 8
  • kernel_size = 5
  • dilations = (1, 2, 4)
  • nb_stacks = 1

How can I derive the number of trainable parameters?

Thanks in advance,
Best,
Robbin

Let me try:

  • the first conv1D layer has nb_filters*(input_dim*kernel_size + 1) trainable parameters
  • add an additional nb_filters*(input_dim + 1) for the 1x1 conv that matches the input shape (channel dimension), IF nb_filters != input_dim
  • all the remaining conv1D layers have nb_filters*(nb_filters*kernel_size + 1) parameters
  • there are 2 conv1D layers per residual block, one residual block per dilation, all stacked nb_stacks times, so nb_stacks * len(dilations) * 2 conv1D layers altogether

So the number of trainable parameters N is

N = nb_filters*(input_dim*kernel_size + 1)  +          # parameters for the first conv1D layer
    (nb_stacks * len(dilations) * 2 - 1) *             # number of layers less the first
     nb_filters * (nb_filters * kernel_size + 1)       # parameters per layer 2 and on

if nb_filters != input_dim:
     N += nb_filters*(input_dim+1)
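The closed-form count above can be written as a small helper function; this is just a sketch of the formula, assuming the residual-block structure described in the bullets (no batch/weight normalization layers, which would add parameters of their own). The function name is illustrative, not part of keras-tcn:

```python
def tcn_params(input_dim, nb_filters, kernel_size, dilations, nb_stacks=1):
    """Closed-form trainable-parameter count for a TCN, per the derivation above."""
    n_conv_layers = nb_stacks * len(dilations) * 2
    # first conv1D layer sees input_dim channels
    n = nb_filters * (input_dim * kernel_size + 1)
    # every subsequent conv1D layer sees nb_filters channels
    n += (n_conv_layers - 1) * nb_filters * (nb_filters * kernel_size + 1)
    # optional 1x1 conv to match the channel dimension of the residual
    if nb_filters != input_dim:
        n += nb_filters * (input_dim + 1)
    return n

# The settings from the question: input_dim=3, nb_filters=8,
# kernel_size=5, dilations=(1, 2, 4), nb_stacks=1
print(tcn_params(3, 8, 5, (1, 2, 4), 1))
```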

I have yet to go over it in detail, but thank you for the reply. It seems to make sense to me, and I now see where my reasoning went wrong.

Thanks!

@strokovnjaka: Super, I have checked it for some different settings, and it is right. My main issue was the optional 1x1 conv layer, which, I think, will only be present in the first residual block, in case the number of input channels (input_dim) is not equal to nb_filters; for all subsequent residual blocks the input channel count is always equal to nb_filters.

@rmndrs89 oops, I just spotted the inconsistency.

The formula is not general: it is only valid if nb_filters is an int and not a list. In the latter case, 1x1 conv layers are added to the residual blocks where nb_filters changes. Compare e.g. this paper on page 4
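The list case can be handled by walking the residual blocks and tracking the channel count, adding a 1x1 conv wherever the in/out channels differ. This is a hedged sketch under the same assumptions as the closed-form version (two conv1D layers per block, no normalization layers); the function name is illustrative:

```python
def tcn_param_count(input_dim, nb_filters, kernel_size, dilations, nb_stacks=1):
    """Per-block trainable-parameter count; nb_filters may be an int
    or a list with one entry per residual block."""
    n_blocks = nb_stacks * len(dilations)
    if isinstance(nb_filters, int):
        nb_filters = [nb_filters] * n_blocks
    total = 0
    in_ch = input_dim
    for out_ch in nb_filters:
        total += out_ch * (in_ch * kernel_size + 1)   # first conv1D of the block
        total += out_ch * (out_ch * kernel_size + 1)  # second conv1D of the block
        if in_ch != out_ch:
            total += out_ch * (in_ch + 1)             # 1x1 conv on the skip path
        in_ch = out_ch
    return total
```

For a constant nb_filters this reduces to the closed-form formula above; for a list like [8, 16, 16] it additionally counts a 1x1 conv in the block where the channel count jumps from 8 to 16.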