How to calculate the number of trainable parameters?
rmndrs89 opened this issue · 4 comments
Hello all,
I do not intend to report a bug, but rather to ask a question as I do not understand how I can derive the number of trainable parameters for the TCN layer.
Normally, in a 1D convolutional layer, if I am not mistaken, it would simply be the kernel size * number of filters + number of filters (i.e., the weights plus biases).
However, for example, for an input and model architecture like:
- batch_size, time_steps, input_dim = None, 20, 3
- nb_filters = 8
- kernel_size = 5
- dilations = (1, 2, 4)
- nb_stacks = 1
How can I derive the number of trainable parameters?
Thanks in advance,
Best,
Robbin
Let me try:
- the first conv1D layer has `nb_filters * (input_dim * kernel_size + 1)` trainable parameters
- add to that an additional `nb_filters * (input_dim + 1)` for the 1x1 conv that matches the input shape (channel dimension) if `nb_filters != input_dim` (reference)
- all the remaining conv1D layers have `nb_filters * (nb_filters * kernel_size + 1)` parameters each
- there are 2 conv1D layers per residual block, one residual block per dilation, all stacked `nb_stacks` times, so there are `nb_stacks * len(dilations) * 2` conv1D layers altogether
So the number of trainable parameters `N` is:

```python
N = (nb_filters * (input_dim * kernel_size + 1)      # parameters for the first conv1D layer
     + (nb_stacks * len(dilations) * 2 - 1)          # number of layers less the first
     * nb_filters * (nb_filters * kernel_size + 1))  # parameters per layer 2 and on
if nb_filters != input_dim:
    N += nb_filters * (input_dim + 1)                # optional 1x1 conv in the first block
```
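Plugging in the settings from the question gives a quick sanity check of the formula (computed by hand here, not verified against a built model):

```python
# settings from the question
input_dim, nb_filters, kernel_size = 3, 8, 5
dilations, nb_stacks = (1, 2, 4), 1

# first conv1D layer: 8 * (3*5 + 1) = 128
N = nb_filters * (input_dim * kernel_size + 1)
# remaining 1*3*2 - 1 = 5 layers: 5 * 8 * (8*5 + 1) = 1640
N += (nb_stacks * len(dilations) * 2 - 1) * nb_filters * (nb_filters * kernel_size + 1)
# optional 1x1 conv, since nb_filters != input_dim: 8 * (3 + 1) = 32
if nb_filters != input_dim:
    N += nb_filters * (input_dim + 1)

print(N)  # 1800
```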
I have yet to go over it in detail, but thank you for the reply. It seems to make sense to me, and I now see where my reasoning went wrong.
Thanks!
@strokovnjaka: Super, I have checked it for some different settings, and it is right. My main issue was the optional 1x1 conv layer, which (I think) will only be present in the first residual block, in case the number of channels, `input_dim`, is not equal to `nb_filters`; for all later residual blocks the input channel count will always equal `nb_filters`.
@rmndrs89 oops, I just spotted the inconsistency.
The formula is not general: it is only valid if `nb_filters` is an `int` and not a `list`. In the latter case, 1x1 conv layers are added to the residual blocks where `nb_filters` changes. Compare e.g. this paper on page 4
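For the list case, the counting can be sketched per residual block instead of with a closed-form expression. This is a rough illustration, assuming one filter count per residual block and a 1x1 conv wherever the channel count changes; the exact layout of keras-tcn's `nb_filters` list is not verified here:

```python
def count_tcn_params(input_dim, filters_per_block, kernel_size):
    """Rough parameter count when the filter count may change between blocks.

    filters_per_block: one filter count per residual block, i.e.
    nb_stacks * len(dilations) entries. This per-block layout is an
    assumption for illustration, not the exact keras-tcn API.
    """
    n = 0
    in_ch = input_dim
    for f in filters_per_block:
        # two conv1D layers per residual block
        n += f * (in_ch * kernel_size + 1)  # first conv: in_ch -> f channels
        n += f * (f * kernel_size + 1)      # second conv: f -> f channels
        if f != in_ch:
            # 1x1 conv on the skip path whenever the channel count changes
            n += f * (in_ch + 1)
        in_ch = f
    return n
```

With a constant filter count this reduces to the closed-form formula above, e.g. `count_tcn_params(3, [8, 8, 8], 5)` gives 1800 for the settings in the question.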