Zero padding - possibly incorrect behavior?
iceboundflame opened this issue · 1 comment
Thank you for sharing the code and paper, it has been very helpful. I think I may have found a subtle issue with the padding scheme and would appreciate another opinion.
Conceptually, we'd like every sequence input before the first to be zero. But I noticed that the implementation pads every Conv1d input with zeros, not just the first one. In my opinion, this is incorrect behavior for each layer beyond the first.
Here is a diagram of the issue.
The triangles represent padded inputs. The bottom row (the sequence input) is padded with 0, which is correct. However, the first layer's outputs are also padded with 0 (red triangles) before being fed to the next layer. I think we should instead pad with a constant vector: the result of convolving an all-zero receptive field (which works out to conv1's bias term).
Similarly, the next layer up should be padded with a constant vector whose value is the result of convolving a receptive field filled with the previous layer's padding constant.
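To make the suggestion concrete, here is a rough sketch of how those per-layer padding constants could be computed. This is hypothetical code, not from this repo: layer_pad_values is a made-up helper, it assumes a plain stack of Conv1d + ReLU layers, and it ignores the residual connections and dropout in the real TemporalBlock.

import torch
import torch.nn as nn

def layer_pad_values(convs):
    """For each Conv1d in the stack (bottom to top), return the constant
    vector that the 'virtual' positions before t=0 should hold at that
    layer's input."""
    pad_values = []
    const = torch.zeros(convs[0].in_channels)  # layer 0: plain zeros are correct
    for conv in convs:
        pad_values.append(const)
        # A receptive field filled with `const` convolves to
        # sum_t W[:, :, t] @ const + b, regardless of dilation.
        out = conv.weight.sum(dim=2) @ const + conv.bias
        const = torch.relu(out)  # what the pre-sequence positions look like one layer up
    return pad_values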
Impact: a network with receptive field R produces subtly wrong outputs at roughly the first R - 1 positions of the sequence.
Here's a simple test case that demonstrates that prepending a sequence of zeros to the input changes the output.
import torch
import torch.nn as nn

import tcn  # the TemporalConvNet implementation from this repo

def test_tcn():
    torch.manual_seed(42)

    def init_weights(m):
        if isinstance(m, nn.Conv1d):
            if hasattr(m, 'weight_g'):
                # weight_norm was applied to this layer
                torch.nn.init.uniform_(m.weight_g)
                torch.nn.init.uniform_(m.weight_v)
                # XXX: not sure if this is the correct way to initialize
            else:
                torch.nn.init.uniform_(m.weight)
            torch.nn.init.uniform_(m.bias)

    with torch.no_grad():
        net = tcn.TemporalConvNet(num_inputs=1, num_channels=[2, 1], kernel_size=2, dropout=0)
        net.apply(init_weights)
        # receptive_field_size is a small helper I added; the stock TemporalConvNet does not define it
        print("Receptive field", net.receptive_field_size)
        for i in range(8):
            print(f"Padding with {i} zeros:",
                  net(torch.Tensor([[[0] * i + [1]]])))
        print("Zero input response:", net(torch.Tensor([[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]])))
Output:

Receptive field 7
Padding with 0 zeros: tensor([[[2.1018]]])
Padding with 1 zeros: tensor([[[1.3458, 2.2364]]])
Padding with 2 zeros: tensor([[[1.3458, 1.4805, 2.4149]]])
Padding with 3 zeros: tensor([[[1.3458, 1.4805, 1.6590, 2.4309]]])
Padding with 4 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 2.4466]]])
Padding with 5 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 2.4550]]])
Padding with 6 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 1.6991, 2.4550]]])
Padding with 7 zeros: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 1.6991, 1.6991, 2.4550]]])
Zero input response: tensor([[[1.3458, 1.4805, 1.6590, 1.6749, 1.6907, 1.6991, 1.6991, 1.6991,
1.6991, 1.6991, 1.6991, 1.6991]]])
Note how the last element of the output keeps changing (2.1018, 2.2364, ..., 2.4550) as zeros are prepended, even though leading zeros should be a no-op; it only stabilizes once the prepended zeros fill the receptive field. Clearly this TCN implementation is still able to achieve great results, so I am not yet sure of the practical impact. I'll experiment with changing it for my application.
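In the meantime, one possible workaround under the current padding scheme is to prepend receptive_field - 1 real zeros to the input and discard the warm-up outputs, so every retained output is computed from a fully populated receptive field. This is a sketch only; run_with_warmup is a hypothetical helper, not part of this repo.

import torch

def run_with_warmup(net, x, receptive_field):
    """Evaluate net on x (batch, channels, time) after prepending real
    zeros, then drop the warm-up positions at the front."""
    warmup = receptive_field - 1
    zeros = torch.zeros(x.shape[0], x.shape[1], warmup)
    out = net(torch.cat([zeros, x], dim=2))
    # Keep only outputs whose receptive field lies entirely inside the
    # zero-extended input, where the padding scheme can no longer reach.
    return out[:, :, warmup:]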
After each convolution in the TCN, we eliminate the first p elements of all channels, where p is the number of padding elements. As a result, I don't believe this should be a problem when the 1D convolutions are used in the context of the provided TCN architecture. I.e., in one temporal residual block the layers
self.conv1 = weight_norm(nn.Conv1d(n_inputs, n_outputs, kernel_size,
                                   stride=stride, padding=padding, dilation=dilation))
self.chomp1 = Chomp1d(padding)
self.relu1 = nn.ReLU()
self.dropout1 = nn.Dropout(dropout)
are applied in this order, so we apply Chomp1d before doing another Conv1d. So, even though there are some initial "incorrect" elements at intermediate stages, we are continuously removing them and adding them back, and they are ultimately removed before the final result is output.
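For context, the chomp layer is just a slice over the time dimension; it looks something like this (paraphrased from memory of the repo, so treat it as a sketch rather than the exact source):

import torch.nn as nn

class Chomp1d(nn.Module):
    """Trim the extra time steps introduced by the symmetric Conv1d padding
    so each block's output length matches its input length."""
    def __init__(self, chomp_size):
        super(Chomp1d, self).__init__()
        self.chomp_size = chomp_size

    def forward(self, x):
        # x: (batch, channels, time); drop the padded positions at the end.
        return x[:, :, :-self.chomp_size].contiguous()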