andravin/wincnn

Large difference to regular convolution


Hi there,

I am currently trying to implement winograd convolution in python. I wrote a small test program to check if the results I get are correct, but there are large deviations:

import numpy as np
from scipy import signal

testImg = np.matrix([[  1,  2,  3,  4], 
                     [  5,  6,  7,  8],
                     [  9, 10, 11, 12],
                     [ 13, 14, 15, 16]]).astype(np.float)

testFilter = np.matrix([[  2,  1,  1],
                        [  1,  2,  1],
                        [  1,  1,  1]]).astype(np.float)

result = signal.convolve2d(testImg, testFilter, mode='valid')

print "Result (regular):\n", result


BT = np.matrix([[1,  0, -1, 0],
                [0,  1,  1, 0],
                [0, -1,  1, 0],
                [0, -1,  0, 1]]).astype(np.float)

G = np.matrix([[1  ,  0  , 0  ],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0  ,  0  , 1  ]]).astype(np.float)

AT = np.matrix([[1, 1,  1, 0],
                [0, 1, -1, 1]]).astype(np.float)

B = np.transpose(BT)
GT = np.transpose(G)
A = np.transpose(AT)

U = np.dot(np.dot(G, testFilter), GT)      # transform the filter
V = np.dot(np.dot(BT, testImg), B)         # transform the data tile
combined = np.multiply(U, V)               # elementwise product in the transform domain

result = np.dot(np.dot(AT, combined), A)   # inverse transform back to the 2x2 output

print "Result (winograd):\n", result

gives:

Result (regular):
[[  71.   82.]
 [ 115.  126.]]
Result (winograd):
[[  61.   72.]
 [ 105.  116.]]

Is this normal?

I don't have time to debug your code at this moment, but I did want to point out there are numpy implementations in the neon code base:

https://github.com/NervanaSystems/neon/blob/master/neon/backends/winograd.py
https://github.com/NervanaSystems/neon/blob/master/neon/backends/winograd4.py

Thank you very much. The intention with my implementation was to get the basics down first. The Nervana implementation already does things like creating the image tiles, handling multiple channels, and putting everything into one big GEMM. I found another implementation, though, and it had the same issue as mine, so I thought it might be an algebraic or numeric error.

I will still have a look at that implementation though. Maybe I can track this down. Thanks again.

@Pfaeff Just from calculating the first convolution output in my head, I can see that you flipped the filter in x and y when you computed "regular" convolution. You probably did not do that on purpose, but used a Python function that does it automatically, as that is the classical definition of linear convolution.

Good catch. Try using scipy.signal.correlate2d instead. Convolution in deep learning is actually cross-correlation from traditional math/signal processing. I don't know the history of why it ended up that way.
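
For reference, a quick check along these lines (reusing testImg and testFilter from the snippet above) should reproduce the Winograd numbers exactly:

result = signal.correlate2d(testImg, testFilter, mode='valid')  # no kernel flip
print(result)
# [[  61.   72.]
#  [ 105.  116.]]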

@Pfaeff It is kind of fun actually, because of the filter and data you chose. The filter is almost symmetric, except for the 2 at the top left that gets swapped with the 1 in the bottom right when you flip. Also, if you take any of the data elements from the top-left 2x2 square and subtract it from the element that is 2 down and 2 over, you get 10. So the difference between the flipped and unflipped convolution will be 10*(2-1) = 10 for each output!
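
For instance, in the top-left output the two corner terms contribute 2*1 + 1*11 = 13 without the flip and 1*1 + 2*11 = 23 with it, while every other term is unchanged, which is exactly the 61 vs. 71 gap.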

Yeah, after flipping the kernel both ways it seems to work. For some reason I thought that Winograd convolution was just like regular convolution (including the flip) and not the variant used in CNNs that omits it. Thank you very much.
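
For anyone landing here later, a minimal way to verify this with the code above (reusing the BT, G, AT matrices and their transposes) is to flip the kernel in both dimensions before the filter transform:

flippedFilter = testFilter[::-1, ::-1]             # flip in x and y
U = np.dot(np.dot(G, flippedFilter), GT)           # transform the flipped filter
V = np.dot(np.dot(BT, testImg), B)                 # transform the data tile
result = np.dot(np.dot(AT, np.multiply(U, V)), A)  # inverse transform
print(result)
# [[  71.   82.]
#  [ 115.  126.]]   <- now matches signal.convolve2d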

Yes, this is a common point of confusion.

The wincnn.showCookToomFilter output shows exactly what is computed if you are unsure:

>>> wincnn.showCookToomFilter((0,1,-1), 2, 3)
AT*((G*g)(BT*d)) =
⎡d[0]⋅g[0] + d[1]⋅g[1] + d[2]⋅g[2]⎤
⎢                                 ⎥
⎣d[1]⋅g[0] + d[2]⋅g[1] + d[3]⋅g[2]⎦
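
A quick numeric check of that formula, using plain numpy arrays and, say, the first row of testImg as d and the first row of testFilter as g, confirms there is no flip:

import numpy as np

BT = np.array([[1,  0, -1, 0],
               [0,  1,  1, 0],
               [0, -1,  1, 0],
               [0, -1,  0, 1]], dtype=float)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1, 0],
               [0, 1, -1, 1]], dtype=float)

d = np.array([1., 2., 3., 4.])   # first row of testImg
g = np.array([2., 1., 1.])       # first row of testFilter

print(np.dot(AT, np.dot(G, g) * np.dot(BT, d)))  # [  7.  11.]
print(np.correlate(d, g, 'valid'))               # [  7.  11.]  -- unflipped correlation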

But you are also right: Winograd/Cook/Toom naturally produces a linear convolution algorithm (which, incidentally, yields more outputs than inputs), and we transposed and exchanged the data and output matrices to yield what Winograd calls an FIR filter algorithm.

This technique is referred to as the "transposition principle".

In case you're curious, here is a linear convolution implementation of F(2x2,5x5):

https://github.com/NervanaSystems/neon/blob/master/neon/backends/winograd5.py

This turned out to be a mostly failed experiment on the GPU since loading overlapping slices through cache is far more efficient than trying to store overlapping slices. Atomic adds just don't have a high enough throughput and the output transform is pretty hard to amortize without a very large reduction dimension.

@Pfaeff I am about to add a showCookToomConvolution function to complement the showCookToomFilter function. Hopefully that will make the relationship between convolution and filtering algorithms more clear.

Here is example output:

>>> wincnn.showCookToomConvolution((0,1,-1),2,3)
A = 
⎡1  0 ⎤
⎢     ⎥
⎢1  1 ⎥
⎢     ⎥
⎢1  -1⎥
⎢     ⎥
⎣0  1 ⎦

G = 
⎡ 1    0     0 ⎤
⎢              ⎥
⎢1/2  1/2   1/2⎥
⎢              ⎥
⎢1/2  -1/2  1/2⎥
⎢              ⎥
⎣ 0    0     1 ⎦

B = 
⎡1   0  0   0 ⎤
⎢             ⎥
⎢0   1  -1  -1⎥
⎢             ⎥
⎢-1  1  1   0 ⎥
⎢             ⎥
⎣0   0  0   1 ⎦

Linear Convolution: B*((G*g)(A*d)) =
⎡      d[0]⋅g[0]      ⎤
⎢                     ⎥
⎢d[0]⋅g[1] + d[1]⋅g[0]⎥
⎢                     ⎥
⎢d[0]⋅g[2] + d[1]⋅g[1]⎥
⎢                     ⎥
⎣      d[1]⋅g[2]      ⎦
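
And to make the transposition concrete, here is a small numeric sketch (again with plain numpy arrays): transposing the AT and BT matrices of the filter algorithm gives exactly the A and B printed above, and the result is an ordinary linear convolution:

import numpy as np

# F(2,3) filter-algorithm matrices, as in the earlier posts
BT = np.array([[1,  0, -1, 0],
               [0,  1,  1, 0],
               [0, -1,  1, 0],
               [0, -1,  0, 1]], dtype=float)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1, 0],
               [0, 1, -1, 1]], dtype=float)

A = AT.T   # 4x2 data transform of the convolution algorithm (matches A above)
B = BT.T   # 4x4 output transform of the convolution algorithm (matches B above)

g = np.array([2., 1., 1.])
d = np.array([1., 2.])

print(np.dot(B, np.dot(G, g) * np.dot(A, d)))  # [ 2.  5.  3.  2.]
print(np.convolve(d, g, 'full'))               # [ 2.  5.  3.  2.]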