gzuidhof/nn-transfer

Batchnorm from pytorch to keras

dathath opened this issue · 7 comments

I'm having issues transferring models with batchnorm layers from pytorch to keras. The other way round works perfectly fine. Any thoughts? Appreciate the help!

Here are the two architectures I am testing:
Keras Model:
```python
model = Sequential()
model.add(Conv2D(6, kernel_size=(5, 5),
                 activation='relu',
                 input_shape=(1, 28, 28),
                 name='conv1'))
model.add(BatchNormalization(axis=1, name='bnm1', momentum=0.1))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(16, (5, 5), activation='relu', name='conv2'))
model.add(BatchNormalization(axis=1, name='bnm2', momentum=0.1))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(120, activation='relu', name='fc1'))
model.add(BatchNormalization(axis=1, name='bnm3', momentum=0.1))
model.add(Dense(84, activation='relu', name='fc2'))
model.add(BatchNormalization(axis=1, name='bnm4', momentum=0.1))
model.add(Dense(10, activation=None, name='fc3'))
model.add(Activation('softmax'))
model.compile(
    loss=cross_entropy,
    optimizer='adadelta',
    metrics=['accuracy']
)
```

Pytorch Model:
```python
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.bnm1 = nn.BatchNorm2d(6, momentum=0.1)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.bnm2 = nn.BatchNorm2d(16, momentum=0.1)
        self.fc1 = nn.Linear(256, 120)
        self.bnm3 = nn.BatchNorm1d(120, momentum=0.1)
        self.fc2 = nn.Linear(120, 84)
        self.bnm4 = nn.BatchNorm1d(84, momentum=0.1)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        # out = self.bnm1(out)
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        # out = self.bnm2(out)
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        # out = self.bnm3(out)
        out = F.relu(self.fc2(out))
        # out = self.bnm4(out)
        out = self.fc3(out)
        return out
```

Found the fix! Sorry

Good to hear :) Maybe you could post the fix or mistake in here for future reference?

Oops, actually it isn't fixed. The outputs are still inconsistent when I load a saved torch model first and then do the transfer; otherwise they are consistent... strange. Also, the accuracy of the transferred model matches that of the loaded torch model, it's just that the actual outputs are very different.
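Concretely, the load-then-transfer path I mean is roughly the sketch below; the checkpoint filename is a placeholder, and I'm assuming the `transfer.pytorch_to_keras` entry point from this repo:

```python
import torch
from nn_transfer import transfer

pytorch_model = LeNet()
# Load previously trained weights from disk before transferring
# ('lenet.pth' is a placeholder for the actual checkpoint file).
pytorch_model.load_state_dict(torch.load('lenet.pth'))
pytorch_model.eval()

# Copy the loaded weights into the keras model defined above.
transfer.pytorch_to_keras(pytorch_model, model)
```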

  • Your Keras model has a softmax at the end, your pytorch model doesn't (although maybe you are doing it outside of this definition)
  • Your batchnorm layers are commented out, is that intentional?

Maybe one of these two reasons?

Protip: You can put ``` on the line before and after your code, and it will format nicely.

Oops, sorry. They weren't commented out in the runs I'm describing, and I do add the softmax outside; I just copy-pasted a snippet here. I had commented out the batchnorms before pasting because I was also using the models without them.

The issue is that I added a unit test (similar to the one in the example) asserting that the np.linalg.norm of the difference between the two models' predictions is <= 1e-3, and that test fails when the models have batchnorms, but not otherwise. However, the accuracies on the datasets match very closely. Not sure if it's just down to how batchnorm works differently in pytorch and tensorflow.
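The check itself is roughly the following sketch (a minimal version of the unit test; the batch shape and tolerance mirror what I described above):

```python
import numpy as np
import torch

def assert_predictions_close(pytorch_model, keras_model, atol=1e-3):
    # Same random batch for both frameworks, channels-first (1, 28, 28) input.
    data = np.random.rand(4, 1, 28, 28).astype(np.float32)

    pytorch_model.eval()  # batchnorm uses running stats instead of batch stats
    with torch.no_grad():
        torch_out = pytorch_model(torch.from_numpy(data)).numpy()
    keras_out = keras_model.predict(data)

    # Assumes the softmax is applied consistently on both sides.
    # This is the assertion that fails once batchnorm layers are present.
    assert np.linalg.norm(torch_out - keras_out) <= atol
```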

Quote from the post above by dathath:

> but that test fails if we have batchnorms and not otherwise

Look at the definition of the keras batch normalization layer:
`keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, ...)`
and the torch definition:
`torch.nn.BatchNorm2d(num_features, eps=1e-05, ...)`
I guess the reason your test fails could be that the epsilons are different. (This epsilon mismatch was actually the only thing I fixed in the torch batchnorm; after that, the results were the same in both frameworks.)
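A minimal sketch of what aligning the epsilons looks like, reusing the layer arguments from the models above (matching on 1e-3 here; matching on 1e-5 works just as well):

```python
from keras.layers import BatchNormalization
import torch.nn as nn

# Keras defaults to epsilon=1e-3, pytorch defaults to eps=1e-5;
# pick one value and pass it explicitly on both sides.
EPS = 1e-3

keras_bn = BatchNormalization(axis=1, name='bnm1', momentum=0.1, epsilon=EPS)
torch_bn = nn.BatchNorm2d(6, eps=EPS, momentum=0.1)
```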

Did you also change the momentum value? It is defined differently in pytorch and keras.
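For reference, the two frameworks interpret momentum in opposite directions, so the same number does not mean the same update rate; a sketch of the usual conversion, based on the running-average formulas each library documents:

```python
# pytorch updates its running statistics as
#   running_mean = (1 - momentum) * running_mean + momentum * batch_mean
# whereas keras updates them as
#   moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
# so an equivalent keras momentum is 1 minus the pytorch one.
def keras_momentum_from_pytorch(pytorch_momentum):
    return 1.0 - pytorch_momentum

# The pytorch layers above use momentum=0.1, which corresponds to
# BatchNormalization(..., momentum=0.9) in keras, not momentum=0.1.
print(keras_momentum_from_pytorch(0.1))  # 0.9
```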