Reproduce SE-VGG16 results
eslambakr opened this issue · 0 comments
Hello @hujie-frank
First of all, thanks for sharing your amazing work.
I am trying to reproduce your results with VGG-16, but on CIFAR-10 and CIFAR-100; unfortunately, I could not get any accuracy improvement.
I ran two experiments: the first is the baseline, in which I trained the original VGG-16 without the SE block; in the second I added the SE block. I expected the validation accuracy to increase, but unfortunately it did not.
Here are my training details:
1- The learning rate starts at 1e-4 and decays to 1e-5.
2- I resized the inputs to 224×224.
3- I used the Adam optimizer (a rough sketch of this setup is shown after the list).
4- I construct the original VGG block as follows:
```python
conv = Conv2D(channels, kernel_size=kernel_size, padding='same', activation='relu',
              use_bias=False, kernel_regularizer=regularizers.l2(0.0005),
              name="conv_" + str(block_number))(input)
conv = BatchNormalization()(conv)
conv = Dropout(rate=drop)(conv)
```
5- I construct a second version of VGG (Dropout before BatchNormalization) as follows:
```python
conv = Conv2D(channels, kernel_size=kernel_size, padding='same', activation='relu',
              use_bias=False, kernel_regularizer=regularizers.l2(0.0005),
              name="conv_" + str(block_number))(input)
conv = Dropout(rate=drop)(conv)
conv = BatchNormalization()(conv)
```
6- I construct SE-VGG as follows:
```python
# keep a handle on the Conv2D layer so that SE_Layer can read its output shape
conv_layer = Conv2D(channels, kernel_size=kernel_size, padding='same', activation='relu',
                    use_bias=False, kernel_regularizer=regularizers.l2(0.0005),
                    name="conv_" + str(block_number))
conv = conv_layer(input)
conv = Dropout(rate=drop)(conv)
conv = SE_Layer(name=str(block_number), input_layer=conv_layer)(conv)
conv = BatchNormalization()(conv)
```
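For completeness, the setup from items 1-3 looks roughly like the sketch below (assuming a built `model` and resized CIFAR arrays `x_train`, `y_train`, `x_val`, `y_val`; the batch size, epoch count, and the ReduceLROnPlateau schedule are placeholders I chose for illustration):

```python
# Rough sketch of the training setup (items 1-3); batch size, epoch count and
# the exact LR-decay schedule are placeholders, not the values I actually used.
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ReduceLROnPlateau

model.compile(optimizer=Adam(lr=1e-4),  # item 1: learning rate starts at 1e-4
              loss='categorical_crossentropy',
              metrics=['accuracy'])

callbacks = [
    # decay the learning rate towards 1e-5 (placeholder schedule)
    ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=10, min_lr=1e-5),
    # stop once the model starts over-fitting (patience of 20 epochs, see below)
    EarlyStopping(monitor='val_acc', patience=20),
]

# x_train / x_val are the CIFAR images resized to 224x224 (item 2)
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          batch_size=64, epochs=200, callbacks=callbacks)
```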
And here is my implementation of the SE_Layer:
```python
import tensorflow as tf
from keras import backend as K
from keras.layers import Layer, GlobalAveragePooling2D, Reshape, multiply
from keras.activations import relu, sigmoid


class SE_Layer(Layer):
    def __init__(self, input_layer, **kwargs):
        # self.scaling = scaling
        self.input_layer = input_layer  # the Conv2D layer feeding this block
        self.prev_layer = None
        self.ratio = 8
        self.x = None
        super(SE_Layer, self).__init__(**kwargs)

    def build(self, input_shape):
        channels = self.input_layer.output_shape[-1]
        # bottleneck weights: channels -> channels/ratio -> channels
        self.dense_1_weights = self.add_weight(name='dense_1_weights',
                                               shape=(channels, int(channels / self.ratio)),
                                               initializer='he_normal',
                                               trainable=True)
        self.dense_2_weights = self.add_weight(name='dense_2_weights',
                                               shape=(int(channels / self.ratio), channels),
                                               initializer='he_normal',
                                               trainable=True)
        super(SE_Layer, self).build(input_shape)

    def call(self, conv):
        c = int(conv.shape[-1])
        x = conv
        # squeeze: global average pooling over the spatial dimensions
        x = GlobalAveragePooling2D(data_format='channels_last')(x)
        # average the pooled descriptors over the batch axis as well
        x = K.mean(x, axis=[0], keepdims=True)
        x = _normalize(x)  # small normalisation helper of mine, not shown here
        x = Reshape([1, 1, c], name=self.name + "_reshape")(x)
        # excitation: FC -> ReLU -> FC -> sigmoid
        x = tf.matmul(x, self.dense_1_weights)
        x = relu(x)
        x = tf.matmul(x, self.dense_2_weights)
        x = sigmoid(x)
        self.x = x
        # scale: channel-wise reweighting of the input feature maps
        y = multiply([conv, x], name=self.name + "_mul")
        return y

    def get_scaling(self):
        return self.x

    def compute_output_shape(self, input_shape):
        return input_shape
```
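For comparison, this is how I read the SE block in the paper when written with only built-in Keras layers (the helper name `se_block` and the reduction ratio of 16 are placeholders of mine, not taken from this repository). The main structural difference from my layer is that here the excitation is computed per sample instead of being averaged over the batch:

```python
# Sketch of the SE block as described in the paper, using built-in Keras layers;
# `se_block` and ratio=16 are my own placeholder choices.
from keras.layers import GlobalAveragePooling2D, Dense, Reshape, Multiply

def se_block(x, ratio=16):
    channels = int(x.shape[-1])
    # squeeze: one descriptor per channel, computed per sample
    s = GlobalAveragePooling2D()(x)
    # excitation: bottleneck FC -> ReLU -> FC -> sigmoid
    s = Dense(channels // ratio, activation='relu',
              kernel_initializer='he_normal', use_bias=False)(s)
    s = Dense(channels, activation='sigmoid',
              kernel_initializer='he_normal', use_bias=False)(s)
    s = Reshape((1, 1, channels))(s)
    # scale: channel-wise reweighting of the feature maps
    return Multiply()([x, s])
```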
Here are my results: all three experiments (the architectures described in steps 4, 5, and 6) achieved the same accuracy (94.1%).
Training stops when the model starts over-fitting; I used early stopping with a patience of 20 epochs to make sure the model had over-fitted.
Thanks in advance; I hope you can help me reproduce your results.