wanggrun/Adaptively-Connected-Neural-Networks

Question about the implementation of the pixel-aware ACNet

Closed this issue · 2 comments

Hi. Thanks for your excellent work! I have some questions about the pixel-aware implementation:

def resnet_bottleneck(l, ch_out, stride, stride_first=False):
    """
    stride_first: original resnet put stride on first conv. fb.resnet.torch put stride on second conv.
    """
    shortcut = l
    l = Conv2D('conv1', l, ch_out, 1, strides=stride if stride_first else 1, activation=BNReLU)
    # local branch: plain 3x3 convolution
    l_3x3 = Conv2D('conv2', l, ch_out, 3, strides=1 if stride_first else stride, activation=tf.identity)
    shape = l_3x3.get_shape().as_list()
    # global branch: GAP -> two FC layers, then tiled back to the spatial size of l_3x3
    l_gap = GlobalAvgPooling('gap', l)
    l_gap = FullyConnected('fc1', l_gap, ch_out, activation=tf.nn.relu)
    l_gap = FullyConnected('fc2', l_gap, ch_out, activation=tf.identity)
    l_gap = tf.reshape(l_gap, [-1, ch_out, 1, 1])
    l_gap = tf.tile(l_gap, [1, 1, shape[2], shape[3]])
    # per-pixel gate: 1x1 convs on the concatenation of the two branches, squashed by a sigmoid
    l_concat = tf.concat([l_3x3, l_gap], axis=1)
    l_concat = Conv2D('conv_c1', l_concat, ch_out, 1, strides=1, activation=tf.nn.relu)
    l_concat = Conv2D('conv_c2', l_concat, ch_out, 1, strides=1, activation=tf.identity)
    l_concat = tf.sigmoid(l_concat)
    # combine: local term plus gated global term
    l = l_3x3 + l_gap * l_concat
    l = BNReLU('conv2', l)
    l = Conv2D('conv3', l, ch_out * 4, 1, activation=get_bn(zero_init=True))
    return l + resnet_shortcut(shortcut, ch_out * 4, stride, activation=get_bn(zero_init=False))

  1. The paper says "For the pixel-aware connection, we let α = 0, β = 1 and only learn γ to save parameters and memory". So l_3x3 is the second term in formula (1), whose parameter β is fixed to 1. Am I right?

  2. l_concat is the "γ" in the third term, which applies 1x1 convolutions to perform the two linear transformations in formula (3). My main question is about l_gap. Do you mean that you use GAP to perform the "downsample" operation before applying the weight $\mathbf{w}$, so that the number of parameters can be reduced to 1xC? (A rough count of what I mean is sketched right below.)
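
To make the counting concrete, here is a rough back-of-the-envelope sketch of what I mean; the channel and spatial sizes (C = 256, H = W = 14) are example numbers I picked, not values from the paper or the repository:

# Rough parameter count for the global term, with example sizes
# C = 256, H = W = 14 (my own assumption, chosen only for illustration).
C, H, W = 256, 14, 14

# A pixel-wise weight w applied to the full flattened input would need
# about (C * H * W) * C parameters for a single linear transform:
dense_w = (C * H * W) * C      # ~12.8M

# After GAP the global descriptor is 1 x C, so the two FC layers
# ('fc1' and 'fc2' in the snippet above) together need only about 2 * C * C:
gap_w = 2 * C * C              # ~131K

print(dense_w, gap_w)
# With alpha = 0 and beta = 1 fixed, the output is then roughly
# y = l_3x3 + gamma * l_gap, where only the per-pixel gate gamma is learned.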

Thanks!

You are right.

Actually, GAP can be replaced with other types of downsampling (including image resizing) without losing accuracy. We employ GAP here to save memory, computation time, and parameters.
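
For later readers, a minimal sketch of what such a swap could look like in the l_gap branch, assuming the NCHW layout used in the snippet above; the 2x2 grid, the layer names ('pool_ds', 'fc1', 'fc2'), and the use of tensorpack's AvgPooling here are illustrative assumptions, not the repository code:

import tensorflow as tf
from tensorpack.models import AvgPooling, FullyConnected

def downsampled_global_branch(l, ch_out, out_shape, grid=2):
    # Hypothetical variant of the global branch: instead of collapsing
    # H x W to 1 x 1 with GAP, keep a small grid x grid spatial summary.
    h, w = l.get_shape().as_list()[2:4]     # assumes NCHW
    l_ds = AvgPooling('pool_ds', l, h // grid, data_format='channels_first')
    l_ds = FullyConnected('fc1', l_ds, ch_out, activation=tf.nn.relu)   # FC flattens its input
    l_ds = FullyConnected('fc2', l_ds, ch_out, activation=tf.identity)
    l_ds = tf.reshape(l_ds, [-1, ch_out, 1, 1])
    # broadcast back to the feature-map size, exactly as the GAP version does
    return tf.tile(l_ds, [1, 1, out_shape[2], out_shape[3]])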

@wanggrun Thanks for your reply!