Understanding the Faster R-CNN anchor generator, not only as a concept but also at the source level

Question

Understanding the Faster R-CNN anchor generator, not only as a concept but also at the source level

chullhwan-song opened this issue 5 years ago · 1 comments

Answer 1 · 2019-08-28T04:59:11.000Z

Understanding anchors

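The basic idea is shown in the figure.
Looking closely, if you draw boxes around the small square at the center of a grid cell, there are three dotted rectangles in total.
Here the vertically tall one has aspect ratio 0.5, the square has aspect ratio 1, and the horizontally wide one has aspect ratio 2 (reading the ratio as w / h). Note that the code below uses ratio = h / w instead, so there 0.5 gives the wide box and 2.0 the tall one.
Looking at it again as a figure:
In the first figure, the green and red boxes differ only in scale. (Don't confuse scale with aspect ratio.)
Pulling out one of them, the second picture shows that each cell gets 3 rectangles, one per aspect ratio.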
- aspect ratio = [0.5, 1, 2]
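- base rectangle = [0, 0, 256, 256], in the form [y_center, x_center, h, w]
  - Given one such base rectangle, there are 3 aspect ratios, so we end up with 3 base anchors.
The reference source comes in a TensorFlow version and a NumPy version. Following the TensorFlow one, it consists of the following two functions.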

import tensorflow as tf  # written against the TensorFlow 1.x API (tf.Session, tf.variable_scope)


def enum_scales(base_anchor, anchor_scales, name='enum_scales'):

    '''
    :param base_anchor: [y_center, x_center, h, w]
    :param anchor_scales: different scales, like [0.5, 1., 2.0]
    :return: return base anchors in different scales.
            Example:[[0, 0, 128, 128],[0, 0, 256, 256],[0, 0, 512, 512]]
    '''
    with tf.variable_scope(name):
        anchor_scales = tf.reshape(anchor_scales, [-1, 1])

        return base_anchor * anchor_scales


def enum_ratios(anchors, anchor_ratios, name='enum_ratios'):

    '''
    :param anchors: base anchors in different scales
    :param anchor_ratios:  ratio = h / w
    :return: base anchors in different scales and ratios
    '''

    with tf.variable_scope(name):
        _, _, hs, ws = tf.unstack(anchors, axis=1)  # for base anchor, w == h
        sqrt_ratios = tf.sqrt(anchor_ratios)
        sqrt_ratios = tf.expand_dims(sqrt_ratios, axis=1)
        ws = tf.reshape(ws / sqrt_ratios, [-1])
        hs = tf.reshape(hs * sqrt_ratios, [-1])
        # assert tf.shape(ws) == tf.shape(hs), 'h shape is not equal w shape'

        num_anchors_per_location = tf.shape(ws)[0]

        # Note: the columns come out as [0, 0, w, h]; the centers are placeholders fixed at 0.
        return tf.transpose(tf.stack([tf.zeros([num_anchors_per_location, ]),
                                      tf.zeros([num_anchors_per_location,]),
                                      ws, hs]))


if __name__ == '__main__':

    base_anchor  = tf.constant([0, 0, 256, 256], dtype=tf.float32) #[center_y, center_x, h, w]
    anchor_scales = tf.constant([1.0], dtype=tf.float32)
    anchor_ratios = tf.constant([0.5, 1.0, 2.0], dtype=tf.float32)

    anchor_scales_ratios = enum_ratios(enum_scales(base_anchor, anchor_scales), anchor_ratios)
    # print(anchor_scales_ratios.shape) # (3, 4): one row per aspect ratio, laid out as [center_y, center_x, w, h]

    sess = tf.Session()
    print (sess.run(anchor_scales_ratios))
    # Rows are [center_y, center_x, w, h], so in the first row the center (x, y) = (0, 0) and (w, h) = (362.03867, 181.01933)
    # [[  0.        0.      362.03867 181.01933]   # ratio 0.5
    #  [  0.        0.      256.      256.     ]   # ratio 1.0
    #  [  0.        0.      181.01933 362.03867]]  # ratio 2.0
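The numbers above can be checked without a TensorFlow session. The following is a minimal plain-Python sketch of the same arithmetic, not the reference implementation: the helper name `ratio_sizes` is hypothetical, and it assumes ratio = h / w (as in the `enum_ratios` docstring) with the area of the base square kept fixed.

```python
import math


def ratio_sizes(base_size, ratios):
    """Return (w, h) pairs such that h / w == ratio and w * h == base_size ** 2."""
    sizes = []
    for r in ratios:
        s = math.sqrt(r)
        # Dividing w and multiplying h by sqrt(ratio) keeps the area constant.
        sizes.append((base_size / s, base_size * s))
    return sizes


ratios = [0.5, 1.0, 2.0]
for ratio, (w, h) in zip(ratios, ratio_sizes(256.0, ratios)):
    print(ratio, round(w, 2), round(h, 2))
# 0.5 362.04 181.02
# 1.0 256.0 256.0
# 2.0 181.02 362.04
```

With this convention a ratio of 0.5 gives a box that is twice as wide as it is tall, which matches the first row of the TensorFlow output.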

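So, could we run Faster R-CNN by taking these base anchors and generating anchors while shifting their position by the stride? Let's work through that now.
It is almost the same as the figure above.
We build a grid over the input image or the feature map, and map the precomputed anchors described above onto each cell.
With 3 aspect ratios and 3 scales, the final set of base anchor rectangles has 9 entries. The figure above is exactly such an example.
So:
- We need to build the grid, create 3 anchors per grid cell from the aspect ratios, and then expand each of those over 3 scales, giving 9 anchors in total.
  - Thinking of it as the sliding window mentioned in the (third) figure above, visiting one cell at a time, we need (number of cells in the grid) * 9 anchors.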
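The count can be checked with a few lines of plain Python. This is only an illustrative sketch: the scale values [0.5, 1.0, 2.0] are taken from the `enum_scales` docstring, the 38 x 50 grid is the feature map size used in this post, and the variable names are made up for the example.

```python
from itertools import product

scales = [0.5, 1.0, 2.0]
ratios = [0.5, 1.0, 2.0]

# Every (scale, ratio) pair is one base anchor shape.
combos = list(product(scales, ratios))
anchors_per_cell = len(combos)

grid_h, grid_w = 38, 50
total = grid_h * grid_w * anchors_per_cell

print(anchors_per_cell, total)  # 9 17100
```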
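The source works on the feature map:
- width = 50, height = 38
Let's create the grid. Each cell coordinate in the grid gets scale * aspect_ratio anchors (the ones in the figures above).
- Since we work on the feature map, there are 50*38 coordinates in total. As shown below, the grid is first built with stride = 1 and then multiplied by the actual input value, 16.
  - One thing to be careful about: a stride of 16 does not mean 50*38*16 points. There are still 50*38 points, just spaced stride = 16 apart.
  - That is how it gets scaled to max_width = 800, max_height = 608.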
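As a quick sanity check of that arithmetic, here is a small plain-Python sketch (list comprehensions standing in for `tf.range`; the variable names are illustrative only):

```python
stride = 16
feat_w, feat_h = 50, 38

# Build the stride = 1 indices first, then scale them by the stride.
x_centers = [i * stride for i in range(feat_w)]
y_centers = [j * stride for j in range(feat_h)]

print(len(x_centers), x_centers[:3], x_centers[-1])  # 50 [0, 16, 32] 784
print(len(y_centers), y_centers[-1])                 # 38 592
print(len(x_centers) * len(y_centers))               # 1900
print(feat_w * stride, feat_h * stride)              # 800 608
```

The number of grid points stays at 50 * 38 = 1900; only the spacing between them changes.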
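Create the stride = 1 grid with tf.meshgrid (link).
The important point here is that the grid positions are generated in sliding-window order.
To convert to stride = 16, just multiply by stride = 16.
The final result is the anchor box positions (rectangles) for all feature map height x width x scale x aspect_ratio = 38 x 50 x scale x aspect_ratio anchors.
- [38 x 50 x scale x aspect_ratio, 4], each row being [min_y, min_x, max_y, max_x]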
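The sliding-window order can be illustrated with a tiny grid. The sketch below uses nested loops in plain Python instead of `tf.meshgrid`; the helper name `grid_points` is hypothetical. It assumes the default "xy" indexing of meshgrid, where the outputs have shape (len(y), len(x)) and flattening them row by row makes y the outer loop and x the inner loop.

```python
def grid_points(xs, ys):
    """Return (y, x) pairs in row-major order: y is the outer loop, x the inner loop."""
    return [(y, x) for y in ys for x in xs]


# Tiny example: 3 columns and 2 rows with stride 16.
print(grid_points([0, 16, 32], [0, 16]))
# [(0, 0), (0, 16), (0, 32), (16, 0), (16, 16), (16, 32)]

# Full 50 x 38 feature map with stride 16.
full = grid_points([i * 16 for i in range(50)], [j * 16 for j in range(38)])
print(len(full), full[49], full[50])  # 1900 (0, 784) (16, 0)
```

x runs across a whole row before y advances, which is the left-to-right, top-to-bottom order of a sliding window.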

# Continued from the source above

def make_anchors(base_anchor_size, anchor_scales, anchor_ratios, featuremaps_height,
                 featuremaps_width, stride, name='make_anchors'):

    '''
    :param base_anchor_size: base anchor size in different scales
    :param anchor_scales: anchor scales
    :param anchor_ratios: anchor ratios
    :param featuremaps_width: width of featuremaps
    :param featuremaps_height: height of featuremaps
    :return: anchors of shape [w * h * len(anchor_scales) * len(anchor_ratios), 4]
    '''

    # Don't forget: this example uses a single anchor scale (1.0); explanations usually use 3 scales.
    with tf.variable_scope(name):
        # [y_center, x_center, h, w]
        base_anchor = tf.constant([0, 0, base_anchor_size, base_anchor_size], dtype=tf.float32)
        base_anchors = enum_ratios(enum_scales(base_anchor, anchor_scales), anchor_ratios) # (3, 4): one row per aspect ratio, laid out as [center_y, center_x, w, h]

        _, _, ws, hs = tf.unstack(base_anchors, axis=1) # split out w and h; the centers are all 0


        # The feature map width corresponds to 50 x 16 = 800, so step from 0 in increments of stride.
        # [0, 16, 32, ..., 784] > 50 values
        #      > tf.range gives [0, 1, 2, ..., 49]; multiplying by stride = 16 gives [0, 16, ..., 784]
        x_centers = tf.range(tf.cast(featuremaps_width, tf.float32), dtype=tf.float32)* stride # 50 x 16 = 800: 50 points spaced 16 apart within a width of 800
        # The feature map height corresponds to 38 x 16 = 608, so step from 0 in increments of stride.
        # [0, 16, 32, ..., 592] > 38 values
        y_centers = tf.range(tf.cast(featuremaps_height, tf.float32), dtype=tf.float32) * stride # 38 x 16 = 608

        x_centers2 = x_centers #(50,)
        y_centers2 = y_centers #(38,)

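        # X, Y = meshgrid(x, y) with x of length 50 and y of length 38 gives shape (len(y), len(x)) = (38, 50)
        # Generate the grid coordinates:
        #       i) y stays fixed while x advances.
        #       ii) y moves by one step, then stays fixed again while x advances.
        #       iii) In other words, the (x, y) coordinates come out in sliding-window order.
        # > grid x values = x_centers, grid y values = y_centers.
        # > These coordinates are anchor centers, not top-left corners; the corners are derived from them at the end.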
        x_centers, y_centers = tf.meshgrid(x_centers, y_centers) # (38, 50)

        x_centers3 = x_centers # (38, 50) = 1900
        y_centers3 = y_centers # (38, 50)

        # base_anchors has 3 entries (one per aspect ratio), so w and h are each expanded over the grid, 3 per point.
        # y_centers > the values [0, 16, 32, ...] in order, each repeated for 50 rows (the first 50 rows are all [0, 0, 0]).
        # x_centers > [0, 16, 32, ..., 784]: unlike above, the pattern [0, 0, 0] [16, 16, 16] ... [784, 784, 784] repeats.
        # (1900, 3) > 50 x 38 = 1900

        ws1 = ws # [362.03867 256.      181.01933] # widths (x) of the base anchors for the 3 aspect ratios
        hs1 = hs # [181.01933 256.      362.03867] # heights (y) of the base anchors for the 3 aspect ratios

        # Only x is stored here: every x value on the grid (1900 points), repeated once per aspect ratio (3).
        # > That is, the (38, 50) grid is flattened to 1-D (1900) so that all x values are kept.
        ws, x_centers = tf.meshgrid(ws, x_centers) # x: ws (3), y: x_centers (38 x 50 = 1900) > (1900, 3)
        # Only y is stored here, paired element-wise with x to form (x, y).
        hs, y_centers = tf.meshgrid(hs, y_centers) # x: hs (3), y: y_centers (38 x 50 = 1900) > (1900, 3)

        box_centers = tf.stack([y_centers, x_centers], axis=2)
        box_centers = tf.reshape(box_centers, [-1, 2])
        
        # hs, ws > (1900, 3)
        box_sizes = tf.stack([hs, ws], axis=2) # (1900, 3, 2)
        box_sizes1 = box_sizes
        box_sizes = tf.reshape(box_sizes, [-1, 2]) # (5700, 2)

        # box_centers holds the center of each anchor. Subtracting half the size gives the min corner and adding half gives the max corner.
        # In image coordinates the origin is the top-left and y grows downward, so y_min is the top edge and y_max the bottom edge.
        # https://stackoverflow.com/questions/36013063/what-is-the-purpose-of-meshgrid-in-python-numpy
        # > [y_min, x_min, y_max, x_max]
        final_anchors = tf.concat([box_centers - 0.5*box_sizes, box_centers+0.5*box_sizes], axis=1) # [y_min, x_min, y_max, x_max]
        return final_anchors, y_centers3, x_centers3, hs, ws, box_sizes1

if __name__ == '__main__':
    #base_anchor   = tf.constant([256], dtype=tf.float32) #[center_y, cetner_x, h, w]
    base_anchor  = tf.constant([0, 0, 256, 256], dtype=tf.float32) #[center_y, center_x, h, w]

    anchor_scales = tf.constant([1.0], dtype=tf.float32)
    anchor_ratios = tf.constant([0.5, 1.0, 2.0], dtype=tf.float32)
    #anchor_ratios = tf.constant([1.0], dtype=tf.float32)

    anchor_scales_ratios = enum_ratios(enum_scales(base_anchor, anchor_scales), anchor_ratios)
    # print(anchor_scales_ratios.shape) # (3, 4): one row per aspect ratio, laid out as [center_y, center_x, w, h]
    sess = tf.Session()
    print (sess.run(anchor_scales_ratios))
    # Rows are [center_y, center_x, w, h], so in the first row the center (x, y) = (0, 0) and (w, h) = (362.03867, 181.01933)
    # [[  0.        0.      362.03867 181.01933]   # ratio 0.5
    #  [  0.        0.      256.      256.     ]   # ratio 1.0
    #  [  0.        0.      181.01933 362.03867]]  # ratio 2.0
    print ('------------------->>')
    anchors, y_centers, x_centers, hs, ws, box_sizes = make_anchors(256, anchor_scales, anchor_ratios, featuremaps_height=38, featuremaps_width=50, stride=16)
    _anchors = sess.run(anchors)
    gy_centers = sess.run(y_centers)
    gx_centers = sess.run(x_centers)
    ghs =  sess.run(hs)
    gws =  sess.run(ws)
    b_size =  sess.run(box_sizes)
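To see what `make_anchors` should return without running a session, the whole pipeline can be sketched in plain Python. This is a simplified stand-in, not the reference code: the function name `make_anchors_py` is hypothetical, it assumes a single scale of 1.0 and ratio = h / w as in the example above, and it returns a list of tuples rather than a tensor.

```python
import math


def make_anchors_py(base_size, ratios, feat_h, feat_w, stride):
    """Return anchors as (y_min, x_min, y_max, x_max) tuples in sliding-window order."""
    # One (h, w) pair per aspect ratio, with ratio = h / w and constant area.
    sizes = [(base_size * math.sqrt(r), base_size / math.sqrt(r)) for r in ratios]
    anchors = []
    for j in range(feat_h):          # y is the outer loop
        for i in range(feat_w):      # x is the inner loop
            cy, cx = j * stride, i * stride
            for h, w in sizes:       # all ratios for one grid point stay together
                anchors.append((cy - 0.5 * h, cx - 0.5 * w,
                                cy + 0.5 * h, cx + 0.5 * w))
    return anchors


anchors = make_anchors_py(256.0, [0.5, 1.0, 2.0], feat_h=38, feat_w=50, stride=16)
print(len(anchors))   # 5700
print(anchors[1])     # (-128.0, -128.0, 128.0, 128.0)
```

The first grid point sits at (0, 0), so its anchors extend into negative coordinates; each of the 38 x 50 = 1900 grid points contributes 3 boxes, giving 5700 rows.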

If I get the chance, the next step is to look at how these anchors are actually used afterwards.