Large Scale Landmark Recognition via Deep Metric Learning

Question

Opened this issue 5 years ago · 1 comments

chullhwan-song commented 5 years ago

https://arxiv.org/abs/1908.10192

Answer 1 · 2019-09-02T08:07:33.000Z

abstract

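Landmark recognition.
Paper published by Mail.ru (a Russian company).
Based on metric learning..
- In the paper this goes together with the "curriculum learning" concept (see my earlier review, #59, for the concept).
Since it comes from an actual company, it is probably a practical paper.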

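Overall process

One unusual(?) point: the first step checks whether the image is a landmark or a non-landmark.
- For this, a centroid is computed for each class: the average of that class's embedding features.
  - Notably, even within a single class (the same landmark), images shot from different viewpoints are kept separate.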
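The centroid step above can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the L2 normalization, the Euclidean distance, and the `threshold` value are all assumptions made for the example.

```python
import numpy as np


def class_centroids(embeddings, labels):
    """Average the L2-normalized embeddings of each class.

    embeddings: (n, d) array, labels: (n,) array of class ids.
    Returns a dict {class id: centroid vector}.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return {c: normed[labels == c].mean(axis=0) for c in np.unique(labels)}


def nearest_landmark(query, centroids, threshold=0.5):
    """Return (label, distance) for the closest centroid.

    label is None when even the closest centroid is farther than
    `threshold` (a hypothetical cut-off), i.e. the image is treated
    as a non-landmark.
    """
    query = query / np.linalg.norm(query)
    best_label, best_dist = None, np.inf
    for label, centroid in centroids.items():
        dist = np.linalg.norm(query - centroid)
        if dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist > threshold:
        return None, best_dist
    return best_label, best_dist


embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
centroids = class_centroids(embeddings, labels)
```

To mimic the per-viewpoint separation, each viewpoint group of a landmark could be given its own id in `labels`, so one landmark ends up with several centroids; how the groups are formed is not covered by these notes.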
base network > Wide Residual Network (WRN-50-2) > I don't know it well, so I need to look into it further.
feature > global average pooling
loss
- softmax+center loss
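- The paper title says metric learning, yet this is a case that uses softmax?? Is it the center loss that makes it metric learning?
- Judging from the CAM-based landmark explanation, softmax is definitely used.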
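To make the softmax + center loss combination concrete, here is a small NumPy sketch. It assumes the common formulation in which the center loss is half the squared distance between each feature and its class center, averaged over the batch and added to the softmax cross-entropy with a weight `lam`. The weight value and the batch averaging are assumptions for illustration, not values taken from the paper.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of softmax(logits) against integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def center_loss(features, labels, centers):
    """Half the squared distance to the own-class center, batch-averaged."""
    diffs = features - centers[labels]
    return 0.5 * (diffs ** 2).sum(axis=1).mean()


def total_loss(logits, features, labels, centers, lam=0.01):
    """Softmax loss plus lam * center loss (lam is a hypothetical weight)."""
    return softmax_cross_entropy(logits, labels) + lam * center_loss(
        features, labels, centers
    )
```

The softmax term separates classes, while the center term pulls features of the same class together, which is presumably what lets the embeddings be compared by distance afterwards.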

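Training

From the title I assumed it was pure metric learning, but it is a softmax + center loss combination.
The training set is organized following the curriculum learning concept.
The training set itself appears to be built in-house by the company.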
- (1) Europe (including all Russia);
- (2) North America, Australia and Oceania;
- (3) Middle East, North Africa;
- (4) Far East.
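One way to picture a curriculum over these four regions is sketched below. It assumes the regions are added cumulatively, one per stage, which these notes do not spell out. The sample layout `(image_path, landmark_id, region)` and the function name are hypothetical.

```python
REGION_STAGES = [
    "Europe (including all Russia)",
    "North America, Australia and Oceania",
    "Middle East, North Africa",
    "Far East",
]


def curriculum_stages(samples):
    """Yield (stage number, training subset) with regions added cumulatively.

    samples: list of (image_path, landmark_id, region) tuples
    (a hypothetical layout chosen for this example).
    """
    for stage in range(1, len(REGION_STAGES) + 1):
        active = set(REGION_STAGES[:stage])
        yield stage, [s for s in samples if s[2] in active]


samples = [
    ("a.jpg", 1, "Europe (including all Russia)"),
    ("b.jpg", 2, "North America, Australia and Oceania"),
    ("c.jpg", 3, "Middle East, North Africa"),
    ("d.jpg", 4, "Far East"),
]
for stage, subset in curriculum_stages(samples):
    print(stage, len(subset))
```

Each stage would then continue training from the previous stage's weights on the enlarged subset.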

Experiments

Experiments also combine the model with DELF - spatial verification.
From here on it gets a bit confusing.
train data ?
- sensitivity — accuracy of a model on images with landmarks (also called Recall)
  - ?? Does this mean the model is built by training with landmark images?
- specificity — accuracy of a model on images without landmarks.
  - ?? Does this mean the model is built by training without landmark images?
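Read literally, the two quoted definitions describe evaluation metrics computed over one labeled test set rather than two training setups. A minimal sketch under that reading follows. It assumes non-landmark images are labeled `None`, and that a landmark image counts as correct only when the predicted landmark id matches, which the quote does not state.

```python
def sensitivity_specificity(true_labels, pred_labels):
    """Accuracy on landmark images and on non-landmark images.

    Labels are landmark ids, or None for a non-landmark image
    (an encoding assumed for this example).
    """
    landmark_hits = landmark_total = 0
    other_hits = other_total = 0
    for truth, pred in zip(true_labels, pred_labels):
        if truth is None:
            other_total += 1
            other_hits += pred is None
        else:
            landmark_total += 1
            landmark_hits += pred == truth
    sensitivity = landmark_hits / landmark_total if landmark_total else float("nan")
    specificity = other_hits / other_total if other_total else float("nan")
    return sensitivity, specificity


true_labels = [1, 2, None, None, 3]
pred_labels = [1, None, None, 4, 3]
sens, spec = sensitivity_specificity(true_labels, pred_labels)
```

In this toy run two of the three landmark images are recognized and one of the two non-landmark images is rejected.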
test data ?
- “rare” and “frequent” landmarks.
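  - “Part from total number”: shows what percentage of landmark examples in the offline test
    has the corresponding type of landmarks > I'm not sure what "offline" means here, but it seems to refer to landmarks that rarely appear in the whole collected set??
    - Was the split made by looking at the number of images in each class? If so, it should be the other way around. (Famous landmarks have many images within their own class, yet they are only a tiny fraction of all the landmarks on Earth...?? So this is probably not the idea.) > Hard to tell, since the data is not released.
    - Maybe they crawled specific geographic regions, and a region where very few of the crawled images were landmarks, so that the non-landmark ratio was high, was labeled 'rare'?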
      - It’s very important to understand how model works on “rare” landmarks due to the small amount of data for them.
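Dataset composition
- test set > all: 581545 > only about 3% of these are landmark images
- The paper's figure on "rare" vs "frequent" > I still don't understand it even after looking at the figure > only a rough(?) explanation is given
- The left side of that figure shows the top 5 images closest to the centroids > so different views seem to exist within each class (as mentioned above)
eval set evaluation = Medium vs Hard > tested with and without distractor images
- The evaluation seems slightly unfair > here they use their own train set ... > (rather than something like neuralcode) > I wish they had released it.
- It would have been better to also include experiments that use neuralcode or the Google landmark dataset as the train set.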