- Question: Given a photo, can we correctly recognize the landmarks it contains?
One-shot learning is an object categorization problem in computer vision. Whereas most machine-learning-based object categorization algorithms require training on hundreds or thousands of images and very large datasets, one-shot learning aims to learn information about object categories from one, or only a few, training images.
- A few training images for each object/class
- Potentially large number of objects/classes
- Face/Item Recognition/Verification
- Street-to-Shop Systems
- Landmark Recognition
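
One common way to handle the one-shot setting described above is to compare a query image's embedding against a single reference embedding per class and pick the closest one. The sketch below is only an illustration of that idea: the function names `cosine_similarity` and `one_shot_classify`, the landmark labels, and the 3-dimensional embeddings are all hypothetical, and in practice the embeddings would come from a trained network.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def one_shot_classify(query, support):
    """Return the label whose single reference embedding is most similar to the query.

    `support` maps each class label to exactly one reference embedding,
    i.e. one training example per class.
    """
    return max(support, key=lambda label: cosine_similarity(query, support[label]))


# Hypothetical embeddings: one reference vector per landmark (assumed values).
support = {
    "eiffel_tower": [0.9, 0.1, 0.0],
    "big_ben": [0.1, 0.9, 0.1],
    "colosseum": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]
prediction = one_shot_classify(query, support)
print(prediction)
```

Because a new class only needs one stored reference embedding, adding landmarks does not require retraining a classifier head, which is what makes this approach attractive when the number of classes is large.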
- Google Landmark Recognition Dataset
- 1,225,029 training images with 14,951 landmarks
- 117,703 test images
- Images aren't evenly distributed across landmarks
- Subset Dataset for this work
- 113,783 training images with 14,943 different landmarks
- 22,255 validation images with 7,675 different landmarks
- 22,391 test images with 14,436 different landmarks
- Fine-tuning with pre-trained models
- VGG16, InceptionV3, and ResNet are available as pre-trained models (e.g., with ImageNet weights)
- Lower layers usually encode more generic, reusable features
- Higher layers encode more specialized features
- Freeze lower layers and only train the top several layers
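
The freezing strategy above could be sketched in Keras roughly as follows. This is an assumption-laden illustration, not the setup used in this work: the function name `build_finetune_model`, the choice of `num_trainable=30`, the 299x299 input size, the softmax classification head, and the learning rate are all hypothetical choices.

```python
import tensorflow as tf


def build_finetune_model(num_classes, num_trainable=30, weights="imagenet"):
    """Build an InceptionV3 classifier with only the top layers trainable.

    num_trainable is a hypothetical choice of how many top layers of the
    base network to leave unfrozen.
    """
    base = tf.keras.applications.InceptionV3(
        include_top=False,          # drop the original ImageNet classifier
        weights=weights,            # pre-trained weights, or None for random init
        input_shape=(299, 299, 3),
        pooling="avg",              # global average pooling -> one feature vector
    )

    # Freeze the lower (generic) layers; keep the top layers trainable.
    cutoff = len(base.layers) - num_trainable
    for layer in base.layers[:cutoff]:
        layer.trainable = False

    # New task-specific head (assumed: plain softmax over landmark classes).
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(base.output)
    model = tf.keras.Model(inputs=base.input, outputs=outputs)

    # A small learning rate is a common choice when fine-tuning (assumed value).
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_finetune_model(num_classes=14943)` would download the ImageNet weights on first use; passing `weights=None` skips the download but also discards the benefit of pre-training.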
- With a fine-tuned InceptionV3 triplet network, top-1 accuracy is 47%
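
A triplet network is usually trained with a triplet loss, which pulls an anchor embedding toward a positive example of the same class and pushes it away from a negative example of a different class. The source does not state which formulation or margin was used, so the sketch below shows the widely used hinge form with squared Euclidean distance; the function names and the margin of 0.2 are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (an assumed value here).
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings for illustration.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]        # same landmark as the anchor
easy_negative = [1.0, 0.0]   # different landmark, already far away
hard_negative = [0.2, 0.0]   # different landmark, still close to the anchor

easy_loss = triplet_loss(anchor, positive, easy_negative)
hard_loss = triplet_loss(anchor, positive, hard_negative)
print(easy_loss, hard_loss)
```

The easy triplet contributes no loss because it already satisfies the margin, while the hard triplet yields a positive loss, which is why triplet selection tends to matter during training.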