Meta-contrastive Learning with Support-based Query Interaction for Few-shot Fine-grained Visual Classification
Few-shot fine-grained visual classification combines the data-scarcity challenge of few-shot tasks with the small inter-class and large intra-class variance intrinsic to fine-grained tasks. To alleviate this problem, we propose a support-based query interaction module together with a meta-contrastive method. Extensive experiments demonstrate that our method performs well on both common fine-grained datasets and a few-shot benchmark under different experimental settings.
- We propose a novel meta-contrastive framework for fine-grained few-shot learning that further improves performance on both few-shot and fine-grained tasks. To our knowledge, this may be the first work to apply contrastive learning to fine-grained few-shot tasks.
- We develop a support-based query interaction method that not only establishes the dependencies between support samples and query samples, but also enhances query features by leveraging support features.
- We conduct comprehensive experiments on three fine-grained benchmark datasets and a common few-shot benchmark dataset, achieving state-of-the-art performance on these datasets.
The network is composed of three main parts: the feature extractor module, the support-based query interaction module (SQIM), and the meta-contrastive module (MCM). SQIM and MCM are the core of the proposed method. SQIM mines latent knowledge by letting query features interact with support features, while MCM strengthens the model's ability on fine-grained few-shot tasks by incorporating contrastive learning.
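A rough sketch of how these three parts could fit together in PyTorch is shown below. The class and module names, the ResNet-style backbone, and the exact tensor shapes are illustrative assumptions, not the repository's actual implementation.

```python
import torch.nn as nn


class FewShotFineGrainedNet(nn.Module):
    """Sketch of the overall pipeline: feature extractor -> SQIM -> MCM (assumed structure)."""

    def __init__(self, backbone: nn.Module, sqim: nn.Module, mcm: nn.Module):
        super().__init__()
        self.backbone = backbone  # feature extractor (e.g. a ResNet)
        self.sqim = sqim          # support-based query interaction module
        self.mcm = mcm            # meta-contrastive module

    def forward(self, support_images, query_images):
        # Extract features for the support and query sets of an episode.
        support_feat = self.backbone(support_images)   # e.g. [N*K, dim]
        query_feat = self.backbone(query_images)       # e.g. [Q, dim]

        # SQIM enhances query features by letting them interact with support features.
        enhanced_query = self.sqim(support_feat, query_feat)

        # MCM produces classification logits plus a contrastive loss term.
        logits, contrastive_loss = self.mcm(support_feat, enhanced_query)
        return logits, contrastive_loss
```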
The detailed flow of the support-based query interaction module.
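The figure gives the exact flow; purely for intuition, the interaction can be sketched as a cross-attention step in which query features attend to support features and are enhanced through a residual connection. The sketch below assumes globally pooled feature vectors and is not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SQIMSketch(nn.Module):
    """Illustrative support-based query interaction via cross-attention (assumed design)."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)  # project query features
        self.to_k = nn.Linear(dim, dim)  # project support features as keys
        self.to_v = nn.Linear(dim, dim)  # project support features as values

    def forward(self, support_feat: torch.Tensor, query_feat: torch.Tensor) -> torch.Tensor:
        # support_feat: [S, dim], query_feat: [Q, dim] (globally pooled features)
        q = self.to_q(query_feat)    # [Q, dim]
        k = self.to_k(support_feat)  # [S, dim]
        v = self.to_v(support_feat)  # [S, dim]
        attn = F.softmax(q @ k.t() / k.shape[-1] ** 0.5, dim=-1)  # [Q, S]
        # Residual connection: enhance query features with support context.
        return query_feat + attn @ v
```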
The detailed flow of the meta-contrastive module.
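Again for intuition only, a meta-contrastive objective over an episode can be sketched as an InfoNCE-style loss that pulls each query embedding toward its class prototype and pushes it away from the other prototypes. This is an assumed simplification, not the exact loss used in MCM.

```python
import torch
import torch.nn.functional as F


def meta_contrastive_loss(support_feat, support_labels, query_feat, query_labels, temperature=0.1):
    """Illustrative episode-level contrastive loss (assumed formulation).

    support_feat: [N*K, dim], support_labels: [N*K] with values in 0..N-1
    query_feat:   [Q, dim],   query_labels:   [Q]   with values in 0..N-1
    """
    num_classes = int(support_labels.max().item()) + 1
    # Class prototypes: mean support embedding per class, L2-normalized.
    prototypes = torch.stack([
        support_feat[support_labels == c].mean(dim=0) for c in range(num_classes)
    ])
    prototypes = F.normalize(prototypes, dim=-1)   # [N, dim]
    query_feat = F.normalize(query_feat, dim=-1)   # [Q, dim]

    # Similarities serve as logits of an InfoNCE-style objective.
    logits = query_feat @ prototypes.t() / temperature  # [Q, N]
    return F.cross_entropy(logits, query_labels)
```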
To run the project, you need the following environment configuration:
numpy==1.21.5
pandas==1.2.4
resnet==0.1
torch==1.10.2
torchvision==0.11.3
tqdm==4.66.1
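These packages (with the versions listed above) can be installed with pip, for example:

pip install numpy==1.21.5 pandas==1.2.4 resnet==0.1 torch==1.10.2 torchvision==0.11.3 tqdm==4.66.1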
Download the dataset from http://www.vision.caltech.edu/visipedia/CUB-200-2011.html
Caltech-UCSD Birds-200-2011 (CUB-200-2011) is an extended version of the CUB-200 dataset, with roughly double the number of images per class and new part location annotations.
- Number of categories: 200
- Number of images: 11,788
- Annotations per image: 15 Part Locations, 312 Binary Attributes, 1 Bounding Box
Or use the Linux command line to download the dataset:
wget http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz
tar -xvzf CUB_200_2011.tgz
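After extraction, a quick sanity check of the metadata can confirm the statistics above. This is a minimal sketch assuming the standard CUB_200_2011 layout with images.txt and image_class_labels.txt:

```python
from pathlib import Path

# Assumed standard layout of the extracted archive.
root = Path("CUB_200_2011")

# images.txt lists one "<image_id> <relative_path>" entry per image.
num_images = sum(1 for _ in open(root / "images.txt"))

# image_class_labels.txt maps each image id to one of the class ids.
class_ids = {line.split()[1] for line in open(root / "image_class_labels.txt")}

print(f"images: {num_images}")       # expected 11,788
print(f"classes: {len(class_ids)}")  # expected 200
```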
- dataset/*: loads and preprocesses the datasets
- models/*: defines all network models used
- losses.py: computes the losses
- train_contrastive.py: trains the meta-contrastive module
5-way few-shot classification performance (%) on the CUB, NAB, and Stanford dogs datasets. The ± denotes that the results are reported with 95% confidence intervals over 1000 test episodes. The highest average accuracy of each column is marked in bold.
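The reported 95% confidence interval is the usual half-width of the interval around the mean episode accuracy. A minimal sketch of this computation (variable names and the example values are illustrative):

```python
import numpy as np


def mean_confidence_interval(episode_accuracies, z=1.96):
    """Mean accuracy and 95% confidence interval over test episodes."""
    acc = np.asarray(episode_accuracies, dtype=np.float64)
    mean = acc.mean()
    # Standard error of the mean; z = 1.96 for a 95% interval.
    half_width = z * acc.std(ddof=1) / np.sqrt(len(acc))
    return mean, half_width


# Example: accuracies from 1000 test episodes (placeholder values).
accs = np.random.uniform(0.7, 0.9, size=1000)
mean, ci = mean_confidence_interval(accs)
print(f"{mean * 100:.2f} ± {ci * 100:.2f}")
```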