Scene classification is a task of AI Challenger competition. This is my implementation with PyTorch.
I used 5 models:
- ResNet50 pretrained on places365
- SE-ResNet50 pretrained on ResNet50 above
- DenseNet161 pretrained on places365
- ResNeXt101 pretrained on ImageNet
- IncetpionV4 pretrained on ImageNet
Pretrain is really important. A good pretrained model can bring you a high baseline. I'm not good at Caffe so I just fine tuned ResNet50, DenseNet161 from Places365-CNNs(Because they have PyTorch weights). In fact ResNet152-places365(Caffe model) is also a good model for this task.
I train these models in different scale, for they have different capacity, big model with larger resolution.High resolution image contains more information, so it makes big model preform better.Another important reason is that models trained on different resolution are also strongly complementary to each other. So this is the main method I used in this competition.
In the ensemble stage, I added the logits(outputs of the fully connected layer, before softmax) directly to get the final prediction. Because I found in this way the logits can reserve more information than the softmax probability. In this way the score is aways higer than the score ensembled from softmax probability.
Also, some strategies like data augmentation are used:
- Scale jittering
- Color jittering
- Random Erasing
- Label shuffling
- Label smoothing
You need to download image data from the competition website,
and put the training image data into 80 folders according to their labels,
then modify the path in file dataset.py
. Then download pretrained model weights from:
CSAILVision/places365,
and convert it into 80 classes output with models.convert_pretrained_model.py
.
Check the model weights path and dataset path in dataset.py
and train.py
,
then run the train.py
:
python train.py \
--model_name=resnet50 \
--optim=Adam \
--learning_rate=1e-3 \
--num_epochs=10
Now with the default setting --retrain=False
, it will only train the fully connected layer as a classifier.
With setting --retrain=True
, you can train all the parameters in the model, or you could modify the function get_trainable_variables()
in train.py
, like:
trainable_scope = [model.fc, model.layer4, model.layer3]
then you can just train the part you want to.
Model Name | Image Size | 12 Crop Size | Single Crop | MultiScale 12 Crop |
---|---|---|---|---|
ResNet50 | 224 | [224,256,320,384] | 95.44% | 96.39% |
IncetpionV4 | 299 | [299, 320, 352, 384] | 94.99% | 96.12% |
SE-ResNet50 | 320 | [320, 352, 384, 416, 480] | 95.84% | 96.34% |
ResNeXt101 | 320 | [320, 352, 384, 448, 480] | 94.51% | 95.66% |
DenseNet161 | 384 | [384, 416, 448, 480, 512] | 96.07% | 96.67% |
Ensemble | --- | --- | --- | 97.388% |
Dataset | Top 3 | Rank |
---|---|---|
Test A | 97.202% | 46th |
Test B | 93.120% | 36th |
- PyTorch >= 0.2.0
- tqdm as progress bar
- TensorFlow (Either CPU or GPU version) to use TensorBoard in PyTorch. If you don't want to use tensorboard, just delete
from logger import Logger
intrain.py
.