The UT Zappos50K dataset contains shoe images collected from Zappos.com. The dataset provides several kinds of labels; in this study we use only the following four classes:
- Shoes
- Boots
- Sandals
- Slippers
We use a pretrained VGG model and add a 4-neuron linear layer with a softmax on top. We fine-tune the model on the dataset for 10 epochs, with the learning rate fixed at 1e-6 for the lower layers and at 1e-3 for the top linear layer, with weight decay.
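The two-learning-rate setup described above can be illustrated with a minimal sketch. It is written in plain Python rather than the Torch code this repository uses, and it is not the actual training loop. The learning rates are the ones stated above; the weight decay value, the toy weights and gradients, and the choice to apply decay to both groups are assumptions made only for illustration.

```python
# Plain-Python sketch of SGD with a separate learning rate per parameter group.
LR_LOWER = 1e-6      # pretrained VGG layers (value from the text)
LR_TOP = 1e-3        # new 4-way linear layer (value from the text)
WEIGHT_DECAY = 5e-4  # assumption: the text does not state the decay value


def sgd_step(weights, grads, lr, weight_decay=WEIGHT_DECAY):
    """Return the weights after one SGD step with L2 weight decay."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


# Toy weights and gradients, identical for both groups so that only
# the learning rate differs between the two updates.
lower = sgd_step([0.5, -0.2], [1.0, 1.0], LR_LOWER)
top = sgd_step([0.5, -0.2], [1.0, 1.0], LR_TOP)
print("lower layers:", lower)
print("top layer:   ", top)
```

With identical gradients, the top layer moves about a thousand times further per step than the pretrained layers, which is the point of the split: the new layer learns quickly while the pretrained features change only slightly.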
At the end of ten epochs, we have the following confusion matrix. The 52.735% accuracy for shoes is low, but the Zappos dataset has a lot of overlap between boots, sandals, slippers and shoes, so this is an artifact of the dataset.
ConfusionMatrix:
[[ 1591 235 627 564] 52.735% [class: 1]
[ 230 965 30 58] 75.214% [class: 2]
[ 66 8 445 55] 77.526% [class: 3]
[ 11 8 15 94]] 73.438% [class: 4]
+ average row correct: 69.728121161461%
+ average rowUcol correct (VOC measure): 39.540689624846%
+ global correct: 61.87524990004%
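The summary figures in this output can be recomputed from the matrix itself. The sketch below is plain Python, not the Torch code that produced the log. It assumes that rows are true classes and columns are predicted classes, and that classes 1 to 4 correspond to shoes, boots, sandals and slippers in that order.

```python
# Recompute the summary metrics from the confusion matrix printed above.
# Assumption: rows are true classes, columns are predicted classes.
CLASSES = ["shoes", "boots", "sandals", "slippers"]  # assumed order of classes 1..4
CONFUSION = [
    [1591, 235, 627, 564],
    [230, 965, 30, 58],
    [66, 8, 445, 55],
    [11, 8, 15, 94],
]


def summarize(cm):
    """Return per-class accuracy, its mean, mean rowUcol score, and global accuracy."""
    n = len(cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    diag = [cm[i][i] for i in range(n)]
    per_class = [diag[i] / row_sums[i] for i in range(n)]
    # rowUcol: correct predictions divided by (row total + column total - correct).
    row_u_col = [diag[i] / (row_sums[i] + col_sums[i] - diag[i]) for i in range(n)]
    return {
        "per_class": per_class,
        "average_row": sum(per_class) / n,
        "average_row_u_col": sum(row_u_col) / n,
        "global": sum(diag) / sum(row_sums),
    }


stats = summarize(CONFUSION)
for name, acc in zip(CLASSES, stats["per_class"]):
    print(f"{name:9s}{100 * acc:.3f}%")
print(f"average row correct: {100 * stats['average_row']:.3f}%")
print(f"average rowUcol correct: {100 * stats['average_row_u_col']:.3f}%")
print(f"global correct: {100 * stats['global']:.3f}%")
```

The gap between the average row score (about 69.7%) and the global score (about 61.9%) comes from the class imbalance: the first class has by far the most test images and the lowest per-class accuracy.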
To predict the class of an image, run:
~/zappos_classify$ th predict.lua -im ~/datasets/zappos/ut-zap50k-images/Sandals/Heel/Eric\ Michael/7655272.3.jpg
Finished loaded model
***********************************************
Predict: sandals
********* Details **********
Prediction time 0.16613912582397 seconds.
Classes: [shoes, boots, sandals, slippers]
-2.7951 -6.1849 -0.0861 -3.9458
[torch.CudaTensor of size 1x4]
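The four scores in the output above look like log-probabilities: exponentiating them gives values that sum to roughly 1. Under that assumption (the output itself does not say so), a short Python sketch shows how to turn them into per-class probabilities and recover the predicted label. The scores and class order are copied from the log; everything else is illustrative.

```python
import math

# Class order and scores copied from the prediction log above.
CLASSES = ["shoes", "boots", "sandals", "slippers"]
LOG_SCORES = [-2.7951, -6.1849, -0.0861, -3.9458]

# Assumption: the scores are natural-log probabilities, so exp() recovers
# the probability assigned to each class.
probs = [math.exp(s) for s in LOG_SCORES]

# The predicted class is the one with the highest score.
best = max(range(len(CLASSES)), key=lambda i: LOG_SCORES[i])

for name, p in zip(CLASSES, probs):
    print(f"{name:9s}{p:.4f}")
print("Predict:", CLASSES[best])
```

Read this way, the model assigns about 0.92 to sandals, which matches the "Predict: sandals" line in the log.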
Zappos dataset.
VGG models from modelzoo here
VGG input mean in torch format here
This step is still a bit rough. Using a utility such as find, list the image files belonging to shoes in "shoes.txt", those belonging to sandals in "sandals.txt", boots in "boots.txt" and slippers in "slippers.txt". Then start training.
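The file lists described above could also be generated with a short script instead of find. The following Python sketch makes several assumptions: that the image root contains top-level directories named Shoes, Boots, Sandals and Slippers (inferred from the example path used with predict.lua), that the images are .jpg files, and that each list holds one path per line. The function name write_file_lists is hypothetical, and the exact list format expected by the training script should be checked against the code.

```python
import os

# Assumption: top-level directory names, inferred from the example path
# .../ut-zap50k-images/Sandals/Heel/... shown in this README.
CATEGORIES = {
    "Shoes": "shoes.txt",
    "Boots": "boots.txt",
    "Sandals": "sandals.txt",
    "Slippers": "slippers.txt",
}


def write_file_lists(image_root, out_dir="."):
    """Write one list file per category and return the image count per list."""
    counts = {}
    for category, list_name in CATEGORIES.items():
        paths = []
        # Walk the whole category subtree, since images sit several levels deep.
        for dirpath, _dirs, files in os.walk(os.path.join(image_root, category)):
            for name in files:
                if name.lower().endswith(".jpg"):
                    paths.append(os.path.join(dirpath, name))
        paths.sort()
        # Assumption: one image path per line.
        with open(os.path.join(out_dir, list_name), "w") as fh:
            fh.writelines(p + "\n" for p in paths)
        counts[list_name] = len(paths)
    return counts
```

For example, calling write_file_lists(os.path.expanduser("~/datasets/zappos/ut-zap50k-images")) would write the four list files into the current directory.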
Download the trained model from here and the input mean from here. Use predict.lua to predict the class of an image.