
Equivariant Multi-View Networks

(Animation: animations/combined.gif)

Abstract

Several popular approaches to 3D vision tasks process multiple views of the input independently with deep neural networks pre-trained on natural images, achieving view permutation invariance through a single round of pooling over all views. We argue that this operation discards important information and leads to subpar global descriptors. In this paper, we propose a group convolutional approach to multiple view aggregation where convolutions are performed over a discrete subgroup of the rotation group, enabling, thus, joint reasoning over all views in an equivariant (instead of invariant) fashion, up to the very last layer. We further develop this idea to operate on smaller discrete homogeneous spaces of the rotation group, where a polar view representation is used to maintain equivariance with only a fraction of the number of input views. We set the new state of the art in several large scale 3D shape retrieval tasks, and show additional applications to panoramic scene classification.
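For intuition, the aggregation can be pictured as a correlation over a finite rotation group: each view corresponds to a group element, and filters are shifted by group multiplication rather than by spatial translation, so rotating the input only permutes the output. The sketch below is illustrative only and is not this repository's implementation; the Cayley table and inverse lookup for the finite group are assumed to be precomputed.

import torch

def group_correlation(f, psi, cayley, inverse):
    """Illustrative correlation over a finite rotation group G (not the repo's code).

    f:       (batch, c_in, |G|) float  - one feature vector per view / group element
    psi:     (c_out, c_in, |G|) float  - filters defined over the group
    cayley:  (|G|, |G|) long           - cayley[a, b] = index of a*b in G
    inverse: (|G|,) long               - inverse[g] = index of g^-1
    Returns (batch, c_out, |G|); rotating the input permutes the output accordingly.
    """
    batch, _, n = f.shape
    out = f.new_zeros(batch, psi.shape[0], n)
    for g in range(n):
        # (f * psi)(g) = sum_h f(h) * psi(g^-1 h)
        shifted = psi[:, :, cayley[inverse[g]]]          # psi evaluated at g^-1 h
        out[:, :, g] = torch.einsum('bch,och->bo', f, shifted)
    return out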

Demo

This repository contains demos for our best models on rotated and aligned ModelNet40.

Check the requirements in requirements.txt

Download the datasets here.

The following commands should

  • clone this repo,
  • create a virtualenv,
  • install the requirements.
git clone https://github.com/daniilidis-group/emvn.git
cd emvn
virtualenv -p python3 env
source env/bin/activate
pip install -r requirements.txt
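Once the requirements are installed, you can sanity-check a downloaded LMDB file before training. This is only an illustration: the path is a placeholder and the layout of each datapoint is an assumption, not something documented here.

# Illustrative LMDB sanity check (path and datapoint layout are assumptions).
from tensorpack.dataflow import LMDBSerializer

df = LMDBSerializer.load('/path/to/m40canon_test.lmdb', shuffle=False)
df.reset_state()
print('entries:', len(df))          # should match the "Found ... entries" log lines below
first = next(iter(df))
print('datapoint components:', [type(c) for c in first])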

Training on aligned ModelNet40

Change --data and --logdir appropriately.

python3 emvn/train.py \
        --data /path/to/m40canon_{}.lmdb \
        --logdir /tmp/emvn_m40canon \
        --epochs 15 \
        --batch-size 12 \
        --skip_eval \
        --eval_retrieval \
        --retrieval_include_same \
        --triplet_loss \
        --optimizer nesterov \
        --lr-decay-mode cos \
        --gconv_support 0,8,1,15,12,25,21,19,29,7,11,20,4 \
        --gcc 512,512,512 \
        --n_homogeneous 20 \
        --lr 3e-3 \
        --homogeneous_only1st \
        --n_fc_before_gconv 1 \
        --n_group_elements 60 \
        --pretrained
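The --triplet_loss and --eval_retrieval flags indicate that the global descriptor is trained with a metric-learning term alongside classification. A minimal sketch of such a combination is shown below; the margin, the weighting, and the triplet mining are placeholders, not values taken from this codebase.

import torch
import torch.nn.functional as F

triplet = torch.nn.TripletMarginLoss(margin=0.2)   # margin is illustrative only

def classification_plus_triplet(logits, descriptors, labels, anc, pos, neg, alpha=1.0):
    # Cross-entropy on class logits plus a triplet term on the retrieval descriptors.
    # anc/pos/neg are indices of triplets mined from the batch (mining strategy not shown).
    ce = F.cross_entropy(logits, labels)
    tri = triplet(descriptors[anc], descriptors[pos], descriptors[neg])
    return ce + alpha * tri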

Sample outputs:

[2019-10-28 15:17:08,566:INFO] Loading data...
[1028 15:17:08 @format.py:93] Found 2468 entries in m40canon_test.lmdb
Classes=40, Views=1
[1028 15:17:09 @format.py:93] Found 9843 entries in m40canon_train.lmdb
[1028 15:17:09 @format.py:93] Found 2468 entries in m40canon_test.lmdb
Loading 100 inputs from pretrained model...
[2019-10-28 15:17:11,027:INFO] Loaded model...
[2019-10-28 15:17:11,027:INFO] Loaded model; params=23522920
[2019-10-28 15:17:14,366:INFO] Running on cuda:0
[2019-10-28 15:17:14,368:INFO] Checkpoint /tmp/emvn_m40canon/latest.pth.tar not found; ignoring.
[2019-10-28 15:17:14,368:INFO] Epoch: [1/15]
        Iter [10/820] Loss: 3.4836 Time batch: 0.2294 s; Time GPU 0.1063 s; Time to load: 0.2292 s
(...)
        Iter [820/820] Loss: 0.0081 Time batch: 0.2310 s; Time GPU 0.1035 s; Time to load: 0.2310 s
[2019-10-28 16:05:08,857:INFO] Time taken: 190.51 sec...
[2019-10-28 16:05:08,858:INFO]  Saving latest model
[2019-10-28 16:05:09,671:INFO] Total time for this epoch: 191.31975293159485 s
Starting evaluation...
        Iter [10/411] Loss: 0.0165  Acc: 100.0000
(...)
        Iter [410/411] Loss: 0.0059  Acc: 100.0000
[2019-10-28 16:07:14,435:INFO] Evaluating retrieval...
[2019-10-28 16:07:15,352:INFO] Computed pairwise distances between 2466 samples
[2019-10-28 16:07:17,191:INFO] acc per class=[100.0, 92.0, 98.0, 85.0, 96.0, 98.0, 100.0, 100.0, 98.0, 95.0, 75.0, 95.0, 98.83720930232558, 95.0, 95.34883720930233, 20.0, 98.0, 100.0, 100.0, 90.0, 100.0, 99.0, 98.0, 81.3953488372093, 100.0, 99.0, 82.0, 95.0, 98.0, 100.0, 98.0, 100.0, 80.0, 97.0, 95.0, 100.0, 90.0, 80.0, 90.0, 100.0]
[2019-10-28 16:07:17,192:INFO]  combined: 92.61, Acc: 94.61, mAP: 93.36, Loss: 0.2238

Results are within 0.2% of Table 2/Ours-R-12 in the paper.
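The retrieval numbers above come from pairwise distances between global descriptors. A rough sketch of that kind of evaluation (mean average precision with the query itself excluded) is given below; it is not the exact metric used in the log, only an illustration.

import numpy as np

def retrieval_map(descriptors, labels):
    # Illustrative mAP from pairwise descriptor distances (not the repo's exact metric).
    # descriptors: (n, d) float array; labels: (n,) int array.
    sq = (descriptors ** 2).sum(axis=1)
    dists = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * descriptors @ descriptors.T, 0.0))
    aps = []
    for i in range(len(labels)):
        order = np.argsort(dists[i])
        order = order[order != i]                          # drop the query itself
        rel = (labels[order] == labels[i]).astype(float)
        if rel.sum() == 0:
            continue
        precision = np.cumsum(rel) / (np.arange(len(rel)) + 1)
        aps.append(float((precision * rel).sum() / rel.sum()))
    return float(np.mean(aps))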

Training on rotated ModelNet40

Change --data and --logdir appropriately.

python3 emvn/train.py \
        --data /path/to/m40rot_{}.lmdb \
        --logdir /tmp/emvn_m40rot \
        --epochs 15 \
        --batch-size 6 \
        --skip_eval \
        --eval_retrieval \
        --retrieval_include_same \
        --triplet_loss \
        --optimizer nesterov \
        --lr-decay-mode cos \
        --lr 1.5e-3 \
        --gconv_support 0,8,1,15,12,25,21,19,29,7,11,20,4 \
        --gcc 512,512,512 \
        --n_fc_before_gconv 1 \
        --n_group_elements 60 \
        --pretrained

Sample outputs:

[2019-10-28 15:20:01,355:INFO] Loading data...
[1028 15:20:01 @format.py:93] Found 2468 entries in m40rot_test.lmdb
Classes=40, Views=1
[1028 15:20:03 @format.py:93] Found 9843 entries in m40rot_train.lmdb
[1028 15:20:03 @format.py:93] Found 2468 entries in m40rot_test.lmdb
Loading 100 inputs from pretrained model...
[2019-10-28 15:20:04,614:INFO] Loaded model...
[2019-10-28 15:20:04,622:INFO] Loaded model; params=21687912
[2019-10-28 15:20:08,627:INFO] Running on cuda:0
[2019-10-28 15:20:08,727:INFO] Checkpoint /tmp/emvn_m40rot/latest.pth.tar not found; ignoring.
[2019-10-28 15:20:08,728:INFO] Epoch: [1/15]
        Iter [10/1640] Loss: 4.7577 Time batch: 0.5615 s; Time GPU 0.2113 s; Time to load: 0.5800 s
(...)
        Iter [1640/1640] Loss: 0.0071 Time batch: 0.3312 s; Time GPU 0.1352 s; Time to load: 0.3332 s
[2019-10-28 17:58:58,240:INFO] Time taken: 541.46 sec...
[2019-10-28 17:58:58,241:INFO]
Starting evaluation...
        Iter [10/822] Loss: 0.0001  Acc: 100.0000
(...)
        Iter [820/822] Loss: 2.6854  Acc: 66.6667
[2019-10-28 18:00:35,861:INFO] Evaluating retrieval...
[2019-10-28 18:00:36,760:INFO] Computed pairwise distances between 2466 samples
[2019-10-28 18:00:38,703:INFO] acc per class=[100.0, 86.0, 100.0, 85.0, 97.0, 98.0, 85.0, 99.0, 98.0, 95.0, 75.0, 100.0, 83.72093023255815, 95.0, 82.55813953488372, 45.0, 97.0, 99.0, 100.0, 85.0, 100.0, 98.0, 97.0, 76.74418604651163, 95.0, 98.0, 79.0, 90.0, 88.0, 90.0, 95.0, 100.0, 65.0, 77.0, 90.0, 99.0, 77.0, 84.0, 55.0, 77.77777777777777]
[2019-10-28 18:00:38,704:INFO]  combined: 87.80, Acc: 90.67, mAP: 88.38, Loss: 0.3442

Results are within 0.4% of Table 3/Ours-R-60 in the paper.
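In both runs, --optimizer nesterov and --lr-decay-mode cos point to SGD with Nesterov momentum under a cosine learning-rate schedule. A minimal sketch under those assumptions is below; the momentum value, the per-epoch stepping, and the stand-in model are guesses, not values from train.py.

import torch

model = torch.nn.Linear(512, 40)                       # stand-in model, not the EMVN network
optimizer = torch.optim.SGD(model.parameters(), lr=1.5e-3,
                            momentum=0.9, nesterov=True)   # momentum value assumed
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=15)

for epoch in range(15):
    # ... one training epoch over the LMDB data would run here ...
    scheduler.step()                                   # lr follows a cosine from 1.5e-3 toward 0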

Training on aligned SHREC’17

Coming soon!

Training on rotated SHREC’17

Coming soon!

Reference

Carlos Esteves*, Yinshuang Xu*, Christine Allen-Blanchette, Kostas Daniilidis. “Equivariant Multi-View Networks”. The IEEE International Conference on Computer Vision (ICCV), 2019.

@InProceedings{Esteves_2019_ICCV,
  author    = {Esteves, Carlos and Xu, Yinshuang and Allen-Blanchette, Christine and Daniilidis, Kostas},
  title     = {Equivariant Multi-View Networks},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2019}
}

Authors

Carlos Esteves*, Yinshuang Xu*, Christine Allen-Blanchette, Kostas Daniilidis

GRASP Laboratory, University of Pennsylvania