$ python main.py --depth 29 --cardinality 4 --base_channels 64 --outdir results
Model | Test Error (%, median of 3 runs) | Test Error (%, in paper) | Training Time |
---|---|---|---|
ResNeXt-29 4x64d | 3.89 | ~3.75 (from Figure 7) | 31h17m |
ResNeXt-29 8x64d | 3.97 (1 run) | 3.65 (average of 10 runs) | 42h50m* |
ResNeXt-29 16x64d | | 3.58 (average of 10 runs) | |
NOTE: Both models were trained on a single GPU. ResNeXt-29 4x64d was trained with batch size 32 and an initial learning rate of 0.025. ResNeXt-29 8x64d was trained with batch size 64 and an initial learning rate of 0.05. The paper uses 8 GPUs, batch size 128, and an initial learning rate of 0.1.
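
The learning rates in the note match what you get by scaling the paper's setting (batch size 128, learning rate 0.1) linearly with batch size. The sketch below only illustrates that arithmetic. The helper name `scaled_lr` is hypothetical and is not part of `main.py`, and the source does not say that this rule was how the values were chosen.

```python
# Illustrative sketch only: `scaled_lr` is a hypothetical helper, not part of main.py.
# Assumption: the learning rate is scaled linearly with batch size, relative to
# the paper's reference setting (batch size 128, initial learning rate 0.1).

def scaled_lr(batch_size, ref_batch_size=128, ref_lr=0.1):
    """Return the reference learning rate scaled linearly by batch size."""
    return ref_lr * batch_size / ref_batch_size


if __name__ == "__main__":
    # Batch sizes taken from the note above.
    for name, batch_size in [("ResNeXt-29 4x64d", 32), ("ResNeXt-29 8x64d", 64)]:
        lr = scaled_lr(batch_size)
        print(f"{name}: batch size {batch_size} -> initial lr {lr:.3f}")
```

Under this assumption, batch size 32 gives 0.025 and batch size 64 gives 0.05, the values listed in the note.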
- Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1492-1500. arXiv:1611.05431, Torch implementation