
Create ResNet

Dept. of Embedded Systems Engineering, Incheon National University

jiho264@inu.ac.kr / jiho264@naver.com

  • The purpose of this project is to build ResNet in PyTorch and reach an accuracy close to that of the original paper.
  • The original ResNet32 has a 7.51% top-1 error rate on the CIFAR-10 dataset.
  • The original ResNet34 has a 21.53% top-1 error rate on the ImageNet2012 dataset.

    These targets have not been reached yet.

1. Usage

1.1. Requirements

  • Ubuntu 22.04 LTS
  • Python 3.11.5
  • PyTorch 2.2.0
  • CUDA 12.1
  • pip packages: tqdm, matplotlib, etc. (copy and time come with the Python standard library)
  • Hardware used: i7-9700K, 64 GB DDR4, RTX 3090

1.2. The Manual from Original Paper

1.2.1. Implementation of the training process :

  • We initialize the weights with He initialization
  • We adopt batch normalization right after each convolution and before activation
  • We use SGD with a mini-batch size of 256
  • The learning rate starts from 0.1 and is divided by 10 when the error plateaus
  • We use a weight decay of 0.0001 and a momentum of 0.9
  • We do not use dropout
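As a rough illustration of two of the rules above, the sketch below computes the standard deviation used by He initialization (fan-in variant) for a convolution, and applies a divide-by-10 rule once the validation error stops improving. The names `he_std` and `PlateauLR` and the `patience` value are hypothetical and come from neither the paper nor this repository; in PyTorch, `torch.nn.init.kaiming_normal_` and `optim.lr_scheduler.ReduceLROnPlateau` play these roles.

```python
import math


def he_std(in_channels, kernel_size):
    """Std of He (Kaiming) normal init for a conv layer: sqrt(2 / fan_in)."""
    fan_in = in_channels * kernel_size * kernel_size
    return math.sqrt(2.0 / fan_in)


class PlateauLR:
    """Divide the learning rate by 10 when the error has not improved for
    `patience` consecutive epochs (patience is an assumed value)."""

    def __init__(self, lr=0.1, patience=5, factor=0.1):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_error):
        """Record one epoch's validation error and return the current lr."""
        if val_error < self.best:
            self.best = val_error
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr


print(he_std(in_channels=64, kernel_size=3))  # sqrt(2 / 576)
```

The experiments in section 2 replace the plateau rule with fixed milestones, which is easier to reproduce but is a different schedule from the one sketched here.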

1.2.2. MyResNet34 preprocessing for ImageNet2012 :

  • The image is resized with its shorter side randomly sampled in [256, 480] for scale augmentation [41].
  • A 224×224 crop is randomly sampled from an image or its horizontal flip, with the per-pixel mean subtracted [21].
  • The standard color augmentation in [21] is used.

    So PCA color augmentation is applied (see PCAColorAugmentation in 2.2.1).

  • In testing, for comparison studies we adopt the standard 10-crop testing [21]. For best results, we adopt the fully-convolutional form as in [41, 13], and average the scores at multiple scales (images are resized such that the shorter side is in {224, 256, 384, 480, 640}).

    Implemented in src/Prediction_for_MultiScaleTest.ipynb
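The training-time geometry described above (scale augmentation, then a random 224×224 crop and an optional flip) can be sketched without any image library. This only computes sizes and crop boxes; the function names `resize_shorter_side` and `sample_train_geometry` are hypothetical helpers for illustration, not part of this repository.

```python
import random


def resize_shorter_side(width, height, target):
    """Scale (width, height) so the shorter side equals `target`, keeping aspect ratio."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)


def sample_train_geometry(width, height, rng, crop=224, lo=256, hi=480):
    """Sample the shorter side in [lo, hi], then a random crop box and a flip flag."""
    shorter = rng.randint(lo, hi)  # inclusive on both ends
    new_w, new_h = resize_shorter_side(width, height, shorter)
    left = rng.randint(0, new_w - crop)
    top = rng.randint(0, new_h - crop)
    flip = rng.random() < 0.5
    return (new_w, new_h), (left, top, left + crop, top + crop), flip


rng = random.Random(0)
size, box, flip = sample_train_geometry(500, 375, rng)
print(size, box, flip)
```

Note that Python's `range(256, 480)`, as used in the setup in 2.2.1, ends at 479, while the sketch samples the closed interval [256, 480] stated in the paper.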

1.2.3. MyResNet_CIFAR preprocessing for CIFAR10 :

  • 45k/5k train/valid split from the original training set (50k)
  • 4 pixels are padded on each side, and a 32 x 32 crop is randomly sampled from the padded image or its horizontal flip.
  • For testing, use original images
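A minimal sketch of the two steps above, assuming a fixed seed for the split: shuffle the 50k training indices once and cut them into 45k/5k, and pick a random 32×32 crop window inside the 40×40 padded image. `split_indices` and `pad_and_crop_box` are hypothetical names; in PyTorch, `torch.utils.data.random_split` and `RandomCrop(32, padding=4)` would do the same job.

```python
import random


def split_indices(n_total=50_000, n_valid=5_000, seed=0):
    """Shuffle indices once and cut them into disjoint train/valid lists."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[n_valid:], indices[:n_valid]


def pad_and_crop_box(rng, size=32, padding=4):
    """Pick the top-left corner of a size x size crop inside the padded image."""
    padded = size + 2 * padding  # 40 for CIFAR-10
    left = rng.randint(0, padded - size)
    top = rng.randint(0, padded - size)
    return left, top, left + size, top + size


train_idx, valid_idx = split_indices()
print(len(train_idx), len(valid_idx))  # 45000 5000
print(pad_and_crop_box(random.Random(0)))
```

Fixing the seed keeps the validation set identical across runs, so early stopping and model comparison see the same 5k images every time.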

2. Experiments

2.1. ResNet32 Model on CIFAR10

2.1.1. Setup

model = MyResNet_CIFAR(num_classes=10, num_layer_factor = 5, Downsample_option="A").to("cuda")
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[82, 123], gamma=0.1)
earlystopper = EarlyStopper(patience=999, model=model, file_name=file_name)
  • epochs = 180
  • batch = 128
train.transforms = Compose([
    Compose([ToImage(), ToDtype(torch.float32, scale=True)]),
    RandomCrop(32, padding=4),
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
test.transforms = Compose([
    Compose([ToImage(), ToDtype(torch.float32, scale=True)]),
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
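Two pieces of arithmetic sit behind this setup, sketched below under stated assumptions. First, with milestones [82, 123] and gamma 0.1, the learning rate is 0.1, then 0.01, then 0.001; this assumes `scheduler.step()` is called once per epoch. Second, the paper's CIFAR networks have 6n + 2 layers, so n = 5 gives 32; that `num_layer_factor` corresponds to n is an assumption based on the model being called ResNet32. Both function names are hypothetical.

```python
from bisect import bisect_right


def multistep_lr(epoch, base_lr=0.1, milestones=(82, 123), gamma=0.1):
    """Learning rate in effect during `epoch` (0-based) under a MultiStepLR-style rule."""
    return base_lr * gamma ** bisect_right(milestones, epoch)


def cifar_resnet_depth(n):
    """Depth of the CIFAR ResNet family from the paper: 6n + 2 weighted layers."""
    return 6 * n + 2


for epoch in (0, 81, 82, 122, 123, 179):
    print(epoch, multistep_lr(epoch))
print(cifar_resnet_depth(5))  # 32
```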

2.1.2. Result

2.2. Best ResNet34 model on ImageNet2012

2.2.1. Setup

model = MyResNet34(num_classes=1000, Downsample_option="B").to("cuda")
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)
file_name = "MyResNet34_ImageNet2012_rezero"
earlystopper = EarlyStopper(patience=999, model=model, file_name=file_name)
  • epochs = 120
  • batch = 256
# PCAColorAugmentation
class PCAColorAugmentation(object):
    """
    The ResNet paper says: "The standard color augmentation in [21] is used."
    - [21] : AlexNet paper.
    - PCA Color Augmentation

    1. Get the eigenvalues and eigenvectors of the covariance matrix of the image pixels. (ImageNet2012)
    2. [r, g, b] = [r, g, b] + [p1, p2, p3] matmul [a1 * r1, a2 * r2, a3 * r3].T
    """

    def __init__(self):
        # Eigenvalues, shape [1, 3]
        self._eigval = torch.tensor([55.46, 4.794, 1.148]).reshape(1, 3)
        # Eigenvectors as columns, shape [3, 3]
        self._eigvec = torch.tensor(
            [
                [-0.5675, 0.7192, 0.4009],
                [-0.5808, -0.0045, -0.8140],
                [-0.5836, -0.6948, 0.4203],
            ]
        )

    def __call__(self, _tensor: torch.Tensor):
        """
        Input : torch.Tensor [C, H, W]
        Output : torch.Tensor [C, H, W]
        """
        # One random weight per eigenvalue, then one offset per channel.
        return _tensor + torch.matmul(
            self._eigvec,
            torch.mul(self._eigval, torch.normal(mean=0.0, std=0.1, size=[1, 3])).T,
        ).reshape(3, 1, 1)
# Training set
train = Compose(
    [
        RandomShortestSize(min_size=range(256, 480), antialias=True),
        Compose([ToImage(), ToDtype(torch.float32, scale=True)]),
        Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=True
        ),
    ]
)
# Center-cropped valid set
valid = Compose(
    [
        RandomShortestSize(min_size=range(256, 480), antialias=True),
        # In VGG, single-scale evaluation cropped at the median of the two range bounds.
        Compose([ToImage(), ToDtype(torch.float32, scale=True)]),
        Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=True
        ),
    ]
)
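To make step 2 of the PCAColorAugmentation docstring concrete, here is a dependency-free worked example of the per-channel offset, using the eigenvalues and eigenvectors listed above. The helper name `pca_shift` and the sample pixel are hypothetical, and the pixel is assumed to be on a 0-255 scale purely for illustration.

```python
import random

EIGVAL = [55.46, 4.794, 1.148]
EIGVEC = [
    [-0.5675, 0.7192, 0.4009],
    [-0.5808, -0.0045, -0.8140],
    [-0.5836, -0.6948, 0.4203],
]


def pca_shift(alpha):
    """Per-channel offset: EIGVEC @ (alpha * EIGVAL), one value each for R, G, B."""
    scaled = [a * lam for a, lam in zip(alpha, EIGVAL)]
    return [sum(p * s for p, s in zip(row, scaled)) for row in EIGVEC]


rng = random.Random(0)
alpha = [rng.gauss(0.0, 0.1) for _ in range(3)]  # one draw per image
shift = pca_shift(alpha)
pixel = [120.0, 80.0, 60.0]  # hypothetical RGB value, assumed 0-255 scale
print([round(c + s, 3) for c, s in zip(pixel, shift)])
```

The same three offsets are added to every pixel of an image, which is why the class reshapes its result to [3, 1, 1] before broadcasting over H and W.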

2.2.2. Result

  • Training & Center Crop Validation

    • Reference, Figure 4 of the original paper: Training on ImageNet. Thin curves denote training error, and bold curves denote validation error of the center crops. Right: ResNets of 18 and 34 layers. In this plot, the residual networks have no extra parameter compared to their plain counterparts.
    • The lowest point of the red solid line (center-crop validation) is around 25%, which is close to MyResNet34's lowest error of 27.27%.
  • 10-Crop Testing

    # 10-crop valid set
    scales = [224, 256, 384, 480, 640]
    valid = Compose(
        [
            RandomShortestSize(min_size=scales[i] + 1, antialias=True),
            Compose([ToImage(), ToDtype(torch.float32, scale=True)]),
            Normalize(
                mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=True
            ),
        ]
    )
    Model is loaded from MyResNet34_ImageNet2012_rezero.pth
    Dataset 224: Loss: 1.282425, Top-1 Acc: 68.80%, Top-5 Acc: 88.47%
    Dataset 256: Loss: 1.183675, Top-1 Acc: 70.91%, Top-5 Acc: 89.78%
    Dataset 384: Loss: 1.306427, Top-1 Acc: 72.76%, Top-5 Acc: 91.09%
    Dataset 480: Loss: 1.581165, Top-1 Acc: 71.49%, Top-5 Acc: 90.47%
    Dataset 640: Loss: 2.098562, Top-1 Acc: 65.77%, Top-5 Acc: 87.33%
    Avg Loss: 1.490451, Avg Top-1 Acc: 69.95%, Avg Top-5 Acc: 89.43%
    • MyResNet34 B :
      • Top-1 acc : 69.95 % (error : 30.05 %)
      • Top-5 acc : 89.43 % (error : 10.57 %)
    • Original paper ResNet-34 B :
      • Top-1 error : 24.52 %
      • Top-5 error : 7.46 %
    • Center-crop validation gave numbers close to the paper, but 10-crop testing fell short of the paper's performance.
    • PyTorch ResNet34 :
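The averaged figures can be re-derived from the per-scale log lines above. The short check below suggests that the reported averages are plain means of the five per-scale accuracies, which is not the same as averaging the scores across scales before taking the prediction, as the paper describes for multi-scale testing. The variable names are only for this illustration.

```python
top1 = {224: 68.80, 256: 70.91, 384: 72.76, 480: 71.49, 640: 65.77}
top5 = {224: 88.47, 256: 89.78, 384: 91.09, 480: 90.47, 640: 87.33}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


avg_top1 = mean(top1.values())
avg_top5 = mean(top5.values())
print(f"avg top-1 acc {avg_top1:.2f}%, error {100 - avg_top1:.2f}%")
print(f"avg top-5 acc {avg_top5:.2f}%, error {100 - avg_top5:.2f}%")

# The best single scale is noticeably better than the mean over scales.
best = max(top1, key=top1.get)
print(f"best single scale: {best} ({100 - top1[best]:.2f}% top-1 error)")
```

The 640 scale pulls the mean down, so the single-scale result at 384 is the closer comparison to the paper's 24.52 % 10-crop top-1 error.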

3. Conclusion

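  • It is a pity that the paper's numbers were not reached, but implementing the model this far is worthwhile in itself.
  • The various discarded experiments can be found in Readme_old.md.
  • Lessons learned :
    • Doing work outside the pipeline that belongs in Dataloader.transforms hurts training.
    • For PCA color augmentation, compute eigvec and eigval over the entire dataset, then perturb each image accordingly with values drawn from N(0, 0.01^2).
    • Avoid AutoAugment while the model's stability and completeness are not yet established.
    • If mean subtraction is applied in training, it must of course be applied in testing too. Test numbers still come out reasonably well without it, but they are not the correct numbers.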