An unofficial PyTorch implementation of MoCo.
This repo differs from the official implementation in a few ways: it uses only the ResNet18 and ResNet50 models, pretrains on a single V100 GPU, and does not use Shuffling BN.
$ conda activate env
$ pip install -r requirements.txt
$ python main_train.py --model resnet18 --cos
main_train.py [-h] [--dataset DATASET] [--epochs EPOCHS]
[--start-epoch START_EPOCH] [--batch-size BATCH_SIZE]
[--lr LR] [--schedule [SCHEDULE [SCHEDULE ...]]]
[--momentum MOMENTUM] [--wd WD] [--checkpoint CHECKPOINT]
[--workers WORKERS] [--cos] [--device DEVICE]
[--model MODEL] [--moco-dim MOCO_DIM] [--moco-k MOCO_K]
[--moco-m MOCO_M] [--moco-t MOCO_T]
Pytorch MocoV2 training
optional arguments:
-h, --help show this help message and exit
--dataset DATASET name for dataset, (Options: cifar, stl)
--epochs EPOCHS Number of epochs in training
--start-epoch START_EPOCH
manual epoch number (useful on restarts)
--batch-size BATCH_SIZE
Number of batch size
--lr LR learning rate
--schedule [SCHEDULE [SCHEDULE ...]]
learning rate schedule (drop 0.8)
--momentum MOMENTUM momentum for SGD
--wd WD weight decay
--checkpoint CHECKPOINT
path to latest checkpoint
--workers WORKERS Number of dataloader workers
--cos using cosine lr schedule
--device DEVICE device for training
--model MODEL model, (Options: resnet18, resnet50, resnet50x2d,
resnet50x4d)
--moco-dim MOCO_DIM feature dimension
--moco-k MOCO_K size fo queue, number of negative keys
--moco-m MOCO_M momentum for key encoder
--moco-t MOCO_T temperature in InfoNCE
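The `--moco-k`, `--moco-m` and `--moco-t` options map onto the three core pieces of MoCo: the queue of negative keys, the momentum update of the key encoder, and the temperature in the InfoNCE loss. The following is a minimal sketch based on the pseudocode in the MoCo paper, not the code in this repo. The function names (`momentum_update`, `info_nce_loss`, `dequeue_and_enqueue`) are illustrative, the features are assumed to be L2-normalised, and the queue size is assumed to be a multiple of the batch size.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def momentum_update(query_params, key_params, m):
    """In-place update key = m * key + (1 - m) * query (m is --moco-m)."""
    for q, k in zip(query_params, key_params):
        k.mul_(m).add_(q, alpha=1.0 - m)


def info_nce_loss(q, k, queue, t):
    """InfoNCE loss; q, k are (N, C) features, queue is (C, K), t is --moco-t."""
    l_pos = torch.einsum("nc,nc->n", q, k).unsqueeze(-1)  # (N, 1) positive logits
    l_neg = torch.einsum("nc,ck->nk", q, queue)           # (N, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / t
    # The positive key always sits at column 0.
    labels = torch.zeros(logits.shape[0], dtype=torch.long)
    return F.cross_entropy(logits, labels)


@torch.no_grad()
def dequeue_and_enqueue(queue, keys, ptr):
    """Overwrite the oldest columns of the (C, K) queue; K is --moco-k."""
    n = keys.shape[0]
    queue[:, ptr:ptr + n] = keys.T
    return (ptr + n) % queue.shape[1]  # new write position
```

In a training step one would compute the loss from the current batch, back-propagate through the query encoder only, call `momentum_update` on the two encoders' parameters, and then push the new keys into the queue.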
$ python main_cls.py --model resnet18 --lr 0.3 --pretrained PATH
main_cls.py [-h] [--dataset DATASET] [--epochs N] [--start-epoch N]
[--batch-size N] [--lr LR]
[--schedule [SCHEDULE [SCHEDULE ...]]] [--momentum M]
[--wd WD] [--checkpoint PATH] [--workers N]
[--device DEVICE] [--model MODEL] [--pretrained PATH]
[--dim N]
Pytorch MocoV2 linear classification
optional arguments:
-h, --help show this help message and exit
--dataset DATASET name for dataset, (Options: cifar, stl)
--epochs N Number of epochs in training
--start-epoch N manual epoch number (useful on restarts)
--batch-size N Number of batch size
--lr LR learning rate
--schedule [SCHEDULE [SCHEDULE ...]]
learning rate schedule (drop ratio)
--momentum M momentum for SGD
--wd WD weight decay
--checkpoint PATH path to latest checkpoint
--workers N Number of dataloader workers
--device DEVICE device for training
--model MODEL model, (Options: resnet18, resnet50, resnet50x2d,
resnet50x4d)
--pretrained PATH path to moco pretrained checkpoint
--dim N number of classification
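Both scripts accept `--schedule`, a list of epochs at which the learning rate is dropped, and the training script also accepts `--cos` for cosine decay. The sketch below shows how such a schedule is commonly computed. It follows the half-cosine formula used by the official MoCo code and is an assumption about this repo, not a copy of it. The function name `adjust_learning_rate` and the `drop` parameter are illustrative.

```python
import math


def adjust_learning_rate(base_lr, epoch, total_epochs, schedule, drop, cos=False):
    """Return the learning rate to use at `epoch`.

    cos=True : half-cosine decay from base_lr down to 0 over total_epochs.
    cos=False: multiply by `drop` once for every milestone already reached.
    """
    if cos:
        return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    lr = base_lr
    for milestone in schedule:
        if epoch >= milestone:
            lr *= drop
    return lr


# Example: base lr 0.3, milestones at epochs 60 and 80, hypothetical drop of 0.1.
for epoch in (0, 60, 80):
    print(epoch, adjust_learning_rate(0.3, epoch, 100, (60, 80), 0.1))
```

The returned value would then be written into each `param_group["lr"]` of the optimizer at the start of every epoch.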
We train the encoder with ResNet18 and ResNet50 backbones on the CIFAR10 and STL10 datasets, using the SGD optimizer. To evaluate the model, we freeze all parameters except the fc layer of the ResNet model and train a linear classifier on top.
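Freezing everything except the `fc` layer can be sketched as follows. This is a minimal illustration, not the code in `main_cls.py`: `TinyNet` is a hypothetical stand-in for a ResNet (a backbone followed by an `fc` head), and `freeze_all_but_fc` is an illustrative helper name. The learning rate of 0.3 is taken from the example command above.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a ResNet: a backbone plus an `fc` head."""

    def __init__(self, num_classes):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.fc(self.backbone(x))


def freeze_all_but_fc(model):
    """Disable gradients everywhere except the `fc` layer; return trainable params."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("fc.")
    return [p for p in model.parameters() if p.requires_grad]


model = TinyNet(num_classes=10)
head_params = freeze_all_but_fc(model)
# Only the fc weight and bias are handed to the optimizer.
optimizer = torch.optim.SGD(head_params, lr=0.3)
```

With a real pretrained encoder, the backbone weights would be loaded from the MoCo checkpoint before freezing, so that only the linear head is learned during evaluation.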
The linear classification results are:
Dataset | Architecture | Queue size | Feature dimensions | Epochs | Linear epochs | Top1 % | Top5 % |
---|---|---|---|---|---|---|---|
CIFAR10 | ResNet18 | 4096 | 128 | 500 | 100 | 81.06 | 99.65 |
STL10 | ResNet18 | 4096 | 128 | 500 | 100 | 80.57 | 99.43 |
STL10 | ResNet18 | 65536 | 128 | 1000 | 100 | 78.54 | 99.51 |
CIFAR10 | ResNet50 | 4096 | 128 | 500 | 100 | 84.03 | 99.40 |
CIFAR10 | ResNet50 | 16384 | 128 | 500 | 100 | 84.57 | 99.43 |
STL10 | ResNet50 | 4096 | 128 | 500 | 100 | 84.06 | 99.76 |
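Top1 and Top5 in the table are top-k accuracies: a sample counts as correct when its true label is among the k highest-scoring classes. The sketch below shows the computation in plain Python for clarity. The function name `topk_accuracy` is illustrative and not taken from this repo, where the same metric would normally be computed on tensors.

```python
def topk_accuracy(scores, targets, ks=(1, 5)):
    """Return top-k accuracies in percent.

    scores : list of per-sample lists of class scores
    targets: list of integer class labels, one per sample
    """
    results = []
    for k in ks:
        hits = 0
        for row, target in zip(scores, targets):
            # Class indices sorted by descending score, truncated to the best k.
            top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
            hits += target in top
        results.append(100.0 * hits / len(targets))
    return results


# Three samples, three classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
targets = [1, 1, 0]
top1, top2 = topk_accuracy(scores, targets, ks=(1, 2))
```

With only 10 classes in CIFAR10 and STL10, Top5 is a much looser criterion than Top1, which is why the Top5 column sits close to 100% throughout.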