
Single Shot MultiBox Detector (SSD) Implementation with PyTorch

PyTorch SSD

An implementation of SSD (Single Shot MultiBox Detector) in PyTorch.

  • Implement SSD300
  • Implement SSD300 with batch normalization
  • Implement SSD512
  • Implement SSD512 with batch normalization
  • Visualize inference result
  • Arg parse (easy training)
  • Share pre-trained weights (the SSD300 weights have been partially shared)
  • Well-introduction?
  • Support COCO Dataset
  • Support Custom Dataset
  • Speed up
  • mAP (I have no confidence...)

Requirements and Settings

  • Anaconda

    conda install -c anaconda pycurl
    conda install -c pytorch pytorch
    conda install -c conda-forge numpy opencv ffmpeg scipy jupyter_contrib_nbextensions jupyter_nbextensions_configurator pycocotools
  • pip (optional)

    pip install git+https://github.com/jjjkkkjjj/pytorch_SSD.git
  • Jupyter

    jupyter notebook


How to start

Get VOC and COCO Dataset

  • You can download the VOC2007-trainval, VOC2007-test, VOC2012-trainval, VOC2012-test, COCO2014-trainval and COCO2014-test datasets with the following command:

    python get_dataset.py --datasets [{dataset name} {dataset name}...]

    {dataset name} is one of:

    • voc2007_trainval
    • voc2007_test
    • voc2012_trainval
    • voc2012_test
    • coco2014_trainval
    • coco2017_trainval
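
The --datasets option takes one or more of the names listed above. As a rough illustration of how such an option can be declared with argparse, here is a minimal sketch. It is not the code of get_dataset.py: the build_parser function and the DATASET_NAMES constant are hypothetical names introduced only for this example.

```python
import argparse

# Dataset names copied from the list above. The parser below is a hypothetical
# stand-in for get_dataset.py, whose real implementation may differ.
DATASET_NAMES = [
    "voc2007_trainval", "voc2007_test",
    "voc2012_trainval", "voc2012_test",
    "coco2014_trainval", "coco2017_trainval",
]


def build_parser():
    parser = argparse.ArgumentParser(description="Download detection datasets (sketch)")
    # nargs="+" collects one or more space-separated names into a list,
    # and choices rejects any name that is not in DATASET_NAMES.
    parser.add_argument("--datasets", nargs="+", choices=DATASET_NAMES, required=True,
                        help="Names of the datasets to download")
    return parser


# Equivalent to: python get_dataset.py --datasets voc2007_trainval voc2012_trainval
args = build_parser().parse_args(["--datasets", "voc2007_trainval", "voc2012_trainval"])
print(args.datasets)
```

With a declaration like this, a misspelled dataset name is rejected by the parser before any download starts.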

Easy training

You can easily train on your own VOC- or COCO-style dataset with easy_train.py.


python easy_train.py VOC -r {your-voc-style-dataset-path} --focus trainval -l ball person -lr 0.003


python easy_train.py COCO -r {your-coco-style-dataset-path} --focus train2012 -l ball person -lr 0.003

usage: easy_train.py [-h] [-r DATASET_ROOTDIR [DATASET_ROOTDIR ...]]
                     [--focus FOCUS [FOCUS ...]] [-l LABELS [LABELS ...]]
                     [-ig [{difficult,truncated,occluded,iscrowd} [{difficult,truncated,occluded,iscrowd} ...]]]
                     [-m {SSD300,SSD512}] [-n MODEL_NAME] [-bn]
                     [-w WEIGHTS_PATH] [-bs BATCH_SIZE] [-nw NUM_WORKERS]
                     [-d {cpu,cuda}] [-si START_ITERATION] [-na]
                     [-optimizer {SGD,Adam}] [-lr LEARNING_RATE]
                     [--momentum MOMENTUM] [-wd WEIGHT_DECAY]
                     [--steplr_gamma STEPLR_GAMMA]
                     [--steplr_milestones STEPLR_MILESTONES [STEPLR_MILESTONES ...]]
                     [-mi MAX_ITERATION] [-ci CHECKPOINTS_INTERVAL]
                     [--loss_alpha LOSS_ALPHA]

Easy training script for VOC or COCO style dataset

positional arguments:
  {VOC,COCO}            Dataset type

optional arguments:
  -h, --help            show this help message and exit
  -r DATASET_ROOTDIR [DATASET_ROOTDIR ...]
                        Dataset root directory path. If dataset type is 'VOC',
                        Default is; '['/home/kado/data/voc/voc2007/trainval/VO
                        Cdevkit/VOC2007']' If dataset type is 'COCO', Default
                        is; '['/home/kado/data/coco/coco2014/trainval']'
  --focus FOCUS [FOCUS ...]
                        Image set name. If dataset type is 'VOC', Default is;
                        '['trainval']' if dataset type is 'COCO', Default is;
  -l LABELS [LABELS ...], --labels LABELS [LABELS ...]
                        Dataset class labels. If dataset type is 'VOC',
                        Default is; '['aeroplane', 'bicycle', 'bird', 'boat',
                        'bottle', 'bus', 'car', 'cat', 'chair', 'cow',
                        'diningtable', 'dog', 'horse', 'motorbike', 'person',
                        'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']'
                        If dataset type is 'COCO', Default is; '['person',
                        'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
                        'train', 'truck', 'boat', 'traffic light', 'fire
                        hydrant', 'stop sign', 'parking meter', 'bench',
                        'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
                        'elephant', 'bear', 'zebra', 'giraffe', 'backpack',
                        'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
                        'skis', 'snowboard', 'sports ball', 'kite', 'baseball
                        bat', 'baseball glove', 'skateboard', 'surfboard',
                        'tennis racket', 'bottle', 'wine glass', 'cup',
                        'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
                        'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog',
                        'pizza', 'donut', 'cake', 'chair', 'couch', 'potted
                        plant', 'bed', 'dining table', 'toilet', 'tv',
                        'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
                        'microwave', 'oven', 'toaster', 'sink',
                        'refrigerator', 'book', 'clock', 'vase', 'scissors',
                        'teddy bear', 'hair drier', 'toothbrush']'
  -ig [{difficult,truncated,occluded,iscrowd} [{difficult,truncated,occluded,iscrowd} ...]], --ignore [{difficult,truncated,occluded,iscrowd} [{difficult,truncated,occluded,iscrowd} ...]]
                        Whether to ignore object
  -m {SSD300,SSD512}, --model {SSD300,SSD512}
                        Trained model
  -n MODEL_NAME, --model_name MODEL_NAME
                        Model name, which will be used as save name
  -bn, --batch_norm     Whether to construct model with batch normalization
  -w WEIGHTS_PATH, --weights_path WEIGHTS_PATH
                        Pre-trained weights path. Default is pytorch's pre-
                        trained one for vgg
  -bs BATCH_SIZE, --batch_size BATCH_SIZE
                        Batch size
  -nw NUM_WORKERS, --num_workers NUM_WORKERS
                        Number of workers used in DataLoader
  -d {cpu,cuda}, --device {cpu,cuda}
                        Device for Tensor
  -si START_ITERATION, --start_iteration START_ITERATION
                        Resume training at this iteration
  -na, --no_augmentation
                        Whether to do augmentation to your dataset
  -optimizer {SGD,Adam}
                        Optimizer for training
  -lr LEARNING_RATE, --learning_rate LEARNING_RATE
                        Initial learning rate
  --momentum MOMENTUM   Momentum value for Optimizer
  -wd WEIGHT_DECAY, --weight_decay WEIGHT_DECAY
                        Weight decay for SGD
  --steplr_gamma STEPLR_GAMMA
                        Gamma for stepLR
  --steplr_milestones STEPLR_MILESTONES [STEPLR_MILESTONES ...]
                        Milestones for stepLR
  -mi MAX_ITERATION, --max_iteration MAX_ITERATION
  -ci CHECKPOINTS_INTERVAL
                        Checkpoints interval
  --loss_alpha LOSS_ALPHA
                        Loss's alpha
  • Caution!!

    When your terminal window is small, the training summary is printed on a new line for each iteration.

    To avoid this, please expand your terminal window.


Script Example


Training

See also training-voc2007+2012.ipynb or training-voc2007.ipynb.

  • First, create the augmentation, transform and target_transform instances using the augmentations, transforms and target_transforms modules in ssd_data.


    from ssd_data import datasets, transforms, target_transforms, augmentations

    # Ignore objects flagged as difficult in the annotations
    ignore = target_transforms.Ignore(difficult=True)
    augmentation = augmentations.AugmentationOriginal()
    # Resize to the SSD300 input size, then normalize each RGB channel
    transform = transforms.Compose(
        [transforms.Resize((300, 300)),
         transforms.Normalize(rgb_means=(0.485, 0.456, 0.406), rgb_stds=(0.229, 0.224, 0.225))]
    )
    target_transform = target_transforms.Compose(
        [target_transforms.OneHot(class_nums=datasets.VOC_class_nums, add_background=True)]
    )

Note that any of these instances can be set to None.
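
To make the roles of Normalize and OneHot more concrete, the sketch below shows what such transforms conventionally compute. It is written in plain Python and is not the ssd_data implementation. Two points are assumptions: that pixel values are scaled to [0, 1] before the mean and standard deviation are applied, and that the background class occupies the extra last slot of the one-hot vector. The function names normalize_pixel and one_hot are hypothetical.

```python
def normalize_pixel(rgb, rgb_means=(0.485, 0.456, 0.406), rgb_stds=(0.229, 0.224, 0.225)):
    """Standardize one 8-bit RGB pixel channel by channel.

    Assumption: values are divided by 255 before the mean and std are applied.
    """
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, rgb_means, rgb_stds))


def one_hot(label_index, class_nums, add_background=True):
    """Return a one-hot list for label_index.

    Assumption: with add_background=True one extra slot is appended at the end
    and reserved for the background class.
    """
    length = class_nums + 1 if add_background else class_nums
    vec = [0.0] * length
    vec[label_index] = 1.0
    return vec


white = normalize_pixel((255, 255, 255))
label = one_hot(2, class_nums=20)
print(white)
print(len(label), label.index(1.0))
```

For the 20 VOC classes, add_background=True gives vectors of length 21 in this sketch.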

  • Second, load the dataset using the datasets module in ssd_data.


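    from torch.utils.data import DataLoader
    from ssd_data import datasets
    from ssd_data import _utils

    train_dataset = datasets.VOC2007Dataset(ignore=ignore, transform=transform, target_transform=target_transform, augmentation=augmentation)
    # batch_size and shuffle are illustrative values (an assumption), not settings from this repository.
    # Detection targets usually differ in length per image, so a custom collate_fn is typically needed as well.
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)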

    You can use datasets.Compose to combine multiple datasets.

  • Third, create the model. You can place the model on a specific device with .to(device).


    from ssd.models.ssd300 import SSD300
    model = SSD300(class_labels=train_dataset.class_labels, batch_norm=False).cuda()

    You can also load your trained weights with model.load_weights(path).

  • Last, create the optimizer, scheduler, SaveManager, LogManager and TrainLogger, then start training.


    from torch.utils.data import DataLoader
    from torch.optim.sgd import SGD
    from torch.optim.adam import Adam
    from ssd.train import *

    optimizer = SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=5e-4) # slower
    #optimizer = Adam(model.parameters(), lr=1e-3, weight_decay=5e-4) # faster
    # Decay the learning rate by gamma at iterations 40000 and 50000
    iter_scheduler = SSDIterMultiStepLR(optimizer, milestones=(40000, 50000), gamma=0.1, verbose=True)
    save_manager = SaveManager(modelname='ssd300-voc2007', interval=5000, max_checkpoints=15, plot_yrange=(0, 8))
    log_manager = LogManager(interval=10, save_manager=save_manager, loss_interval=10, live_graph=LiveGraph((0, 8)))
    trainer = TrainLogger(model, loss_func=SSDLoss(), optimizer=optimizer, scheduler=iter_scheduler, log_manager=log_manager)
    # Train for 60000 iterations
    trainer.train(60000, train_loader)
  • Result

    Learning curve example (voc2007-trainval and voc2007-test): [figure: learning curve07]

    Learning curve example (voc2007-trainval and voc2012-trainval): [figure: learning curve07+12]
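
The training script passes milestones=(40000, 50000) and gamma=0.1 to SSDIterMultiStepLR. A multi-step schedule of this kind usually multiplies the learning rate by gamma each time a milestone iteration is reached. The sketch below shows that rule in plain Python. The helper multistep_lr is a hypothetical name, and the exact behaviour of SSDIterMultiStepLR at the milestone boundaries is an assumption.

```python
from bisect import bisect_right


def multistep_lr(iteration, base_lr=1e-3, milestones=(40000, 50000), gamma=0.1):
    """Learning rate at a given iteration under a multi-step decay schedule."""
    # Count how many milestones have already been reached at this iteration
    passed = bisect_right(sorted(milestones), iteration)
    return base_lr * gamma ** passed


for it in (0, 39999, 40000, 50000, 59999):
    print(it, multistep_lr(it))
```

Under these assumptions the rate stays at 1e-3 until iteration 40000, drops to about 1e-4, and drops again to about 1e-5 from iteration 50000 until the end of the 60000 iterations.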


Inference

  • First, create the model. You can place the model on a specific device with .to(device).


    from ssd.models.ssd300 import SSD300
    from ssd_data import datasets
    model = SSD300(class_labels=datasets.VOC_class_labels, batch_norm=False).cuda()
    model.eval()  # required: switches the model to evaluation mode before inference
  • Pass an image to the model and show the result.


    import cv2

    # cv2.imread returns BGR order, but the model must be given RGB order
    image = cv2.cvtColor(cv2.imread('assets/coco_testimg.jpg'), cv2.COLOR_BGR2RGB)
    # imgs is a list of ndarray (img)
    infers, imgs = model.infer(cv2.resize(image, (300, 300)), visualize=True, toNorm=True)
    for img in imgs:
        # cv2.imshow expects BGR order, so convert before displaying
        cv2.imshow('result', cv2.cvtColor(img, cv2.COLOR_RGB2BGR))
        cv2.waitKey()  # keep the window open until a key is pressed
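
The two cv2.cvtColor calls above exist because OpenCV reads and displays images in BGR order while the model takes RGB. For a 3-channel image, both conversions amount to reversing the channel axis. The following sketch illustrates this on a tiny image stored as nested lists; swap_rb is a hypothetical helper written for this example, not part of OpenCV or of this repository.

```python
def swap_rb(image):
    """Reverse the channel order of an H x W x 3 image stored as nested lists."""
    return [[pixel[::-1] for pixel in row] for row in image]


# One row with a pure red pixel followed by a pure green pixel, in RGB order
rgb = [[[255, 0, 0], [0, 255, 0]]]
bgr = swap_rb(rgb)
print(bgr)
```

Applying the swap twice returns the original image, which is why the same kind of conversion is used both before inference and before display. With NumPy arrays the equivalent operation is img[..., ::-1].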


Pre-trained Weights

Note: mAP is measured on voc2007test.

Training data       SSD300 (without batch norm)   SSD512 (without batch norm)
VOC2007             mAP: 0.7572                   mAP:
VOC2007++           mAP: N/A                      mAP:
VOC2007+2012        mAP: 0.7636                   mAP:
VOC2007+2012+COCO   mAP: 0.7682                   mAP:
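
For readers unfamiliar with the metric, the sketch below shows how an mAP value of this kind is conventionally obtained for VOC2007: average precision (AP) is computed per class with 11-point interpolation and then averaged over classes. Whether this repository's evaluation follows exactly this procedure is an assumption, and both function names are hypothetical.

```python
def voc07_average_precision(recalls, precisions):
    """11-point interpolated AP from matching recall and precision lists."""
    ap = 0.0
    for i in range(11):
        threshold = i / 10.0
        # Best precision among operating points whose recall reaches the threshold
        candidates = [p for r, p in zip(recalls, precisions) if r >= threshold]
        ap += max(candidates, default=0.0) / 11.0
    return ap


def mean_average_precision(ap_per_class):
    """Average the per-class AP values (dict: class name -> AP)."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Toy example: precision 1.0 up to recall 0.5, then 0.5 up to recall 1.0
ap_person = voc07_average_precision([0.5, 1.0], [1.0, 0.5])
ap_ball = voc07_average_precision([1.0], [1.0])
print(mean_average_precision({"person": ap_person, "ball": ap_ball}))
```

In the toy example, six of the eleven recall thresholds see precision 1.0 and the remaining five see 0.5, so the AP for "person" is 8.5 / 11.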

About SSD

  • The default boxes for SSD300 are implemented in dbox.py.
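
As background, the sketch below reproduces the default box layout described in the SSD paper for a 300x300 input: six feature maps, with one box center per cell and several boxes of different shapes per center. It is not the code in dbox.py, and the function and constant names are hypothetical. Only the feature-map sizes and boxes-per-location counts come from the paper.

```python
from itertools import product

# Feature-map sizes and number of default boxes per location for SSD300,
# as described in the SSD paper (the layout in dbox.py may be organized differently)
FEATURE_MAP_SIZES = (38, 19, 10, 5, 3, 1)
BOXES_PER_LOCATION = (4, 6, 6, 6, 4, 4)


def default_box_centers(fmap_size):
    """Normalized (cx, cy) centers, one per feature-map cell."""
    return [((j + 0.5) / fmap_size, (i + 0.5) / fmap_size)
            for i, j in product(range(fmap_size), repeat=2)]


def total_default_boxes():
    """Total number of default boxes over all feature maps."""
    return sum(len(default_box_centers(size)) * boxes
               for size, boxes in zip(FEATURE_MAP_SIZES, BOXES_PER_LOCATION))


print(total_default_boxes())  # 8732
```

The total of 8732 matches the number of default boxes the SSD paper reports for SSD300.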


