DSOD Low mAP
AdamCuellar opened this issue · 1 comment
AdamCuellar commented
Not necessarily an issue, but the mAP I got from training DSOD512 on VOC 07+12 and testing on VOC 07 was quite low, approximately 0.13.
The only thing I really changed was using Adam instead of AdamAccumulate, since AdamAccumulate throws an error on tf 2.0. I also used softmax.
Also, no metrics other than the loss itself show up during training.
Here is my training function:
```python
def trainMultiGPU():
    # set up data sets
    gt_util_voc = GTUtility("data/VOC2012train/")
    gt_util_voc7 = GTUtility("data/VOC2007train/")
    gt_util_voc_val = GTUtility("data/VOC2012val/", validation=True)
    gt_util_voc7_val = GTUtility("data/VOC2007val/", validation=True)
    gt_util_train = GTUtility.merge(gt_util_voc, gt_util_voc7)
    gt_util_val = GTUtility.merge(gt_util_voc_val, gt_util_voc7_val)

    experiment = 'dsod300_voc12_7'
    batch_size = 16

    # class_weights = prior_util.compute_class_weights(gt_util_train)
    class_weights = np.array(
        [0.00007169, 1.20864663, 1.23607288, 0.81087541, 1.32018959, 1.65339534, 1.47852761, 0.45099343, 0.84154551,
         0.33765636, 1.41315118, 1.32907548, 0.63492811, 1.15680594, 1.18978997, 0.07548318, 0.91531396, 1.21262288,
         1.15910985, 1.49269817, 1.08304682])

    # DSOD paper:
    #   batch size 128
    #   320k iterations
    #   initial learning rate 0.1
    epochs = 1000
    initial_epoch = 0

    with tf.device("/cpu:0"):
        # set up DSOD 512
        model = DSOD512(num_classes=gt_util_train.num_classes, softmax=True)

    prior_util = PriorUtil(model)

    gen_train = InputGenerator(gt_util_train, prior_util, batch_size, model.image_size, augmentation=True)
    gen_val = InputGenerator(gt_util_val, prior_util, batch_size, model.image_size, augmentation=True)

    # weight decay
    regularizer = keras.regularizers.l2(5e-4)  # None if disabled
    for l in model.layers:
        if l.__class__.__name__.startswith('Conv'):
            l.kernel_regularizer = regularizer

    checkdir = './checkpoints/' + time.strftime('%Y%m%d%H%M') + '_' + experiment
    if not os.path.exists(checkdir):
        os.makedirs(checkdir)

    optim = keras.optimizers.Adam(lr=1e-3)

    # loss = SSDLoss(alpha=1.0, neg_pos_ratio=3.0)
    loss = SSDFocalLoss(lambda_conf=1.0, class_weights=class_weights)

    model = multi_gpu_model(model, gpus=2)
    model.compile(optimizer=optim, loss=loss.compute, metrics=loss.metrics)

    # add some callbacks
    reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)
    early_stopping = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)

    history = model.fit(
        gen_train.generate(),
        steps_per_epoch=gen_train.num_batches,
        epochs=epochs,
        verbose=1,
        callbacks=[
            keras.callbacks.ModelCheckpoint(checkdir + '/weights.{epoch:03d}.h5', verbose=1, save_weights_only=True,
                                            save_best_only=True, period=3),
            Logger(checkdir),
            reduce_lr,
            early_stopping
        ],
        validation_data=gen_val.generate(),
        validation_steps=gen_val.num_batches,
        class_weight=None,
        workers=1,
        use_multiprocessing=False,
        initial_epoch=initial_epoch)
```
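For reference, here is a minimal sketch of the generic gradient-accumulation technique in TF 2, not the repo's AdamAccumulate: gradients are summed over several mini-batches and applied once, which emulates a larger effective batch size with plain Adam. The toy model, `accum_steps`, and `train_step` are placeholder names for illustration only.

```python
import tensorflow as tf

accum_steps = 8  # effective batch size = batch_size * accum_steps

# toy stand-in for the detector model
model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(4,))])
optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.MeanSquaredError()

# one accumulator variable per trainable weight
accum_grads = [tf.Variable(tf.zeros_like(v), trainable=False)
               for v in model.trainable_variables]

def train_step(x, y, step):
    with tf.GradientTape() as tape:
        # divide by accum_steps so the applied update averages over the window
        loss = loss_fn(y, model(x, training=True)) / accum_steps
    grads = tape.gradient(loss, model.trainable_variables)
    for acc, g in zip(accum_grads, grads):
        acc.assign_add(g)
    if (step + 1) % accum_steps == 0:  # apply once per accumulation window
        optimizer.apply_gradients(
            zip([g.read_value() for g in accum_grads], model.trainable_variables))
        for acc in accum_grads:
            acc.assign(tf.zeros_like(acc))
    return loss

# toy usage
data = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal((64, 4)), tf.random.normal((64, 10)))).batch(8)
for step, (x, y) in enumerate(data):
    train_step(x, y, step)
```

With `batch_size = 16` and `accum_steps = 8` this would roughly match the paper's effective batch size of 128, though the learning rate would still have to be chosen accordingly.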
mvoelk commented
I had convergence issues with a small batch size and was forced to use AdamAccumulate. The initial learning rate of 0.1 and the batch size of 128 were already suspicious to me.
The missing metrics are a known issue. They are more or less a hack and do not work with tf.keras, and probably not with multi-GPU either. I did not have the time to fix training on TF 2.
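For what it's worth, a generic tf.keras workaround, separate from the repo's `loss.metrics` hack: any plain function of `(y_true, y_pred)` that returns a per-batch scalar and is passed to `compile(metrics=[...])` shows up in the progress bar and in the `History` object. The metric below is only a hypothetical placeholder; a meaningful one would have to decode `y_true`/`y_pred` according to the `PriorUtil` encoding.

```python
import numpy as np
import tensorflow as tf

def mean_max_confidence(y_true, y_pred):
    # hypothetical placeholder metric: mean of the highest predicted value
    # per sample; a real detector metric would decode the prior-box encoding
    return tf.reduce_mean(tf.reduce_max(y_pred, axis=-1))

# tiny standalone demo showing that the metric is reported during fit()
model = tf.keras.Sequential([tf.keras.layers.Dense(3, input_shape=(4,))])
model.compile(optimizer='adam', loss='mse', metrics=[mean_max_confidence])
model.fit(np.random.rand(32, 4).astype('float32'),
          np.random.rand(32, 3).astype('float32'),
          epochs=1, verbose=1)
```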