ShiqiYu/libfacedetection.train

Low GPU utilization

sunny-sjj opened this issue · 7 comments

Why is GPU utilization so low when I train? My CUDA environment is set up correctly.

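I/O is currently fairly inefficient, and we are working on that. The model is also quite small. Together these two factors lead to low utilization.
Setting OMP_NUM_THREADS=2 when training can improve training speed, i.e. OMP_NUM_THREADS=2 python train.py. Values of 2 or more also work.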
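The same setting can also be applied from inside a script, as long as it happens before torch (or any other OpenMP-backed library) is imported, since the OpenMP runtime typically reads the variable when it initializes. A minimal sketch; the helper name configure_omp_threads is hypothetical and not part of the repository:

```python
import os


def configure_omp_threads(default: int = 2) -> int:
    """Set OMP_NUM_THREADS unless the user already set it; return the value in effect."""
    # setdefault keeps a value exported in the shell, so the
    # command-line form `OMP_NUM_THREADS=4 python train.py` still wins.
    value = os.environ.setdefault("OMP_NUM_THREADS", str(default))
    return int(value)


threads = configure_omp_threads()
# import torch  # heavy numeric libraries should be imported only after this point
print(f"OMP_NUM_THREADS={threads}")
```

Using setdefault rather than a plain assignment is a design choice made here so that the script never overrides a value chosen on the command line.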

OK, got it, thanks.

I have one more question. In the multibox_loss.py file you provide, when eiou_loss is computed, is loc_t never assigned the target values? Or am I misunderstanding something?

loss_bbox_eiou = eiou_loss(loc_p[:, 0:4], loc_t[:, 0:4], variance=self.variance, smooth_point=self.smooth_point, reduction='sum')

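In this line, loc_p is the network's prediction and loc_t is the ground-truth label. We use EIoU to measure the distance between the predicted boxes and the ground-truth boxes.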

Yes, right. My understanding is that loc_t should equal truths here, but I don't see any such assignment in the code:

truths = targets[idx][:, 0:14].data

In the code, loc_t is only assigned here:

loc_t = torch.Tensor(num, num_priors, 14)

Am I misunderstanding something? Thanks for your reply.

# match priors (default boxes) and ground truth boxes
loc_t = torch.Tensor(num, num_priors, 14)
conf_t = torch.LongTensor(num, num_priors)
iou_t = torch.Tensor(num, num_priors)
for idx in range(num):
    truths = targets[idx][:, 0:14].data
    labels = targets[idx][:, -1].data
    defaults = priors.data
    iou_t[idx] = match(self.threshold, truths, defaults, self.variance, labels, loc_t, conf_t, idx)
iou_t = iou_t.view(num, num_priors, 1)

The match function called on line 74 matches anchors to ground-truth (gt) boxes, and the matched gt boxes are written into loc_t (line 148):

def match(threshold, truths, priors, variances, labels, loc_t, conf_t, idx):
    """Match each prior box with the ground truth box of the highest jaccard
    overlap, encode the bounding boxes, then write the matched targets for
    both confidence and location preds into conf_t and loc_t.
    Args:
        threshold: (float) The overlap threshold used when matching boxes.
        truths: (tensor) Ground truth boxes, Shape: [num_obj, 14].
        priors: (tensor) Prior boxes from priorbox layers, Shape: [n_priors,4].
        variances: (tensor) Variances corresponding to each prior coord,
            Shape: [num_priors, 4].
        labels: (tensor) All the class labels for the image, Shape: [num_obj].
        loc_t: (tensor) Tensor to be filled w/ encoded location targets.
        conf_t: (tensor) Tensor to be filled w/ matched indices for conf preds.
        idx: (int) current batch index
    Return:
        The best ground truth overlap for each prior. The location and
        confidence targets are written in place into loc_t[idx] and conf_t[idx].
    """
    # jaccard index
    overlaps = jaccard(
        truths,
        point_form(priors)
    )
    # (Bipartite Matching)
    # [1,num_objects] best prior for each ground truth
    best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
    # ignore hard gt
    valid_gt_idx = best_prior_overlap[:, 0] >= 0.2
    best_prior_idx_filter = best_prior_idx[valid_gt_idx, :]
    if best_prior_idx_filter.shape[0] <= 0:
        loc_t[idx] = 0
        conf_t[idx] = 0
        return torch.zeros((1, priors.shape[0]))
    # [1,num_priors] best ground truth for each prior
    best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
    best_truth_idx.squeeze_(0)
    best_truth_overlap.squeeze_(0)
    best_prior_idx.squeeze_(1)
    best_prior_idx_filter.squeeze_(1)
    best_prior_overlap.squeeze_(1)
    best_truth_overlap.index_fill_(0, best_prior_idx_filter, 2)  # ensure best prior
    # TODO refactor: index best_prior_idx with long tensor
    # ensure every gt matches with its prior of max overlap
    for j in range(best_prior_idx.size(0)):
        best_truth_idx[best_prior_idx[j]] = j
    matches = truths[best_truth_idx]  # Shape: [num_priors,14]
    conf = labels[best_truth_idx]  # Shape: [num_priors]
    conf[best_truth_overlap < threshold] = 0  # label as background
    loc = encode(matches, priors, variances)
    loc_t[idx] = loc  # [num_priors,14] encoded offsets to learn
    conf_t[idx] = conf  # [num_priors] top class label for each prior
    return best_truth_overlap
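
The point of the exchange above is that match returns only the overlaps, while the targets reach the caller through the in-place writes to loc_t[idx] and conf_t[idx]. A minimal sketch of that pattern follows, with plain Python lists standing in for tensors so that it runs without PyTorch. The names iou and match_sketch, the 4-value boxes, the 0.35 threshold, and the omission of encode and of the forced best-prior step are all simplifications assumed here, not the repository's implementation:

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_sketch(threshold, truths, priors, labels, loc_t, conf_t, idx):
    """Give each prior its best-overlapping truth, write the targets into
    loc_t[idx] and conf_t[idx], and return only the overlaps."""
    loc, conf, best = [], [], []
    for p in priors:
        overlaps = [iou(t, p) for t in truths]
        j = max(range(len(truths)), key=overlaps.__getitem__)
        loc.append(list(truths[j]))  # the real code stores encode(...) offsets instead
        conf.append(labels[j] if overlaps[j] >= threshold else 0)  # 0 = background
        best.append(overlaps[j])
    loc_t[idx] = loc    # in-place write, visible to the caller
    conf_t[idx] = conf  # in-place write, visible to the caller
    return best


# The caller allocates the buffers first, as the loop in multibox_loss.py does.
priors = [(0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0)]
targets = [([(0.0, 0.0, 1.0, 1.0)], [1])]  # hypothetical batch: one image, one face
num = len(targets)
loc_t = [None] * num
conf_t = [None] * num
for idx, (truths, labels) in enumerate(targets):
    overlaps = match_sketch(0.35, truths, priors, labels, loc_t, conf_t, idx)

print(conf_t)  # the first prior is matched, the second is background
```

After the loop, loc_t and conf_t hold the targets even though the caller never assigned them directly, which is why no loc_t = truths style statement appears in multibox_loss.py.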

Ah, so that's where it was hiding. Thank you very much for your reply.