PacktPublishing/Modern-Computer-Vision-with-PyTorch

Chapter - 7 RCNN

yesmkaran opened this issue · 1 comment

In the following code, I don't understand why the candidates are recomputed using width and height, and why delta and rois are divided by the image width and height -

FPATHS, GTBBS, CLSS, DELTAS, ROIS, IOUS = [], [], [], [], [], []
N = 500
for ix, (im, bbs, labels, fpath) in enumerate(ds):
  if ix == N:
    break

  H, W, _ = im.shape
  candidates = extract_candidates(im)
  candidates = np.array([(x,y,x+w,y+h) for x,y,w,h in candidates])  # <-- this line
  ious, rois, clss, deltas = [], [], [], []
  ious = np.array([[extract_iou(candidate, _bb_) for candidate in candidates] for _bb_ in bbs]).T

  for jx, candidate in enumerate(candidates):
    cx,cy,cX,cY = candidate
    candidate_ious = ious[jx]
    best_iou_at = np.argmax(candidate_ious)
    best_iou = candidate_ious[best_iou_at]
    best_bb = _x,_y,_X,_Y = bbs[best_iou_at]
    if best_iou > 0.3:
      clss.append(labels[best_iou_at])
    else:
      clss.append('background')
    delta = np.array([_x-cx, _y-cy, _X-cX, _Y-cY]) / np.array([W,H,W,H])  # <-- this line
    deltas.append(delta)
    rois.append(candidate / np.array([W,H,W,H]))  # <-- this line

candidates = np.array([(x,y,x+w,y+h) for x,y,w,h in candidates])
This line is important because we are converting each candidate from (x, y, w, h) format into corner format (x1, y1, x2, y2), which makes cropping the region from the image straightforward.
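To make the conversion concrete, here is a minimal sketch with made-up candidate values (the real ones come from `extract_candidates`). With corner coordinates, the crop is simply `im[y1:y2, x1:x2]`:

```python
import numpy as np

# Hypothetical candidates in (x, y, w, h) format, as a region-proposal
# routine like extract_candidates() would return them
candidates_xywh = np.array([[10, 20, 30, 40],
                            [ 0,  0, 50, 60]])

# Convert to corner format (x1, y1, x2, y2): the bottom-right corner is
# top-left plus width/height
candidates_xyxy = np.array([(x, y, x + w, y + h)
                            for x, y, w, h in candidates_xywh])

print(candidates_xyxy)
# [[10 20 40 60]
#  [ 0  0 50 60]]
```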

delta = np.array([_x-cx, _y-cy, _X-cX, _Y-cY]) / np.array([W,H,W,H])
All our regression-based predictions need to lie in a small range (close to [0, 1]) so that the sigmoid activation works efficiently; dividing the pixel offsets by the image width and height keeps the targets at that scale regardless of image size.
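A small worked example of that normalization, with a hypothetical proposal and ground-truth box (note that individual offsets can still be negative when the proposal overshoots the ground truth):

```python
import numpy as np

H, W = 200, 400                           # hypothetical image height and width
candidate = np.array([ 40,  60, 140, 160])  # proposal (x1, y1, x2, y2)
best_bb   = np.array([ 50,  50, 150, 150])  # ground-truth box (x1, y1, x2, y2)

cx, cy, cX, cY = candidate
_x, _y, _X, _Y = best_bb

# Corner offsets between ground truth and proposal, divided by image
# width/height so the regression targets are fractions of the image size
delta = np.array([_x - cx, _y - cy, _X - cX, _Y - cY]) / np.array([W, H, W, H])
print(delta)  # [ 0.025 -0.05   0.025 -0.05 ]
```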

rois.append(candidate / np.array([W,H,W,H]))
The rois are used elsewhere in the code, where the values need to be fractions of the image width and height rather than absolute pixel coordinates.
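One reason fractional ROIs are convenient: they can be mapped onto an input of any size later. Sketching this with made-up numbers (the 224x224 target size is an assumption, typical for a CNN backbone input):

```python
import numpy as np

H, W = 300, 500                            # hypothetical original image size
candidate = np.array([50, 30, 250, 270])   # (x1, y1, x2, y2) in pixels

# Store the ROI as fractions of the image dimensions
roi = candidate / np.array([W, H, W, H])
print(roi)  # [0.1 0.1 0.5 0.9]

# Later, the fractional ROI can be projected onto a resized input,
# e.g. a 224x224 image fed to the network (astype(int) truncates)
side = 224
x1, y1, x2, y2 = (roi * np.array([side, side, side, side])).astype(int)
print(x1, y1, x2, y2)  # 22 22 112 201
```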

Hope these answer your doubts.