Chapter - 7 RCNN
yesmkaran opened this issue · 1 comments
In the following code, I don't really understand why the candidates are converted to (x, y, x+w, y+h), and why delta and rois are divided by the image width and height:
FPATHS, GTBBS, CLSS, DELTAS, ROIS, IOUS = [], [], [], [], [], []
N = 500
for ix, (im, bbs, labels, fpath) in enumerate(ds):
    if(ix==N):
        break
    H, W, _ = im.shape
    candidates = extract_candidates(im)
    candidates = np.array([(x,y,x+w,y+h) for x,y,w,h in candidates])  # This line of code
    ious, rois, clss, deltas = [], [], [], []
    ious = np.array([[extract_iou(candidate, _bb_) for candidate in candidates] for _bb_ in bbs]).T
    for jx, candidate in enumerate(candidates):
        cx,cy,cX,cY = candidate
        candidate_ious = ious[jx]
        best_iou_at = np.argmax(candidate_ious)
        best_iou = candidate_ious[best_iou_at]
        best_bb = _x,_y,_X,_Y = bbs[best_iou_at]
        if best_iou > 0.3:
            clss.append(labels[best_iou_at])
        else:
            clss.append('background')
        delta = np.array([_x-cx, _y-cy, _X-cX, _Y-cY]) / np.array([W,H,W,H])  # This line of code
        deltas.append(delta)
        rois.append(candidate / np.array([W,H,W,H]))  # This line of code
candidates = np.array([(x,y,x+w,y+h) for x,y,w,h in candidates])  # This line of code
is important because it converts each box from (x, y, w, h) to corner format (x1, y1, x2, y2), which makes cropping easier.
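
To illustrate why the corner format is convenient, here is a minimal sketch. The `xywh_to_xyxy` helper, the blank image and the box values are all made up for this example and are not part of the book's code.

```python
import numpy as np

def xywh_to_xyxy(boxes):
    """Convert (x, y, w, h) boxes to (x1, y1, x2, y2) corner format."""
    return np.array([(x, y, x + w, y + h) for x, y, w, h in boxes])

# Hypothetical image of height 100 and width 200, plus two made-up boxes.
im = np.zeros((100, 200, 3), dtype=np.uint8)
candidates = xywh_to_xyxy([(10, 20, 50, 30), (0, 0, 200, 100)])

# With corners, a crop is a plain slice: rows index y, columns index x.
x1, y1, x2, y2 = candidates[0]
crop = im[y1:y2, x1:x2]
```

The same corner format also makes overlap computations straightforward, since the intersection of two boxes is just the max of the top-left corners and the min of the bottom-right corners.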
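delta = np.array([_x-cx, _y-cy, _X-cX, _Y-cY]) / np.array([W,H,W,H])  # This line of code
divides the offsets by the image width and height because all our regression targets need to lie in a fixed, bounded range for a bounded output activation to work well. Since the offsets can be negative, the range here is [-1, 1] rather than [0, 1], which suits a tanh activation better than a sigmoid.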
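
A small worked example of this normalization, as a sketch only. The image size and both boxes are invented values chosen to make the arithmetic easy to follow.

```python
import numpy as np

H, W = 100, 200                          # hypothetical image height and width
candidate = np.array([10, 20, 60, 50])   # made-up proposal (x1, y1, x2, y2)
best_bb = np.array([30, 10, 90, 70])     # made-up ground-truth box

# Pixel offsets from the proposal corners to the ground-truth corners.
pixel_delta = best_bb - candidate        # some offsets are negative

# x offsets are divided by W and y offsets by H, so every value is a
# fraction of the image size and cannot exceed 1 in magnitude.
scale = np.array([W, H, W, H])
delta = pixel_delta / scale

# At prediction time the same scale maps a delta back to pixels.
recovered = candidate + delta * scale
```

Because both boxes lie inside the image, each pixel offset is at most W (or H) in magnitude, which is what keeps the normalized delta bounded regardless of image size.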
rois.append(candidate / np.array([W,H,W,H]))  # This line of code
is needed because the rois are used elsewhere in the code, which requires the values to be fractions of the image width and height.
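
One likely benefit of fractional rois is that they stay valid when the image is resized. The sketch below shows that idea; `roi_to_pixels` is a hypothetical helper and the sizes are made up, so this is not necessarily how the book's code consumes the rois.

```python
import numpy as np

H, W = 100, 200                          # hypothetical original image size
candidate = np.array([10, 20, 60, 50])   # made-up proposal in pixels

# Same normalization as in the loop: x by width, y by height.
roi = candidate / np.array([W, H, W, H])

def roi_to_pixels(roi, width, height):
    """Hypothetical helper: map a fractional box to integer pixel coordinates."""
    scaled = roi * np.array([width, height, width, height])
    return np.round(scaled).astype(int)

same_size = roi_to_pixels(roi, W, H)     # back to the original pixel box
resized = roi_to_pixels(roi, 400, 200)   # same region in a 2x larger image
```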
Hope this answers your doubts.