ubc-vision/COTR

patch partition?

Closed this issue · 5 comments

zbc-l commented

Thank you for such an excellent job. I have some questions about COTR. During training, do you divide the scene images into 256*256 patches according to certain rules after scaling, and then feed them into the network? (I'm not sure where this step is implemented in the code.) How is corrs partitioned? Can a corresponding point end up in the next patch, and how is that handled? Is the validation process also handled similarly to training after this splitting?

  1. Cropping and scaling is done inside the dataloader (a sketch of the crop-and-filter step follows this list):
    seed_corr = self.get_seed_corr(nn_cap, query_cap)
    if seed_corr is None:
        # no seed correspondence found: resample another image pair
        return self.__getitem__(random.randint(0, self.__len__() - 1))
    # crop cap
    s = np.random.choice(self.zooms)
    nn_zoom_cap = self.get_zoomed_cap(nn_cap, seed_corr[:2], s, 0)
    query_zoom_cap = self.get_zoomed_cap(query_cap, seed_corr[2:], s, self.zoom_jitter)
    assert nn_zoom_cap.shape == query_zoom_cap.shape == (constants.MAX_SIZE, constants.MAX_SIZE)
    corrs = self.get_corrs(query_zoom_cap, nn_zoom_cap)
    if corrs is None or corrs.shape[0] < self.num_kp:
        # too few correspondences survive inside the crops: resample another image pair
        return self.__getitem__(random.randint(0, self.__len__() - 1))
    shuffle = np.random.permutation(corrs.shape[0])
    corrs = np.take(corrs, shuffle, axis=0)
    corrs = self._trim_corrs(corrs)
  2. Validation data is prepared in the same way as training data; both are cropped and scaled.
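Regarding the question of whether a corresponding point can end up in the next patch: both crops are centered on a seed correspondence, and any correspondence that does not land inside both crops is dropped; if too few survive, the sample is resampled, as in the snippet above. Below is a minimal, self-contained sketch of that crop-and-filter idea. The helper names (crop_around, filter_corrs), the MAX_SIZE constant, and the (N, 4) correspondence layout [x_query, y_query, x_nn, y_nn] are illustrative assumptions, not the exact COTR implementation.

    import numpy as np

    MAX_SIZE = 256  # assumed patch size, mirroring constants.MAX_SIZE

    def crop_around(img, center_xy, size=MAX_SIZE):
        # Crop a size x size window roughly centered on center_xy, clamped to the image bounds.
        h, w = img.shape[:2]
        x0 = int(np.clip(round(center_xy[0] - size / 2), 0, max(w - size, 0)))
        y0 = int(np.clip(round(center_xy[1] - size / 2), 0, max(h - size, 0)))
        return img[y0:y0 + size, x0:x0 + size], (x0, y0)

    def filter_corrs(corrs, offset_query, offset_nn, size=MAX_SIZE):
        # corrs: (N, 4) array of [x_query, y_query, x_nn, y_nn] in full-image pixels.
        # Shift into crop coordinates and keep only rows that fall inside both crops.
        shift = np.array([offset_query[0], offset_query[1], offset_nn[0], offset_nn[1]], dtype=np.float64)
        shifted = corrs - shift
        inside = np.all((shifted >= 0) & (shifted < size), axis=1)
        return shifted[inside]

    # Usage sketch: crop both images around one seed correspondence [xq, yq, xn, yn],
    # then keep only correspondences that survive in both crops; if too few remain,
    # the dataloader simply resamples another pair (as in the snippet above).
    # query_crop, off_q = crop_around(query_img, seed[:2])
    # nn_crop, off_nn = crop_around(nn_img, seed[2:])
    # kept = filter_corrs(all_corrs, off_q, off_nn)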
zbc-l commented

Thank you for your patient reply. I noticed that during training, queries and targets are pairs of query points from images a and b, concatenated in reverse order, which seems to mean that at prediction time all queries from both a and b have to be fed in together.
But I noticed that in the demo you can enter queries_a for one of the images and get queries_b for image b.
I am confused about the role of the variables corr_a, con_a, loc_from, and loc_to.

corr_a, con_a, resample_a, corr_b, con_b, resample_b = cotr_flow(self.model,
                                                                 img_a_sq,
                                                                 img_b_sq)

loc_to = (corr_a[tuple(np.floor(pos).astype('int'))].copy() * 0.5 + 0.5) * img_b.shape[:2][::-1]

It looks like loc_from is the coordinates of the query point on image a and loc_to is the coordinates of the query point on image b, but isn't the model's prediction done in the infer_batch function?
out = self.infer_batch(img_batch, query_batch)
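A note on the pasted lines (a reading of the arithmetic, not an authoritative description of cotr_flow): corr_a appears to be a dense correspondence map over image a, where corr_a[row, col] holds the predicted matching location on image b in normalized coordinates; * 0.5 + 0.5 would then map values from [-1, 1] to [0, 1], and multiplying by img_b.shape[:2][::-1], i.e. (W, H), converts them to pixels on image b. Under those assumptions, the lookup can be written as:

import numpy as np

def lookup_corr(corr_map, pos_rc, img_b_shape):
    # corr_map: (H_a, W_a, 2) dense correspondence map for image a, values assumed in [-1, 1].
    # pos_rc: (row, col) pixel position of the query on image a.
    # Returns the predicted (x, y) pixel location on image b.
    r, c = np.floor(pos_rc).astype(int)
    pred = corr_map[r, c] * 0.5 + 0.5              # [-1, 1] -> [0, 1]
    return pred * np.array(img_b_shape[:2][::-1])  # [0, 1] -> (x, y) pixels on image b

On that reading, loc_from would just be the query position itself in image a's pixel frame, and only loc_to comes out of the correspondence map; the per-point network prediction itself still happens inside infer_batch, as answered below.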

zbc-l commented

Here it seems that pred is overwritten on each iteration of the loop, so the final pred does not store all of the values predicted across the loop.

for batch_idx, data_pack in tqdm.tqdm(
        enumerate(self.val_loader), total=len(self.val_loader),
        desc='Valid iteration=%d' % self.iteration, ncols=80,
        leave=False):
    loss_data, pred = self.validate_batch(data_pack)
    val_loss_list.append(loss_data)
mean_loss = np.array(val_loss_list).mean()
validation_data = {'val_loss': mean_loss,
                   'pred': pred,
                   }

  1. COTR can take query points on both image A and image B. If the X coordinate of a query point lies in [0, 0.5], the point is on the left (A) image; if it lies in [0.5, 1], the point is on the right (B) image (see the sketch after this list). In addition, the architecture treats each individual point independently.
  2. Although the dataloader feeds in query points from both images, COTR still treats them as independent points.
  3. The inference engine is needed for the recursive zoom-in, but the base model inference happens in infer_batch.
  4. Yes, we only keep the final pred during validation.
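To make item 1 concrete, here is a small sketch of that coordinate convention, assuming the two images are resized to the same shape and placed side by side, with all coordinates normalized to [0, 1] over the combined canvas. The helper names and the commented infer_batch call are illustrative, not the exact demo code.

    import numpy as np

    def point_a_to_query(pt_xy, img_shape):
        # Pixel (x, y) on image A -> normalized query on the side-by-side canvas.
        # The canvas is twice as wide as a single image, so A occupies x in [0, 0.5].
        h, w = img_shape[:2]
        return np.array([pt_xy[0] / (2.0 * w), pt_xy[1] / h], dtype=np.float32)

    def pred_to_point_b(pred_xy, img_shape):
        # Normalized prediction on the canvas -> pixel (x, y) on image B.
        # B occupies x in [0.5, 1], so remove the 0.5 offset before rescaling.
        h, w = img_shape[:2]
        return np.array([(pred_xy[0] - 0.5) * 2.0 * w, pred_xy[1] * h], dtype=np.float32)

    # Usage sketch: a query at pixel (100, 40) on image A is normalized into the left
    # half of the canvas, the model maps it to a point in the right half, and that
    # point is converted back into image B's pixel frame.
    # q = point_a_to_query((100, 40), (256, 256))        # x lands in [0, 0.5]
    # out = self.infer_batch(img_batch, q[None, None])   # placeholder call; exact shapes depend on the engine
    # pt_b = pred_to_point_b(out[0, 0], (256, 256))      # x of the prediction is expected in [0.5, 1]

Because each point is treated independently, the reverse also works: a query whose x falls in [0.5, 1] asks for the match of a point on B, and the prediction comes back on the A side.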
zbc-l commented

Your reply is very detailed. Thank you for your patience.