davidnvq/grit

L in extract_features.py

messithanh2k opened this issue · 2 comments

Dear authors,
I extracted features and saved them into an HDF5 file, but the number of img_ids does not equal the number of reg_feat and gri_feat entries.
Also, in extract_features.py, why does batch_size in the DataLoader equal BATCH_SIZE - 1, and why is a random tensor appended to the images of each batch from the dataloader?
Thank you, authors.

I have figured it out.

It is nice that you resolved it yourself. For more detail, let me clarify as follows.

  1. L is the total number of images in the (COCO) dataset.
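As a side note on how L relates to the count mismatch mentioned above, here is a minimal h5py sketch of pre-allocating the HDF5 datasets with first dimension L so that image_ids, reg_feat, and gri_feat stay aligned one row per image. The file name, dataset names, and feature dimensions below are illustrative assumptions, not the exact ones used in extract_features.py:

```python
import h5py

# Hypothetical sketch: pre-allocate every dataset with first dimension L so that
# row i of each dataset corresponds to the same image. Names/dims are assumptions.
L = 1000                          # total number of images (illustrative value)
num_regions, num_grids, dim = 150, 49, 1024

with h5py.File("coco_features.hdf5", "w") as f:
    f.create_dataset("image_ids", shape=(L,), dtype="int64")
    f.create_dataset("reg_feat", shape=(L, num_regions, dim), dtype="float32")
    f.create_dataset("gri_feat", shape=(L, num_grids, dim), dtype="float32")
```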

Why does batch_size in the DataLoader equal BATCH_SIZE - 1, and why is a random tensor appended to the images of each batch from the dataloader?

  2. Adding a random tensor of a FIXED shape HxW is a small trick to ensure that all the images in the dataset are padded to the same HxW size. This makes it possible to save tensors of the same shape into HDF5.
  • Images are padded to the maximum shape HxW in

    samples = nested_tensor_from_tensor_list(samples)

  • More about how the input images are padded to the maximum size can be found in

    max_size = _max_by_axis([list(img.shape) for img in tensor_list])

  • Deformable Attention prefers an input batch of size 2^N (e.g., 64). Therefore, to ensure that the total number of input images (including the random tensor) equals BATCH_SIZE = 64, the number of real images fetched from the dataloader is batch_size = BATCH_SIZE - 1, where the remaining 1 is the random tensor ;) A minimal sketch of this trick is shown after this list.
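Here is a small, self-contained sketch of the idea. The values of BATCH_SIZE, H, W, and the toy image sizes are illustrative assumptions; the padding loop only mimics the effect of _max_by_axis and nested_tensor_from_tensor_list rather than reproducing the repo's code:

```python
import torch

# Illustrative values only; the real extract_features.py defines its own.
BATCH_SIZE = 4            # Deformable Attention prefers 2^N (64 in practice)
H, W = 640, 640           # fixed maximum spatial size enforced by the dummy tensor

# Pretend these came from a DataLoader built with batch_size=BATCH_SIZE - 1.
images = [torch.rand(3, h, w) for h, w in [(480, 512), (600, 400), (384, 640)]]
images.append(torch.rand(3, H, W))   # the appended random tensor of fixed shape

# Mimic _max_by_axis + nested_tensor_from_tensor_list: pad every image to the
# per-axis maximum, which the dummy tensor pins to 3 x H x W.
max_shape = [max(img.shape[d] for img in images) for d in range(3)]
batch = torch.zeros(len(images), *max_shape)
for i, img in enumerate(images):
    c, h, w = img.shape
    batch[i, :c, :h, :w] = img

print(batch.shape)  # torch.Size([4, 3, 640, 640]) -> constant shape across batches
# The feature of the last (dummy) image would be discarded before writing to HDF5.
```

Because the dummy tensor pins the padded shape to 3xHxW, every batch has the same dimensions, which is what makes the HDF5 layout consistent.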