pzzhang/VinVL

Features Dimensionality

eugeniotonanzi opened this issue · 2 comments

Hi,

I downloaded the pre-trained COCO features using the command:

<path/to/azcopy> copy https://biglmdiag.blob.core.windows.net/vinvl/image_features/coco_X152C4_frcnnbig2_exp168model_0060000model.roi_heads.nm_filter_2_model.roi_heads.score_thresh_0.2/model_0060000/ <local_path> --recursive

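Then I decoded the features.tsv file using the following code (printing only the first 10 results):

import base64
import numpy as np

with open('features.tsv', 'r') as data_file:
  for i, line in enumerate(data_file):
    data = line.rstrip('\n').split('\t')
    image_id = int(data[0])
    detections = int(data[1])  # number of detected regions in this image
    # column 2 is a base64-encoded float32 buffer, one row per detection
    features = np.frombuffer(base64.b64decode(data[2]), np.float32).reshape((detections, -1))
    print(features.shape)
    if i == 9:  # stop after the first 10 rows
      break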

What seems unusual to me is that the feature dimensionality obtained this way is 2054. That may be perfectly fine, but every other object detector I've worked with produces features whose dimensionality is a power of 2, usually 2048 or 1024. I also checked the paper and found no mention of the feature dimensionality, so I was wondering whether this is correct or whether I made a mistake when decoding the features.

Thanks!

Hi,

I have the same question. The pre-extracted features have size 2054, but when I use the provided code to extract features myself, the dimension is 2048. Is there a reason for the difference?

Thanks

2054 = 2048 + 6: the 2048-dimensional region feature plus six box values (x1, y1, x2, y2, w, h).
See microsoft/Oscar#94.
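
To make the 2048 + 6 breakdown concrete, here is a minimal sketch of how a decoded row could be split into its two parts. It assumes the six box values are stored as the last six columns of each row, which the reply suggests but does not state explicitly; whether the box values are in pixels or normalized is also not stated here. The function name split_region_features and the synthetic row are illustrative only and are not part of the VinVL or Oscar code.

```python
import base64
import numpy as np

FEATURE_DIM = 2048  # region feature size, per the reply above
BOX_DIM = 6         # x1, y1, x2, y2, w, h, per the reply above


def split_region_features(encoded, num_boxes):
    """Decode one base64 feature column and split it into two arrays.

    Assumption: the box values occupy the last BOX_DIM columns of each row.
    """
    arr = np.frombuffer(base64.b64decode(encoded), dtype=np.float32)
    arr = arr.reshape(num_boxes, FEATURE_DIM + BOX_DIM)
    return arr[:, :FEATURE_DIM], arr[:, FEATURE_DIM:]


# Synthetic stand-in for one TSV row with 3 detections (not real VinVL data).
rng = np.random.default_rng(0)
fake = rng.random((3, FEATURE_DIM + BOX_DIM), dtype=np.float32)
encoded = base64.b64encode(fake.tobytes()).decode("ascii")

visual, boxes = split_region_features(encoded, 3)
print(visual.shape, boxes.shape)  # (3, 2048) (3, 6)
```

If the column order in the real files turns out to be different, only the two slices in the return statement need to change.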