allenai/savn

Some scenes are missing resnet50_fc.hdf5 features

AruniRC opened this issue · 5 comments

Hi,

In the thor offline data folder (data/thor_offline_data or data/mixed_offline_data), 60 of the floorplans do not have resnet50_fc.hdf5 files, e.g. "FloorPlan304". However, all 120 have resnet18_featuremap.hdf5 files.

Can you please share the resnet50_fc.hdf5 features for all scenes?
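
For anyone who wants to check which scenes are affected, here is a minimal sketch. It assumes each scene is a subfolder of the data directory holding its own feature files, as the paths above suggest. The function name `scenes_missing` is made up for illustration and is not part of the repository.

```python
import os

def scenes_missing(data_dir, filename="resnet50_fc.hdf5"):
    """Return sorted names of scene subfolders of data_dir that lack `filename`."""
    missing = []
    for scene in sorted(os.listdir(data_dir)):
        scene_dir = os.path.join(data_dir, scene)
        # Only consider directories; skip stray files in the data folder.
        if os.path.isdir(scene_dir) and not os.path.isfile(os.path.join(scene_dir, filename)):
            missing.append(scene)
    return missing

if __name__ == "__main__":
    data_dir = "data/thor_offline_data"  # path taken from the report above
    if os.path.isdir(data_dir):
        print(scenes_missing(data_dir))
```

Passing `"resnet18_featuremap.hdf5"` as the second argument gives the same check for the ResNet18 feature maps.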

Hi,

Sorry about this. The best way to get these is to take the RGB images from thor_offline_data_with_images, run them through ResNet50, and take the last layer (before ReLU). Since we decided to use the second-to-last layer of ResNet18, we did not extract the last-layer ResNet50 features for all scenes. As I am transitioning roles, I have access to less compute at the moment.

This code may be helpful:

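import h5py
import torch
import torch.nn as nn
from torchvision import models

# NOTE: resnet_input_transform is not defined in this thread. It is assumed to
# return a 3 x 224 x 224 float array that is ready to feed to the network.

def get_resnet50():
    resnet50 = models.resnet50(pretrained=True)
    # Drop the final fc layer so the output is the 2048-d pooled feature.
    modules = list(resnet50.children())[:-1]
    resnet50 = nn.Sequential(*modules)
    for p in resnet50.parameters():
        p.requires_grad = False
    # Switch to eval mode so batch norm uses its stored running statistics.
    return resnet50.eval()

def get_features(data_dir, scenes, method):
    model = get_resnet50()

    if torch.cuda.is_available():
        model = model.cuda()

    for scene in scenes:
        # Read the RGB frames and write one feature vector per frame key.
        images = h5py.File('{}/{}/images.hdf5'.format(data_dir, scene), 'r')
        features = h5py.File('{}/{}/{}.hdf5'.format(data_dir, scene, method), 'w')

        for k in images:
            frame = resnet_input_transform(images[k][:], 224)
            frame = torch.Tensor(frame)
            if torch.cuda.is_available():
                frame = frame.cuda()
            frame = frame.unsqueeze(0)  # add the batch dimension

            with torch.no_grad():
                v = model(frame)
            v = v.view(2048)

            v = v.cpu().numpy()
            features.create_dataset(k, data=v)

        images.close()
        features.close()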
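
The snippet above calls `resnet_input_transform`, which is not shown in this thread. The following is only a guess at what such a helper might do, assuming the stored frames are H x W x 3 uint8 RGB arrays and that the usual ImageNet normalization for torchvision's pretrained models applies. The nearest-neighbour resize is a simplification to keep the sketch dependency-free; the real helper may well use a different interpolation.

```python
import numpy as np

# Standard ImageNet channel statistics used with torchvision's pretrained models.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def resnet_input_transform(image, size):
    """Hypothetical stand-in: H x W x 3 uint8 RGB -> 3 x size x size float32."""
    h, w = image.shape[:2]
    # Nearest-neighbour resize via index arithmetic (assumption, see lead-in).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows][:, cols]
    # Scale to [0, 1], then normalize each channel.
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    # Channels-last -> channels-first, as PyTorch conv layers expect.
    return np.ascontiguousarray(normalized.transpose(2, 0, 1))
```

The result can be passed to `torch.Tensor(...)` and given a batch dimension with `unsqueeze(0)`, which matches how the helper is used above.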

Oh no worries, I extracted the FC-layer features and ran a baseline. It's a little worse than the ResNet18 conv-layer features that you used in your experiments.

thanks a lot for the response though!

Hi,

Just posting the comparative val-set performance of ResNet fc versus conv5 features here, in case it helps future users:

| Model | SPL ≥ 1 | Success ≥ 1 | SPL ≥ 5 | Success ≥ 5 |
|---|---|---|---|---|
| Nonadaptivea3c (conv5 resnet18) | 15.09 | 33.4 | 11.39 | 21.85 |
| Nonadaptivea3c (fc7 resnet50) | 13.05 | 28.9 | 6.22 | 12.72 |

Clearly, using fc7 features gives a significant performance drop, especially in the more difficult cases (≥ 5), where SPL falls from 11.39 to 6.22 and success from 21.85 to 12.72.
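
To put the gap in relative terms, here is a small calculation using only the numbers from the table above. The dictionary layout and the `relative_drop` helper are just for illustration.

```python
# Validation numbers copied from the table above: (conv5 ResNet18, fc ResNet50).
results = {
    "SPL >= 1": (15.09, 13.05),
    "Success >= 1": (33.4, 28.9),
    "SPL >= 5": (11.39, 6.22),
    "Success >= 5": (21.85, 12.72),
}

def relative_drop(baseline, other):
    """Percentage by which `other` falls below `baseline`."""
    return 100.0 * (baseline - other) / baseline

drops = {name: relative_drop(conv, fc) for name, (conv, fc) in results.items()}
for name, drop in drops.items():
    print(f"{name}: {drop:.1f}% lower with fc features")
```

This works out to roughly 13.5% for both ≥ 1 columns, versus about 45% (SPL) and 42% (success) for the ≥ 5 columns.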

Wow, thank you @AruniRC ! This is hugely helpful and thoughtful of you to share this!

Our motivation for using the conv5 features was that it would be helpful for the agent to have access to more spatial information. That is, instead of just knowing that there is a microwave, the agent would know that the microwave is to the left and not to the right.

This is just a hypothesis, but it may be the reason why there is a slight performance drop when using the fc7 features (on the other hand, the receptive field is pretty big by conv5).
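
A toy illustration of this hypothesis, not taken from the repository: global average pooling, which is what produces a single pooled vector from a feature map, gives the same output for a map and its left-right mirror, so the "left versus right" information is gone after pooling. The 1 x 7 x 7 map below is invented for the example, and a real network's features for a mirrored image would not be an exact mirror of the original features.

```python
import numpy as np

# Toy single-channel 7x7 "feature map" with one strong activation on the left.
left = np.zeros((1, 7, 7), dtype=np.float32)
left[0, 3, 1] = 1.0
# Mirror it horizontally so the activation sits on the right.
right = left[:, :, ::-1]

def global_avg_pool(fmap):
    """Average over the spatial axes, giving one value per channel."""
    return fmap.mean(axis=(1, 2))

pooled_same = np.allclose(global_avg_pool(left), global_avg_pool(right))
maps_same = np.array_equal(left, right)
print(pooled_same, maps_same)
```

The pooled vectors match while the maps themselves differ, which is the kind of spatial information a feature map keeps and a pooled vector drops.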

Interesting. I think the spatial relation is definitely a factor. It's a 7x7 grid at the conv5 layer -- pretty coarse but might be good enough for navigation in simulated environments.

A second factor may be that your architecture appends the target object embedding onto the conv5 feature-map at every location. So there is a very strong conditioning of the appearance features on the target embedding, and this could help. It's pretty neat :) !
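
The conditioning described here can be sketched as a tile-and-concatenate step. This is only an illustration of the idea, not the repository's code: the 512 x 7 x 7 shape is what ResNet18 gives at conv5 for a 224 x 224 input, while the embedding size of 64 and the function name `tile_and_concat` are assumptions.

```python
import numpy as np

def tile_and_concat(feature_map, target_embedding):
    """feature_map: (C, H, W); target_embedding: (D,). Returns (C + D, H, W)."""
    _, h, w = feature_map.shape
    d = target_embedding.shape[0]
    # Repeat the same embedding at every spatial location.
    tiled = np.broadcast_to(target_embedding[:, None, None], (d, h, w))
    # Stack appearance channels and target channels along the channel axis.
    return np.concatenate([feature_map, tiled], axis=0)

rng = np.random.default_rng(0)
fmap = rng.standard_normal((512, 7, 7)).astype(np.float32)  # conv5-sized map
target = rng.standard_normal(64).astype(np.float32)         # assumed embedding size
joint = tile_and_concat(fmap, target)
print(joint.shape)
```

Every one of the 7 x 7 locations then carries both the local appearance feature and the target embedding, so any later pointwise layer sees the two together at each position.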