some details on the image-frame ordering
AruniRC opened this issue · 13 comments
Hi,
can you please share the file format and structure for storing image features?
It seems your code reads in pre-computed features from an HDF5 dump for each frame that the agent can see, at this line in your codebase:

Line 18 in 1cda8af

If we want to use other types of features (e.g. from a different feature extractor than the one you use to represent the frame image), it would be super helpful to have more details on how the feature dump is constructed, the ordering of the frame images, etc.

Thank you!
No problem!

What you could do:

The episode class in savn/episodes/basic_episode.py is in charge of reading the HDF5 file. When it is initialized, it is given a handle to this HDF5 file. It is initialized here:

Line 30 in 1cda8af

where images_file_name initially comes from args:

savn/episodes/basic_episode.py
Line 126 in 1cda8af

So if you want to use different features, you should add

```
...
--images_file_name <new-features>
...
```

to your run. You should place these features in thor_offline_data/FloorPlan<n>/<new-features>.

The images are read here, so there should be a feature associated with each str(self.state). The str method is here:
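As a rough illustration of what this keying implies (the exact format of str(self.state) is an assumption here, not taken from the repository — check the linked str method for the real one), each dataset in the HDF5 file is looked up by the string form of an agent state:

```python
# Hypothetical sketch of the state-string keying scheme. The exact format
# produced by str(self.state) is an assumption; see the linked method in
# the savn repository for the real one.

def state_key(x, z, rotation, horizon):
    # e.g. position, rotation, and camera horizon joined with '|'
    return "{:0.2f}|{:0.2f}|{:d}|{:d}".format(x, z, rotation, horizon)

# a plain dict standing in for h5py.File('<new-features>', 'r')
features = {state_key(0.0, 0.25, 90, 30): [0.1, 0.2, 0.3]}

# the episode code would then do something like:
#   self.images_file[str(self.state)][:]
vec = features[state_key(0.0, 0.25, 90, 30)]
```

The point is simply that whatever featurizer you use, the output file must contain one dataset per reachable state, keyed by that state string.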
What we do:
We run the following script:
```python
import os
from multiprocessing import Process, Queue

from datasets.offline_controller_with_small_rotation import ExhaustiveBFSController


def search_and_save(in_queue):
    while not in_queue.empty():
        try:
            scene_name = in_queue.get(timeout=3)
        except Exception:
            return
        c = None
        try:
            out_dir = os.path.join(<path-to-where-you-want-data-to-go>, scene_name)
            if not os.path.exists(out_dir):
                os.mkdir(out_dir)
            print('starting:', scene_name)
            c = ExhaustiveBFSController(
                grid_size=0.25,
                fov=90.0,
                grid_file=os.path.join(out_dir, 'grid.json'),
                graph_file=os.path.join(out_dir, 'graph.json'),
                metadata_file=os.path.join(out_dir, 'metadata.json'),
                images_file=os.path.join(out_dir, 'images.hdf5'),
                depth_file=os.path.join(out_dir, 'depth.hdf5'),
                grid_assumption=False)
            c.start()
            c.search_all_closed(scene_name)
            c.stop()
        except AssertionError as e:
            print('Error is', e)
            print('Error in scene {}'.format(scene_name))
            if c is not None:
                c.stop()
            continue


def main():
    num_processes = 30
    queue = Queue()
    scene_names = []
    for i in range(2):
        for j in range(30):
            if i == 0:
                scene_names.append("FloorPlan" + str(j + 1))
            else:
                scene_names.append("FloorPlan" + str(i + 1) + '%02d' % (j + 1))
    for x in scene_names:
        queue.put(x)

    processes = []
    for i in range(num_processes):
        p = Process(target=search_and_save, args=(queue,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()


if __name__ == '__main__':
    main()
```
Note that AI2-THOR (https://github.com/allenai/ai2thor) has changed since we began this project, so this script will not work out of the box -- it may require some changes. I am sorry about this and am happy to help if there are issues. The scenes in THOR themselves have also changed and are much better now.

This script does BFS in a scene and saves an HDF5 file in the desired format with all of the RGB images, e.g. you will now have FloorPlan<n>/images.hdf5.
You can now simply iterate over these files to get the features that you want. We run a variant of the following script to get the ResNet features:
```python
import h5py
import torch

for scene in scenes:
    images = h5py.File('{}/{}/images.hdf5'.format(data_dir, scene), 'r')
    features = h5py.File('{}/{}/{}.hdf5'.format(data_dir, scene, method), 'w')
    for k in images:
        frame = resnet_input_transform(images[k][:], 224)
        frame = torch.Tensor(frame)
        if torch.cuda.is_available():
            frame = frame.cuda()
        frame = frame.unsqueeze(0)
        v = model(frame)
        v = v.view(512, 7, 7)
        v = v.cpu().numpy()
        features.create_dataset(k, data=v)
    images.close()
    features.close()
```
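After generating a feature file this way, one quick sanity check (a sketch of my own, not something from the repository) is to confirm that the new feature file has exactly the same keys as images.hdf5, since the episode code looks features up by frame key:

```python
def keys_match(image_keys, feature_keys):
    """True iff every frame key has a feature and there are no extras."""
    return set(image_keys) == set(feature_keys)

# with real data these would be list(h5py.File(..., 'r').keys())
ok = keys_match(['0.00|0.25|90|30', '0.00|0.50|90|30'],
                ['0.00|0.50|90|30', '0.00|0.25|90|30'])
missing = keys_match(['0.00|0.25|90|30'], [])
```

If any key is missing, the episode will fail at runtime the first time the agent visits the corresponding state.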
where resnet_input_transform is:

```python
import torchvision.transforms as transforms

def resnet_input_transform(input_image, im_size):
    """Normalize and resize an image for ResNet input."""
    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    all_transforms = transforms.Compose([
        transforms.ToPILImage(),
        ScaleBothSides(im_size),
        transforms.ToTensor(),
        normalize,
    ])
    transformed_image = all_transforms(input_image)
    return transformed_image
```
The script will also store the depth information as depth.hdf5, in addition to the RGB images as images.hdf5, which will be helpful if you want to generate features using RGBD data.

We apologize that the process described above is somewhat tedious; we should have anticipated that people would want to do what you are doing and made this process easier. However, re-scraping the data is not a terrible idea, as we recommend using the newest AI2-THOR in future projects. Almost a year of engineering has gone into improving THOR since we finished this project, and there are a lot of cool new things you can do now (e.g. https://ai2thor.allenai.org/demo/).
Hi @mitchellnw thanks for a detailed response, will try to replicate as much as possible.
How about an easier alternative (if feasible)? If you have the images.hdf5 files corresponding to your current published experiments cached somewhere, then I can simply read the RGB images from there and extract features using a CNN model of my choice. Then the image <--> feature correspondence should be maintained. Does this sound reasonable?
Yes, great idea. I will hopefully be able to locate these today. Would you want the depth information as well?
By the way, I checked out your work on unsupervised domain adaptation and it's really cool!
this is hopefully what you need: floorplans_with_images
thanks a lot for the quick response @mitchellnw !
will do some sanity checks, and let's hope it works :)
Closing this issue, please reopen if this is still an issue!
@mitchellnw -- sure. I had not worked on this much after downloading the imagery.
Trying to come up with a minimal working scenario to quickly verify this works (sorry to bug you again with this...):
My plan is to just train and test on Kitchen scenes, using features extracted from your shared imagery (instead of the pre-extracted features your codebase provides). If this gets similar performance, that means there is a straightforward way to use any custom feature representation of the scene images. This would mean:

- Extracting ResNet features on the FloorPlan<n> images of Kitchen scenes (from the link you shared: floorplans_with_images)
- Placing these features under thor_offline_data/FloorPlan<n>/<new-features>
- Calling savn/episodes/basic_episode.py with --images_file_name <new-features>

Does this sound ok? Also, is there a quick way to map between FloorPlan<n> and scene name (i.e. which FloorPlans correspond to Kitchens, if I just want to train and test on a split of kitchen scenes)?
thank you!
Yes, sounds good, good luck! And, yes, the kitchens are FloorPlans 1-30.
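Given that answer, a kitchen-only split can be built directly from the scene-name convention (FloorPlan1 through FloorPlan30 are kitchens). The particular train/test boundary below is only an example, not the split used in the paper:

```python
# Kitchens are FloorPlan1..FloorPlan30 (per the reply above).
kitchen_scenes = ["FloorPlan" + str(n) for n in range(1, 31)]

# Example split only; the actual SAVN splits may differ.
train_scenes = kitchen_scenes[:20]  # first 20 kitchens for training
test_scenes = kitchen_scenes[20:]   # remaining 10 for testing
```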
Another question -- is there any distinction between FloorPlan"n" and FloorPlan"n"_physics in terms of just the images.hdf5? Some of the FloorPlans have the suffix "_physics" (e.g. FloorPlan1_physics), but others do not.
You shouldn't have to worry about that, the "_physics" scenes come from a slightly newer (better) version of AI2-THOR. When we were doing the project only Kitchens and Living Rooms were completed.
By the way -- I would not necessarily recommend scraping a feature for each image. In hindsight, this may have been detrimental to performance, as then you can't really do data augmentation. I would recommend running your featurizer on the fly.
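The on-the-fly alternative means applying the (possibly augmenting) transform and the feature extractor at episode time instead of reading a cached vector. A minimal sketch of the idea, with a stand-in model and a hypothetical flip augmentation (neither is from the SAVN codebase):

```python
import random

def augment(image):
    # hypothetical augmentation: random horizontal flip of a row-major image
    if random.random() < 0.5:
        return [row[::-1] for row in image]
    return image

def featurize_on_the_fly(raw_image, model):
    # augmentation happens per call -- impossible with cached features,
    # since those are computed once and frozen on disk
    return model(augment(raw_image))

image = [[1, 2, 3], [4, 5, 6]]
# stand-in "model": sum of all pixels (invariant to the flip)
feat = featurize_on_the_fly(image, lambda img: sum(sum(row) for row in img))
```

The trade-off the comment above alludes to: caching saves compute per episode, but fixes the features so no augmentation can be applied during training.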
Thanks for the info.
So the features I am planning to use are pretty heavy (basically running a detector on each frame, which can have a fairly large overhead to keep around in memory at runtime). If data augmentation is not done at any phase, then the relative performance should still be valid, I think... but it's a good point, I'll try to work around the memory footprint eventually.
I would like to ask whether you have encountered this situation: when you run the program, it gets stuck at the following output:

```
Unable to preload the following plugins:
	ScreenSelector.so
Unable to preload the following plugins:
	ScreenSelector.so
Loading player data from /home/ubuntu/.ai2thor/releases/thor-201903131714-Linux64/thor-201903131714-Linux64_Data/data.unity3d
Loading player data from /home/ubuntu/.ai2thor/releases/thor-201903131714-Linux64/thor-201903131714-Linux64_Data/data.unity3d
```