some details on the image-frame ordering
AruniRC opened this issue · 13 comments
Hi,
can you please share the file format and structure for storing image features?
It seems your code reads in pre-computed features from an HDF5 dump for each frame that the agent can see, at this line in your codebase:

Line 18 in 1cda8af

If we want to use other types of features (e.g. from a different feature extractor than the one you use to represent the frame image), it would be super helpful to have more details on how the feature dump is constructed, the ordering of the frame images, etc.

Thank you!
No problem!

What you could do:

The episode class in savn/episodes/basic_episode.py is in charge of reading the HDF5 file. When it is initialized, it is given a handle to this HDF5 file. It is initialized here:

Line 30 in 1cda8af

where images_file_name initially comes from args:

savn/episodes/basic_episode.py
Line 126 in 1cda8af

So if you want to use different features, you should add

```
...
--images_file_name <new-features>
...
```

to your run. You should place these features in thor_offline_data/FloorPlan<n>/<new-features>.

The images are read here, so there should be a feature associated with each str(self.state). The str method is here:
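As a rough illustration of what this keying implies (the exact format of str(self.state) is an assumption here, not taken from the repository — check the linked str method for the real one), each dataset in the HDF5 file is looked up by the string form of an agent state:

```python
# Hypothetical sketch of the state-string keying scheme. The exact format
# produced by str(self.state) is an assumption; see the linked method in
# the savn repository for the real one.

def state_key(x, z, rotation, horizon):
    # e.g. position, rotation, and camera horizon joined with '|'
    return "{:0.2f}|{:0.2f}|{:d}|{:d}".format(x, z, rotation, horizon)

# a plain dict standing in for h5py.File('<new-features>', 'r')
features = {state_key(0.0, 0.25, 90, 30): [0.1, 0.2, 0.3]}

# the episode code would then do something like:
#   self.images_file[str(self.state)][:]
vec = features[state_key(0.0, 0.25, 90, 30)]
```

The point is simply that whatever featurizer you use, the output file must contain one dataset per reachable state, keyed by that state string.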
What we do:
We run the following script:
```python
import os
from multiprocessing import Process, Queue

from datasets.offline_controller_with_small_rotation import ExhaustiveBFSController


def search_and_save(in_queue):
    while not in_queue.empty():
        try:
            scene_name = in_queue.get(timeout=3)
        except Exception:
            return
        c = None
        try:
            out_dir = os.path.join(<path-to-where-you-want-data-to-go>, scene_name)
            if not os.path.exists(out_dir):
                os.mkdir(out_dir)
            print('starting:', scene_name)
            c = ExhaustiveBFSController(
                grid_size=0.25,
                fov=90.0,
                grid_file=os.path.join(out_dir, 'grid.json'),
                graph_file=os.path.join(out_dir, 'graph.json'),
                metadata_file=os.path.join(out_dir, 'metadata.json'),
                images_file=os.path.join(out_dir, 'images.hdf5'),
                depth_file=os.path.join(out_dir, 'depth.hdf5'),
                grid_assumption=False)
            c.start()
            c.search_all_closed(scene_name)
            c.stop()
        except AssertionError as e:
            print('Error is', e)
            print('Error in scene {}'.format(scene_name))
            if c is not None:
                c.stop()
            continue


def main():
    num_processes = 30
    queue = Queue()
    scene_names = []
    for i in range(2):
        for j in range(30):
            if i == 0:
                scene_names.append("FloorPlan" + str(j + 1))
            else:
                scene_names.append("FloorPlan" + str(i + 1) + '%02d' % (j + 1))
    for x in scene_names:
        queue.put(x)

    processes = []
    for i in range(num_processes):
        p = Process(target=search_and_save, args=(queue,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()


if __name__ == '__main__':
    main()
```
Note that AI2-THOR (https://github.com/allenai/ai2thor) has changed since we began this project, so this script will not work out of the box -- it may require some changes. I am sorry about this and am happy to help if there are issues. The scenes in THOR themselves have also changed and are much better now.

This script does BFS in a scene and saves an HDF5 file in the desired format with all of the RGB images, e.g. you will now have FloorPlan<n>/images.hdf5.
You can now simply iterate over these files to get the features that you want. We run a variant of the following script to get the ResNet features:
```python
import h5py
import torch

for scene in scenes:
    images = h5py.File('{}/{}/images.hdf5'.format(data_dir, scene), 'r')
    features = h5py.File('{}/{}/{}.hdf5'.format(data_dir, scene, method), 'w')
    for k in images:
        frame = resnet_input_transform(images[k][:], 224)
        frame = torch.Tensor(frame)
        if torch.cuda.is_available():
            frame = frame.cuda()
        frame = frame.unsqueeze(0)
        v = model(frame)
        v = v.view(512, 7, 7)
        v = v.cpu().numpy()
        features.create_dataset(k, data=v)
    images.close()
    features.close()
```
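After generating a feature file this way, one quick sanity check (a sketch of my own, not something from the repository) is to confirm that the new feature file has exactly the same keys as images.hdf5, since the episode code looks features up by frame key:

```python
def keys_match(image_keys, feature_keys):
    """True iff every frame key has a feature and there are no extras."""
    return set(image_keys) == set(feature_keys)

# with real data these would be list(h5py.File(..., 'r').keys())
ok = keys_match(['0.00|0.25|90|30', '0.00|0.50|90|30'],
                ['0.00|0.50|90|30', '0.00|0.25|90|30'])
missing = keys_match(['0.00|0.25|90|30'], [])
```

If any key is missing, the episode will fail at runtime the first time the agent visits the corresponding state.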
where resnet_input_transform is:

```python
import torchvision.transforms as transforms

def resnet_input_transform(input_image, im_size):
    """Normalize and resize an image for ResNet input."""
    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    all_transforms = transforms.Compose([
        transforms.ToPILImage(),
        ScaleBothSides(im_size),
        transforms.ToTensor(),
        normalize,
    ])
    transformed_image = all_transforms(input_image)
    return transformed_image
```
The script will also store the depth information as depth.hdf5, in addition to the RGB images as images.hdf5, which will be helpful if you want to generate features using RGBD data.

We apologize that the process described above is somewhat tedious; we should have anticipated that people would want to do what you are doing and made this process easier. However, re-scraping the data is not a terrible idea, as we recommend using the newest AI2-THOR in future projects. Almost a year of engineering has gone into improving THOR since we finished this project, and there are a lot of cool new things you can do now (e.g. https://ai2thor.allenai.org/demo/).
Hi @mitchellnw thanks for a detailed response, will try to replicate as much as possible.
How about an easier alternative (if feasible)? If you have the images.hdf5 files corresponding to your current published experiments cached somewhere, then I can simply read the RGB images from there and extract features using a CNN model of my choice. Then the image <--> feature correspondence should be maintained. Does this sound reasonable?
Yes, great idea. I will hopefully be able to locate these today. Would you want the depth information as well?
By the way, I checked out your work on unsupervised domain adaptation and it's really cool!
this is hopefully what you need: floorplans_with_images
thanks a lot for the quick response @mitchellnw !
will do some sanity checks, and let's hope it works :)
Closing this issue, please reopen if this is still an issue!
@mitchellnw -- sure. I had not worked on this much after downloading the imagery.
Trying to come up with a minimal working scenario to quickly verify this works (sorry to bug you again with this...):
My plan is to just train and test on Kitchen scenes, using features extracted from your shared imagery (instead of the pre-extracted features your codebase provides). If this gets similar performance, that means there is a straightforward way to use any custom feature representation of the scene images. This would mean:

- Extracting ResNet features on the FloorPlan<n> images of Kitchen scenes (from the link you shared: floorplans_with_images)
- Placing these features under thor_offline_data/FloorPlan<n>/<new-features>
- Calling savn/episodes/basic_episode.py with --images_file_name <new-features>

Does this sound ok? Also, is there a quick way to map between FloorPlan<n> and scene name (i.e. which FloorPlans correspond to Kitchens, if I just want to train and test on a split of kitchen scenes)?
thank you!
Yes, sounds good, good luck! And, yes, the kitchens are FloorPlans 1-30.
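Given that answer, a kitchen-only split can be built directly from the scene-name convention (FloorPlan1 through FloorPlan30 are kitchens). The particular train/test boundary below is only an example, not the split used in the paper:

```python
# Kitchens are FloorPlan1..FloorPlan30 (per the reply above).
kitchen_scenes = ["FloorPlan" + str(n) for n in range(1, 31)]

# Example split only; the actual SAVN splits may differ.
train_scenes = kitchen_scenes[:20]  # first 20 kitchens for training
test_scenes = kitchen_scenes[20:]   # remaining 10 for testing
```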
Another question -- is there any distinction between FloorPlan"n" and FloorPlan"n"_physics in terms of just the images.hdf5? Some of the FloorPlans have the suffix "_physics" (e.g. FloorPlan1_physics), but others do not.
You shouldn't have to worry about that, the "_physics" scenes come from a slightly newer (better) version of AI2-THOR. When we were doing the project only Kitchens and Living Rooms were completed.
By the way -- I would not necessarily recommend scraping a feature for each image. In hindsight, this may have been detrimental to performance, as then you can't really do data augmentation. I would recommend running your featurizer on the fly.
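The on-the-fly alternative means applying the (possibly augmenting) transform and the feature extractor at episode time instead of reading a cached vector. A minimal sketch of the idea, with a stand-in model and a hypothetical flip augmentation (neither is from the SAVN codebase):

```python
import random

def augment(image):
    # hypothetical augmentation: random horizontal flip of a row-major image
    if random.random() < 0.5:
        return [row[::-1] for row in image]
    return image

def featurize_on_the_fly(raw_image, model):
    # augmentation happens per call -- impossible with cached features,
    # since those are computed once and frozen on disk
    return model(augment(raw_image))

image = [[1, 2, 3], [4, 5, 6]]
# stand-in "model": sum of all pixels (invariant to the flip)
feat = featurize_on_the_fly(image, lambda img: sum(sum(row) for row in img))
```

The trade-off the comment above alludes to: caching saves compute per episode, but fixes the features so no augmentation can be applied during training.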
Thanks for the info.
So the features I am planning to use are pretty heavy (basically running a detector on each frame, which can have a fairly large overhead to keep around in memory at runtime). If data augmentation is not done at any phase, then the relative performance should still be valid, I think... but it's a good point, I'll try to work around the memory footprint eventually.
I would like to ask whether you have encountered this situation: when you run the program, it gets stuck at the following output:

```
Unable to preload the following plugins:
	ScreenSelector.so
Unable to preload the following plugins:
	ScreenSelector.so
Loading player data from /home/ubuntu/.ai2thor/releases/thor-201903131714-Linux64/thor-201903131714-Linux64_Data/data.unity3d
Loading player data from /home/ubuntu/.ai2thor/releases/thor-201903131714-Linux64/thor-201903131714-Linux64_Data/data.unity3d
```