Gibson Dataset glb files
Opened this issue · 10 comments
I have downloaded the Gibson tiny dataset from here as gibson_tiny.tar.gz, which contains .obj files (and a few other assets) for each scene but no .glb files. I then downloaded the 3DSceneGraph data and converted it via
tools/gen_gibson_semantics.sh /path/to/3DSceneGraph_medium/automated_graph /path/to/GibsonDataset /path/to/output
which produced some .scn and .ids files.
But I need the .glb files for the scenes to run the demo notebook, and I don't understand where to obtain them. Any directions on how to get those .glb files?
You can download them via the same Google form here.
Download the Gibson dataset (you may want to start with the smaller one):
Inside the form:
Gibson Dataset (trainval) for use with Habitat (11 GB):
All training and validation scenes of the Gibson dataset for use with Habitat Sim:
https://dl.fbaipublicfiles.com/habitat/data/scene_datasets/gibson_habitat_trainval.zip
Gibson Dataset for Habitat challenge (1.5 GB):
A smaller subset containing the scenes with quality rating 4+ (as described in the Habitat ICCV 2019 paper https://arxiv.org/abs/1904.01201):
https://dl.fbaipublicfiles.com/habitat/data/scene_datasets/gibson_habitat.zip
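After downloading, the demo expects each scene's .glb under habitat-lab's data folder (the demo builds the path as data/scene_datasets/gibson/<scene>.glb). Here is a minimal sketch to verify the files are in place; the root path and scene name are just examples, adjust them to your setup:

import os

scene_root = "data/scene_datasets/gibson"   # example location inside your habitat-lab checkout
scan_name = "Allensville"                   # example scene
for ext in (".glb", ".navmesh"):
    path = os.path.join(scene_root, scan_name + ext)
    print(path, "OK" if os.path.exists(path) else "MISSING")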
===
Also, I currently have trouble getting the object encoder to work (the simulator cannot get semantic sensor input).
It keeps showing:
[21:27:07:360364]:[Warning]:[Sim] Simulator.cpp(599)::instanceStageForSceneAttributes : The active scene does not contain semantic annotations : activeSemanticSceneID_ = 0
Could you let me know if you have fixed it and gotten the object encoder to work, please? Thank you very much!
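For reference, this is how I check whether a loaded scene actually carries semantic annotations (just a diagnostic sketch; it assumes a simulator already built from the same cfg as in the demo):

# Diagnostic only: run after sim = habitat_sim.Simulator(cfg)
objs = [o for o in sim.semantic_scene.objects if o is not None]
print("semantic objects loaded:", len(objs))
if not objs:
    print("No semantic annotations found; the semantic sensor will not return meaningful output.")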
Thanks. Once I get the demo working, I'll look into the object encoder. Were you able to make the demo notebook work directly, or did it need some changes?
For some reason I am not able to get the cv2 library to work properly in the Jupyter notebook, so I put all the Python code in a .py file, and it works just fine except that the object encoder is not working.
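(If the notebook crashes come from cv2.imshow specifically, one workaround, just a sketch and not part of the original demo, is to render frames with matplotlib inside the notebook instead of opening OpenCV windows:)

import matplotlib.pyplot as plt

def show_bgr(frame):
    # OpenCV frames are BGR; flip the channel order for matplotlib
    plt.imshow(frame[:, :, ::-1])
    plt.axis("off")
    plt.show()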
demo.py:
import os
project_dir = '/home/user1/Desktop/quadrupedal-2024/code/habitat_ai/project_ws'
os.chdir(project_dir)
import torch
import cv2, glob, imageio, joblib
from PIL import Image
from tqdm import tqdm
import quaternion as q
import matplotlib.pyplot as plt
import numpy as np
import habitat, habitat_sim
habitat_path = habitat.__path__[0]
from sklearn.metrics.pairwise import cosine_similarity
import torchvision.transforms as transforms
import torch.nn as nn
from utils.habitat_settings import default_sim_settings, make_cfg
from utils.statics import CATEGORIES, COI_INDEX
from matplotlib import animation
from IPython.display import HTML
from scipy.ndimage import rotate
from habitat.utils.visualizations import utils, fog_of_war
import imageio.v3 as iio
from torchvision.models import resnet18 as resnet18_img
from model.Graph.resnet_obj import resnet18 as resnet18_obj
from magnum import Quaternion, Vector3, Rad
def check_gpu_status():
# Check if CUDA is available
if torch.cuda.is_available():
print("CUDA is available! 🎉")
# Get the number of CUDA devices
num_devices = torch.cuda.device_count()
print(f"Number of GPU devices available: {num_devices}")
for i in range(num_devices):
print(f"Device {i}: {torch.cuda.get_device_name(i)}")
# Additional device properties can be added here
print(" - Memory Allocated:", torch.cuda.memory_allocated(i))
print(" - Memory Cached: ", torch.cuda.memory_reserved(i))
print(" - Capability: ", torch.cuda.get_device_capability(i))
print(" - Device Properties:", torch.cuda.get_device_properties(i))
else:
print("CUDA is not available. 😢")
def get_object_feat(rgb, bboxes, num_of_camera=12):
image = eval_augmentation(Image.fromarray(rgb.astype(np.uint8)))[None].cuda().float()
feat = object_encoder(image, torch.tensor(np.concatenate((np.zeros([len(bboxes), 1]), bboxes), -1)).cuda()[None].float())[0]
feat = nn.functional.normalize(feat, dim=1)
return feat
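# Embed the full panoramic RGB frame with the image encoder and L2-normalize the embedding.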
def get_image_feat(rgb, depth, num_of_camera=12):
img_tensor = (torch.tensor(rgb[None]).cuda().float() / 255.).permute(0, 3, 1, 2)
feat = img_encoder(img_tensor)
vis_embedding = nn.functional.normalize(feat, dim=1)
return vis_embedding
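# Image-node graph: up to M=100 nodes with 512-d embeddings, a boolean adjacency matrix A, and bookkeeping for the last localized node.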
class ImgGraph(object):
def __init__(self):
self.feature_dim = 512
self.M = 100
def num_node(self):
return len(self.node_position_list)
def reset(self):
self.node_position_list = [] # This position list is only for visualizations
self.node_rotation_list = [] # This rotation list is only for visualizations
self.graph_memory = np.zeros([self.M, self.feature_dim])
self.A = np.zeros([self.M, self.M], dtype=np.bool_)
self.graph_mask = np.zeros(self.M)
self.graph_time = np.zeros(self.M)
self.last_localized_node_idx = np.zeros([1], dtype=np.int32)
self.last_localized_node_embedding = np.zeros([self.feature_dim], dtype=np.float32)
def initialize_graph(self, new_embeddings, positions, rotations):
self.add_node(node_idx=0, embedding=new_embeddings, time_step=0, position=positions, rotation=rotations)
self.record_localized_state(node_idx=0, embedding=new_embeddings)
def add_node(self, node_idx, embedding, time_step, position, rotation):
self.node_position_list.append(position)
self.node_rotation_list.append(rotation)
self.graph_memory[node_idx] = embedding
self.graph_mask[node_idx] = 1.0
self.graph_time[node_idx] = time_step
def record_localized_state(self, node_idx, embedding):
self.last_localized_node_idx = node_idx
self.last_localized_node_embedding = embedding
def add_edge(self, node_idx_a, node_idx_b):
self.A[node_idx_a, node_idx_b] = True
self.A[node_idx_b, node_idx_a] = True
def update_node(self, node_idx, time_info, embedding=None):
if embedding is not None:
self.graph_memory[node_idx] = embedding
self.graph_time[node_idx] = time_info
def get_positions(self, a=None):
if a is None:
return self.node_position_list
else:
return self.node_position_list[a]
def get_neighbor(self, node_idx, return_mask=False):
if return_mask:
return self.A[node_idx]
else:
return np.where(self.A[node_idx])[0]
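# Object-node graph: up to M=1000 object nodes (32-d embedding, category, score) connected to up to MV=100 image nodes through the object-to-view adjacency A_OV.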
class ObjGraph(object):
def __init__(self,):
self.feature_dim = 32
self.M = 1000
self.MV = 100
def num_node(self):
return len(self.node_position_list)
def reset(self):
self.node_position_list = [] # This position list is only for visualizations
self.graph_memory = np.zeros([self.M, self.feature_dim])
self.graph_category = np.zeros([self.M])
self.graph_score = np.zeros([self.M])
self.A_OV = np.zeros([self.M, self.MV], dtype=np.bool_)
self.graph_mask = np.zeros(self.M)
self.graph_time = np.zeros([self.M], dtype=np.int32)
self.last_localized_node_idx = 0
def initialize_graph(self, new_embeddings, object_scores, object_categories, masks, positions):
if sum(masks == 1) == 0:
masks[0] = 1
self.add_node(node_idx=0, embedding=new_embeddings, object_score=object_scores, object_category=object_categories, time_step=0, mask=masks, position=positions, vis_node_idx=0)
def add_node(self, node_idx, embedding, object_score, object_category, mask, time_step, position, vis_node_idx):
node_idx_ = node_idx
i = 0
while True:
if mask[i] == 1:
self.node_position_list.append(position[i])
self.graph_memory[node_idx_] = embedding[i]
self.graph_score[node_idx_] = object_score[i]
self.graph_category[node_idx_] = object_category[i]
self.graph_mask[node_idx_] = 1.0
self.graph_time[node_idx_] = time_step
self.add_vo_edge([node_idx_], vis_node_idx)
node_idx_ += 1
i += 1
if i == len(position):
break
def add_vo_edge(self, node_idx_obj, curr_vis_node_idx):
for node_idx_obj_i in node_idx_obj:
self.A_OV[node_idx_obj_i, curr_vis_node_idx] = True
def update_node(self, node_idx, time_info, node_score, node_category, curr_vis_node_idx, embedding=None):
if embedding is not None:
self.graph_memory[node_idx] = embedding
self.graph_score[node_idx] = node_score
self.graph_category[node_idx] = node_category
self.graph_time[node_idx] = time_info
self.A_OV[node_idx, :] = False
self.A_OV[node_idx, curr_vis_node_idx] = True
def get_positions(self, a=None):
if a is None:
return self.node_position_list
else:
return self.node_position_list[a]
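# Two embeddings are treated as matching when their inner product exceeds the threshold th.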
def is_close(embed_a, embed_b, return_prob=False, th=0.75):
logits = np.matmul(embed_a, embed_b.transpose(1, 0))
close = (logits > th)
if return_prob:
return close, logits
else:
return close
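# Localize the current frame in the image graph: first against the last localized node (also checking agreement of its attached objects), then against the remaining unchecked nodes; if nothing matches, add a new node and connect it to the last localized one.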
def update_image_graph(imggraph, objgraph, new_embedding, curr_obj_embeding,
object_score, object_category, position, rotation, time, done, img_node_th=0.8, obj_node_th=0.8):
# The position is only used for visualizations
if done:
imggraph.reset()
imggraph.initialize_graph(new_embedding, position, rotation)
obj_close = True
obj_graph_mask = objgraph.graph_score[objgraph.A_OV[:, imggraph.last_localized_node_idx]] > 0.5
if len(obj_graph_mask) > 0:
curr_obj_mask = object_score > 0.5
if np.sum(curr_obj_mask) / len(curr_obj_mask) >= 0.5:
close_obj, prob_obj = is_close(objgraph.graph_memory[objgraph.A_OV[:, imggraph.last_localized_node_idx]], curr_obj_embeding, return_prob=True, th=obj_node_th)
close_obj = close_obj[obj_graph_mask, :][:, curr_obj_mask]
category_mask = objgraph.graph_category[objgraph.A_OV[:, imggraph.last_localized_node_idx]][obj_graph_mask][:, None] == object_category[curr_obj_mask]
close_obj[~category_mask] = False
if len(close_obj) >= 3:
clos_obj_p = close_obj.any(1).sum() / (close_obj.shape[0])
if clos_obj_p < 0.1: # Fail to localize (find the same object) with the last localized frame
obj_close = False
close, prob = is_close(imggraph.last_localized_node_embedding[None], new_embedding[None], return_prob=True, th=img_node_th)
found = (np.array(done) + close.squeeze()) & np.array(obj_close).squeeze()
found_prev = False
found = found
found_in_memory = False
to_add = False
if found:
imggraph.update_node(imggraph.last_localized_node_idx, time)
found_prev = True
else:
check_list = 1 - imggraph.graph_mask[:imggraph.num_node()]
check_list[imggraph.last_localized_node_idx] = 1.0
while not found:
not_checked_yet = np.where((1 - check_list))[0]
neighbor_embedding = imggraph.graph_memory[not_checked_yet]
num_to_check = len(not_checked_yet)
if num_to_check == 0:
to_add = True
break
else:
close, prob = is_close(new_embedding[None], neighbor_embedding, return_prob=True, th=img_node_th)
close = close[0];
prob = prob[0]
close_idx = np.where(close)[0]
if len(close_idx) >= 1:
found_node = not_checked_yet[prob.argmax()]
else:
found_node = None
if found_node is not None:
found = True
if abs(time - imggraph.graph_time[found_node]) > 20:
found_in_memory = True
imggraph.update_node(found_node, time, new_embedding)
imggraph.add_edge(found_node, imggraph.last_localized_node_idx)
imggraph.record_localized_state(found_node, new_embedding)
check_list[found_node] = 1.0
if to_add:
new_node_idx = imggraph.num_node()
imggraph.add_node(new_node_idx, new_embedding, time, position, rotation)
imggraph.add_edge(new_node_idx, imggraph.last_localized_node_idx)
imggraph.record_localized_state(new_node_idx, new_embedding)
last_localized_node_idx = imggraph.last_localized_node_idx
return imggraph
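# Merge currently detected objects into the object graph: detections matching an existing neighbor node (same category, close embedding) update it when their score is higher; the rest are added as new nodes attached to the current image node.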
def update_object_graph(imggraph, objgraph, object_embedding, object_score, object_category, object_mask, object_position, time, done, obj_node_th=0.8):
# The position is only used for visualizations. Remove if object features are similar
# object masking
object_score = object_score[object_mask == 1]
object_category = object_category[object_mask == 1]
object_position = object_position[object_mask == 1]
object_embedding = object_embedding[object_mask == 1]
object_mask = object_mask[object_mask == 1]
if done:
objgraph.reset()
objgraph.initialize_graph(object_embedding, object_score, object_category, object_mask, object_position)
to_add = [True] * int(sum(object_mask))
not_found = not done # Dense connection mode
if not_found:
hop1_vis_node = imggraph.A[imggraph.last_localized_node_idx]
hop1_obj_node_mask = np.sum(objgraph.A_OV.transpose(1, 0)[hop1_vis_node], 0) > 0
curr_obj_node_mask = objgraph.A_OV[:, imggraph.last_localized_node_idx]
neighbor_obj_node_mask = (hop1_obj_node_mask + curr_obj_node_mask) > 0
neighbor_node_embedding = objgraph.graph_memory[neighbor_obj_node_mask]
neighbor_obj_memory_idx = np.where(neighbor_obj_node_mask)[0]
neighbor_obj_memory_score = objgraph.graph_score[neighbor_obj_memory_idx]
neighbor_obj_memory_cat = objgraph.graph_category[neighbor_obj_memory_idx]
close, prob = is_close(neighbor_node_embedding, object_embedding, return_prob=True, th=obj_node_th)
for c_i in range(prob.shape[1]):
close_mem_indices = np.where(close[:, c_i] == 1)[0]
for m_i in close_mem_indices:
is_same = False
to_update = False
if (object_category[c_i] == neighbor_obj_memory_cat[m_i]) and object_category[c_i] != -1:
is_same = True
if object_score[c_i] > neighbor_obj_memory_score[m_i]:
to_update = True
if is_same:
to_add[c_i] = False
if to_update:
objgraph.update_node(m_i, time, object_score[c_i], object_category[c_i], int(imggraph.last_localized_node_idx), object_embedding[c_i])
break
# Add new objects to graph
if sum(to_add) > 0:
start_node_idx = objgraph.num_node()
new_idx = np.where(np.stack(to_add))[0]
objgraph.add_node(start_node_idx, object_embedding[new_idx], object_score[new_idx],
object_category[new_idx], object_mask[new_idx], time,
object_position[new_idx], int(imggraph.last_localized_node_idx))
return objgraph
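# Draw image-graph and object-graph nodes/edges on top of the top-down floorplan.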
def draw_graph_on_map(topdownmap, imggraph, objgraph, draw_im_graph=True, draw_obj_graph=True, node_size_px=60):
node_list = imggraph.node_position_list
affinity = imggraph.A
last_localized_imnode = imggraph.last_localized_node_idx
print(last_localized_imnode)
obj_node_list = objgraph.node_position_list
ov_affinity = objgraph.A_OV
if draw_obj_graph:
h, w, _ = topdownmap.shape
draw_obj_point_list = []
for idx, node_position in enumerate(obj_node_list):
connected_imnode = np.where(ov_affinity[idx])[0]
draw_obj_point_list.append(node_position)
if draw_im_graph:
neighbors = np.where(ov_affinity[idx])[0]
for neighbor_idx in neighbors:
neighbor_position = node_list[neighbor_idx]
node_map_pose = to_grid(node_position[0], node_position[2])
neighbor_map_pose = to_grid(neighbor_position[0], neighbor_position[2])
line = cv2.line(
topdownmap,
node_map_pose,
neighbor_map_pose,
[253, 173, 211],
thickness=10
)
alpha = 0.8
topdownmap = cv2.addWeighted(line, alpha, topdownmap, 1 - alpha, 0)
if draw_im_graph:
draw_im_point_list = []
for idx, node_position in enumerate(node_list):
if idx == last_localized_imnode:
draw_im_point_list.append([node_position, [51, 153, 102]])
else:
draw_im_point_list.append([node_position, [15, 119, 143]])
neighbors = np.where(affinity[idx])[0]
for neighbor_idx in neighbors:
neighbor_position = node_list[neighbor_idx]
node_map_pose = to_grid(node_position[0], node_position[2])
neighbor_map_pose = to_grid(neighbor_position[0], neighbor_position[2])
cv2.line(
topdownmap,
node_map_pose,
neighbor_map_pose,
[177, 232, 246],
thickness=30,
)
if draw_im_graph:
for node_position, node_color in draw_im_point_list:
graph_node_center = to_grid(node_position[0], node_position[2])
cv2.circle(topdownmap, graph_node_center, node_size_px // 3, node_color, -1)
if draw_obj_graph:
cnt = 0
for node_position in draw_obj_point_list:
graph_node_center = to_grid(node_position[0], node_position[2])
triangle_cnt = np.array(graph_node_center + np.array([pt1, pt2, pt3]).astype(np.int32))
cv2.drawContours(topdownmap, [triangle_cnt], 0, [248, 106, 176], -1)
cnt += 1
return topdownmap
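# Turn the panoramic semantic observation into object detections: map instance ids to categories, keep classes of interest with more than 100 pixels, split instances that wrap around the panorama border into two boxes, drop far-away detections (depth >= 5), and return normalized bboxes with categories, ids, depths and world positions.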
def get_objects(semantic, mapping, depth, img_width, img_height):
semantic = semantic.astype(np.int32)
max_key = np.max(np.array(list(mapping.keys())))
replace_values = []
for i in np.arange(max_key + 1):
try:
replace_values.append(mapping[i])
except:
replace_values.append(-1)
semantic_obs_class = np.take(replace_values, semantic)
COI_MASK = [(semantic_obs_class == ci).astype(np.int32) for ci in COI_INDEX[dataset.split("_")[0]]] # class mask
unique_instances = np.unique(semantic * np.sum(np.stack(COI_MASK), 0))[1:]
bbox_ids = unique_instances
instance_segment = [(semantic == i).astype(np.int32) for i in unique_instances]
box_categories = [np.unique(semantic_obs_class[semantic == i])[0] for i in unique_instances]
if len(instance_segment) > 0:
object_size = np.stack([np.sum(instance_segman) for instance_segman in instance_segment])
mask = (object_size > 100)
instance_segment = [instance_segman for i, instance_segman in enumerate(instance_segment) if mask[i] == 1]
box_categories = np.stack(box_categories)[mask == 1]
bbox_ids = np.array(bbox_ids)[mask == 1]
x1s = [np.min(np.where(instance_segment[i])[1]) for i in range(len(instance_segment))]
y1s = [np.min(np.where(instance_segment[i])[0]) for i in range(len(instance_segment))]
x2s = [np.max(np.where(instance_segment[i])[1]) for i in range(len(instance_segment))]
y2s = [np.max(np.where(instance_segment[i])[0]) for i in range(len(instance_segment))]
bboxes = np.stack((x1s, y1s, x2s, y2s), 1)
if len(bboxes) > 0:
edge_box_idx = np.where(bboxes[:, 2] - bboxes[:, 0] > img_width * 0.8)[0]
not_edge_box_idx = np.where(bboxes[:, 2] - bboxes[:, 0] <= img_width * 0.8)[0]
if len(edge_box_idx) > 0:
x1s1 = [np.min(np.where(instance_segment[i][:, :int(instance_segment[i].shape[1] / 2)])[1]) for i in edge_box_idx]
y1s1 = [np.min(np.where(instance_segment[i][:, :int(instance_segment[i].shape[1] / 2)])[0]) for i in edge_box_idx]
x2s1 = [np.max(np.where(instance_segment[i][:, :int(instance_segment[i].shape[1] / 2)])[1]) for i in edge_box_idx]
y2s1 = [np.max(np.where(instance_segment[i][:, :int(instance_segment[i].shape[1] / 2)])[0]) for i in edge_box_idx]
bboxes_1 = np.stack((x1s1, y1s1, x2s1, y2s1), 1)
bboxes_1_categories = box_categories[edge_box_idx]
bboxes_1_ids = bbox_ids[edge_box_idx]
x1s2 = [int(instance_segment[i].shape[1] / 2) + np.min(np.where(instance_segment[i][:, int(instance_segment[i].shape[1] / 2):])[1]) for i in edge_box_idx]
y1s2 = [np.min(np.where(instance_segment[i][:, int(instance_segment[i].shape[1] / 2):])[0]) for i in edge_box_idx]
x2s2 = [int(instance_segment[i].shape[1] / 2) + np.max(np.where(instance_segment[i][:, int(instance_segment[i].shape[1] / 2):])[1]) for i in edge_box_idx]
y2s2 = [np.max(np.where(instance_segment[i][:, int(instance_segment[i].shape[1] / 2):])[0]) for i in edge_box_idx]
bboxes_2 = np.stack((x1s2, y1s2, x2s2, y2s2), 1)
bboxes_2_categories = box_categories[edge_box_idx]
bboxes_2_ids = bbox_ids[edge_box_idx]
bboxes_ = bboxes[not_edge_box_idx]
box_categories_ = box_categories[not_edge_box_idx]
bbox_ids_ = bbox_ids[not_edge_box_idx]
bboxes = np.concatenate((bboxes_, bboxes_1, bboxes_2), 0)
box_categories = np.concatenate((box_categories_, bboxes_1_categories, bboxes_2_categories), 0)
bbox_ids = np.concatenate((bbox_ids_, bboxes_1_ids, bboxes_2_ids), 0)
if len(depth) > 0 and len(bboxes) > 0:
box_pix_xs, box_pix_ys = ((bboxes[:, 0])).astype(np.int32), ((bboxes[:, 1])).astype(np.int32)
box_depth = np.stack([depth[box_pix_y, box_pix_x] for box_pix_x, box_pix_y in zip(box_pix_xs, box_pix_ys)])
box_mask = box_depth < 5
if np.sum(box_mask) == 0:
return [], [], [], [], []
bboxes = bboxes[box_mask]
box_categories = box_categories[box_mask]
bbox_ids = bbox_ids[box_mask]
box_depth = box_depth[box_mask]
box_world = np.stack([object_loc[bbox_id] for bbox_id in bbox_ids])
bboxes = bboxes.astype(np.float32)
bboxes[:, 0] = bboxes[:, 0].astype(np.float32) / float(img_width)
bboxes[:, 1] = bboxes[:, 1].astype(np.float32) / float(img_height)
bboxes[:, 2] = bboxes[:, 2].astype(np.float32) / float(img_width)
bboxes[:, 3] = bboxes[:, 3].astype(np.float32) / float(img_height)
return bboxes, box_categories, bbox_ids, box_depth, box_world
return [], [], [], [], []
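# Convert between world (x, z) coordinates and floorplan pixel coordinates using the per-floor projection matrix P.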
def to_grid(realworld_x: float,realworld_y: float,):
A = [realworld_x-(worldWidth+2*world_min_width)/2, realworld_y-(worldHeight+2*world_min_height)/2, 1, 1]
grid_x, grid_y = np.array([imgWidth/2, imgHeight/2]) * np.matmul(P, A)[:2] + np.array([imgWidth/2, imgHeight/2])
return int(grid_x), int(grid_y)
def from_grid(grid_x: int,grid_y: int):
realworld_x, realworld_y = np.matmul(np.linalg.inv(P), [(2 * grid_x - imgWidth)/imgWidth, (2 * grid_y - imgHeight)/imgHeight, 1, 1])[:2] \
+ np.array([(worldWidth+2*world_min_width)/2, (worldHeight+2*world_min_height)/2])
return realworld_x, realworld_y
from habitat.utils.geometry_utils import (
quaternion_from_coeff,
quaternion_rotate_vector,
)
from habitat.tasks.utils import cartesian_to_polar
def get_map_angle(ref_rotation):
heading_vector = quaternion_rotate_vector(
ref_rotation.inverse(), np.array([0, 0, -1])
)
phi = cartesian_to_polar(-heading_vector[2], heading_vector[0])[1]
return np.array(phi)
def draw_bbox(rgb: np.ndarray, bboxes: np.ndarray, color = (178,193,118), is_detection=False) -> np.ndarray:
for i, bbox in enumerate(bboxes):
imgHeight, imgWidth, _ = rgb.shape
cv2.rectangle(rgb, (int(bbox[0]*imgWidth), int(bbox[1]*imgHeight)), (int(bbox[2]*imgWidth), int(bbox[3]*imgHeight)), (255, 255, 0), 5)
return rgb
def step(in_action):
action_names = list(cfg.agents[settings["default_agent"]].action_space.keys())
action = action_names[in_action]
sim.step(action)
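# Paste a rotated, resized robot sprite (JACKAL_SPRITE) at the agent's position on the top-down map.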
def draw_agent(topdownmap, agent_position, agent_rotation, jackal_radius_px=50):
rotated_jackal = rotate(
JACKAL_SPRITE, agent_rotation * 180 / np.pi + 90
)
initial_jackal_size = JACKAL_SPRITE.shape[0]
new_size = rotated_jackal.shape[0]
jackal_size_px = max(1, int(jackal_radius_px * 2 * new_size / initial_agent_size))
resized_jackal = cv2.resize(
rotated_jackal,
(jackal_size_px, jackal_size_px),
interpolation=cv2.INTER_LINEAR
)
topdownmap = utils.paste_overlapping_image(topdownmap, resized_jackal, agent_position)
return topdownmap
def concat_vertical(img1, img2):
img1_shape = img1.shape
img2_shape = img2.shape
if img1_shape[1] > img2_shape[1]:
img2 = cv2.resize(img2, (img1_shape[1], int(img2_shape[0] / img2_shape[1] * img1_shape[1])))
elif img1_shape[1] < img2_shape[1]:
img1 = cv2.resize(img1, (img2_shape[1], int(img1_shape[0] / img1_shape[1] * img2_shape[1])))
return np.concatenate((img1, img2), axis=0)
def init():
im.set_data(topdownmaps[0])
def animate(i):
im.set_data(topdownmaps[i])
return im
if __name__ == "__main__":
check_gpu_status()
"""
Set the display port!!
"""
# os.environ['DISPLAY'] = 'localhost:10.0'
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
eval_augmentation = transforms.Compose([
transforms.ToTensor(),
normalize
])
img_encoder = resnet18_img(num_classes=512)
dim_mlp = img_encoder.fc.weight.shape[1]
img_encoder.fc = nn.Sequential(nn.Linear(dim_mlp, dim_mlp), nn.ReLU(), img_encoder.fc)
ckpt_pth = os.path.join(project_dir, 'data/graph', 'Img_encoder.pth.tar')
ckpt = torch.load(ckpt_pth, map_location='cpu')
state_dict = {k[len('module.encoder_q.'):]: v for k, v in ckpt['state_dict'].items() if 'module.encoder_q.' in k}
img_encoder.load_state_dict(state_dict)
img_encoder.eval().cuda()
feature_dim = 32
object_encoder = resnet18_obj(num_classes=feature_dim)
dim_mlp = object_encoder.fc.weight.shape[1]
object_encoder.fc = nn.Sequential(nn.Linear(dim_mlp, dim_mlp), nn.ReLU(), object_encoder.fc)
ckpt_pth = os.path.join(project_dir, 'data/graph', 'Obj_encoder.pth.tar')
ckpt = torch.load(ckpt_pth, map_location='cpu')
state_dict = {k[len('module.encoder_k.'):]: v for k, v in ckpt['state_dict'].items() if 'module.encoder_k.' in k}
object_encoder.load_state_dict(state_dict)
object_encoder.eval().cuda()
settings = default_sim_settings.copy()
settings["max_frames"] = 100
settings["width"] = 256
settings["height"] = 256
settings["scene"] = "/home/user1/Desktop/quadrupedal-2024/code/habitat_ai/habitat-lab/habitat-lab/data/scene_datasets/gibson/Albertville.glb"
settings["save_png"] = False # args.save_png
settings["sensor_height"] = 0.88
settings["color_sensor"] = True
settings["semantic_sensor"] = True
settings["depth_sensor"] = True
settings["print_semantic_scene"] = False
settings["print_semantic_mask_stats"] = False
settings["compute_shortest_path"] = False
settings["compute_action_shortest_path"] = False
settings["panoramic_sensor"] = True
settings["seed"] = 2343
settings["silent"] = False
settings["enable_physics"] = False
settings["draw_lidar"] = False
settings["agent_radius"] = 0.1
settings["agent_height"] = 1.2
settings["multiview"] = False
settings["hfov"] = 90
settings["FORWARD_STEP_SIZE"] = 0.25
settings["TURN_ANGLE"] = 10
settings["tdv_height"] = 512
settings["tdv_width"] = 512
settings["allow_sliding"] = True
num_of_camera = 12
pano_img_height = settings["height"]//2
img_height = pano_img_height
img_width = float(img_height * 4)
cam_width = float(pano_img_height * 4 // num_of_camera)
settings["cam_width"] = int(cam_width)
settings["pano_height"] = int(pano_img_height)
dataset = 'gibson'
scan_name = "Andover"
if dataset == "mp3d":
settings["scene"] = os.path.join(habitat_path, '../data/scene_datasets/{}/{}/{}.glb'.format(dataset, scan_name, scan_name))
elif "gibson" in dataset:
settings["scene"] = os.path.join(habitat_path, '../data/scene_datasets/{}/{}.glb'.format(dataset, scan_name))
elif dataset == "hm3d":
path = glob.glob(os.path.join(habitat_path, '../data/scene_datasets/{}/*/{}/{}.glb'.format(dataset, "*" + scan_name, scan_name)))[0]
settings["scene"] = path
cfg = make_cfg(settings)
try:
sim.close()
except:
pass
sim = habitat_sim.Simulator(cfg)
max_object = 10
bounds = sim.pathfinder.get_bounds()
scene_objects = sim.semantic_scene.objects
if dataset == "mp3d":
mapping = {int(obj.id.split("_")[-1]): obj.category.index() for obj in scene_objects if obj != None}
elif dataset == "hm3d":
mapping = {int(obj.id.split("_")[-1]): obj.category.index() for obj in scene_objects if obj != None}
else:
mapping = {int(obj.id.split("_")[-1]): int(np.where([obj.category.name() == cat for cat in CATEGORIES['gibson']])[0][0]) for obj in scene_objects if obj != None}
set_floor = 0
floor = -1
num_try = 0
render_configs = joblib.load(os.path.join(project_dir, f"data/floorplans/{dataset}_floorplans/render_config.pkl"))
while floor != set_floor:
init_position = sim.pathfinder.get_random_navigable_point()
floor = int(np.argmin([abs(float(render_configs[scan_name][i]['z_low']) - init_position[1]) for i in render_configs[scan_name].keys()]))
num_try += 1
if num_try > 100:
print("Cannot find a valid position")
break
init_rotation = sim.get_agent(0).get_state().rotation.components
scene_file = settings["scene"]
sim.reset()
agent = sim.initialize_agent(0)
start_state = agent.get_state()
if (start_state.position != init_position).any():
start_state.position = init_position
start_state.rotation = q.from_float_array(init_rotation) # self.init_rotation #
start_state.sensor_states = dict() ## Initialize sensori
agent.set_state(start_state)
prev_position = agent.get_state().position
prev_rotation = q.as_euler_angles(agent.state.rotation)[1]
P = render_configs[scan_name][floor]['Projection']
lower_bound, upper_bound = sim.pathfinder.get_bounds()
imgWidth = round(float(render_configs[scan_name][floor]['width']))
imgHeight = round(float(render_configs[scan_name][floor]['height']))
world_min_width = float(render_configs[scan_name][floor]['x_low'])
world_max_width = float(render_configs[scan_name][floor]['x_high'])
world_min_height = float(render_configs[scan_name][floor]['y_low'])
world_max_height = float(render_configs[scan_name][floor]['y_high'])
worldWidth = abs(world_min_width) + abs(world_max_width)
worldHeight = abs(world_min_height) + abs(world_max_height)
print('====================================')
print('scan_name: ', scan_name)
print('floor: ', floor)
print('lower_bound: ', lower_bound)
print('upper_bound: ', upper_bound)
print('imgWidth: ', imgWidth)
print('imgHeight: ', imgHeight)
ortho_map = cv2.imread(os.path.join(project_dir, f"data/floorplans/{dataset}_floorplans/rgb/{scan_name}_level_{floor}.png"))[...,::-1][...,:3]
ortho_mask = cv2.imread(os.path.join(project_dir, f"data/floorplans/{dataset}_floorplans/mask/{scan_name}_level_{floor}.png"), 0)
ortho_map[ortho_mask==0] = 255
scene_objects = sim.semantic_scene.objects
object_loc = {int(obj.id.split("_")[-1]): obj.aabb.center for obj in scene_objects if obj != None}
size= 0.4
pt1 = np.array([150-150, 100-150])*size
pt2 = np.array([100-150, 200-150])*size
pt3 = np.array([200-150, 200-150])*size
pt1 = pt1.astype(np.int32)
pt2 = pt2.astype(np.int32)
pt3 = pt3.astype(np.int32)
JACKAL_SPRITE = iio.imread(os.path.join(project_dir, "data/assets/maps_topdown_agent_sprite/jackal.png"))
JACKAL_SPRITE = np.ascontiguousarray(np.flipud(JACKAL_SPRITE))
initial_agent_size = JACKAL_SPRITE.shape[0]
action_names = list(cfg.agents[settings["default_agent"]].action_space.keys())
print(action_names)
start_position = Vector3([from_grid(356, 535)[0], init_position[1], from_grid(356, 535)[1]])
goal_position = Vector3([from_grid(187, 124)[0], init_position[1], from_grid(187, 124)[1]])
agent = sim.initialize_agent(0)
start_state = agent.get_state()
start_state.position = start_position
agent.set_state(start_state)
imggraph = ImgGraph()
objgraph = ObjGraph()
positions = []
rotations = []
topdownmap = ortho_map.copy()
obs = sim.get_sensor_observations()
rgb = np.concatenate([obs['panoramic_rgb_part_sensor_%d' % (i)][:, :, :3] for i in range(num_of_camera)], 1)
semantic_array = np.concatenate([obs['panoramic_semantic_part_sensor_%d' % (i)] for i in range(num_of_camera)], 1)
depth_array = np.concatenate([obs['panoramic_depth_part_sensor_%d' % (i)] for i in range(num_of_camera)], 1)
bboxes, bbox_categories, bbox_ids, bbox_depth, bbox_pose = get_objects(semantic_array, mapping, depth_array, rgb.shape[1], rgb.shape[0])
position = sim.get_agent(0).get_state().position
rotation = sim.get_agent(0).get_state().rotation.components
cur_map_coord = to_grid(position[0], position[2])[::-1]
cur_map_rot = get_map_angle(sim.get_agent(0).get_state().rotation)
image_feat = get_image_feat(rgb, depth_array).cpu().detach().numpy()
object_feat = []
if len(bboxes) > 0:
object_feat = get_object_feat(rgb, bboxes).cpu().detach().numpy()
position = sim.get_agent(0).get_state().position
rotation = sim.get_agent(0).get_state().rotation.components
imggraph.reset()
imggraph.initialize_graph(image_feat[0], position, rotation)
objgraph.reset()
if len(bboxes) > 0:
objgraph.initialize_graph(object_feat, np.ones(len(bboxes)), bbox_categories,np.ones(len(bboxes)), bbox_pose)
topdownmap = draw_agent(topdownmap, cur_map_coord, cur_map_rot)
cv2.imshow(f'vis', rgb[:, :, [2, 1, 0]])
cv2.imshow(f'graph', topdownmap[::2, ::2, [2, 1, 0]])
topdownmaps = []
images = []
t = 0
while True:
topdownmap = ortho_map.copy()
key = cv2.waitKey(0)
if key == ord('w'):
step(1)
elif key == ord('a'):
step(2)
elif key == ord('d'):
step(3)
elif key == ord('q'):
cv2.destroyAllWindows()
plt.imshow(topdownmap[::2, ::2, [2, 1, 0]])
break
position = sim.get_agent(0).get_state().position
rotation = sim.get_agent(0).get_state().rotation.components
cur_map_coord = to_grid(position[0], position[2])[::-1]
cur_map_rot = get_map_angle(sim.get_agent(0).get_state().rotation)
positions.append(position)
rotations.append(rotation)
obs = sim.get_sensor_observations()
rgb = np.concatenate([obs['panoramic_rgb_part_sensor_%d' % (i)][:, :, :3] for i in range(num_of_camera)], 1)
semantic_array = np.concatenate([obs['panoramic_semantic_part_sensor_%d' % (i)] for i in range(num_of_camera)], 1)
depth_array = np.concatenate([obs['panoramic_depth_part_sensor_%d' % (i)] for i in range(num_of_camera)], 1) / 10.
bboxes, bbox_categories, bbox_ids, bbox_depth, bbox_pose = get_objects(semantic_array, mapping, depth_array, rgb.shape[1], rgb.shape[0])
image_feat = get_image_feat(rgb, depth_array).cpu().detach().numpy()
object_feat = []
if len(bboxes) > 0:
object_feat = get_object_feat(rgb, bboxes).cpu().detach().numpy()
imggraph = update_image_graph(imggraph, objgraph, image_feat[0], object_feat, np.ones(len(bboxes)), bbox_categories,
position, rotation, time=t, done=False)
if len(bboxes) > 0:
objgraph = update_object_graph(imggraph, objgraph, object_feat, np.ones(len(bboxes)), bbox_categories, np.ones(len(bboxes)),
bbox_pose, time=t, done=False)
topdownmap = draw_graph_on_map(topdownmap, imggraph, objgraph)
topdownmap = draw_agent(topdownmap, cur_map_coord, cur_map_rot)
rgb = draw_bbox(rgb, bboxes)
topdownmaps.append(concat_vertical(rgb, topdownmap))
cv2.imshow(f'vis', rgb[:, :, [2, 1, 0]])
cv2.imshow(f'graph', topdownmap[::2, ::2, [2, 1, 0]])
t = t + 1
topdownmaps = np.stack(topdownmaps)
fig = plt.figure(figsize=(10,10))
plt.axis('off')
plt.gca().set_axis_off()
plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, hspace = 0, wspace = 0)
plt.margins(0,0)
im = plt.imshow(topdownmaps[0])
plt.close() # this is required to not display the generated image
anim = animation.FuncAnimation(fig, animate, init_func=init, frames=topdownmaps.shape[0], interval=50)
HTML(anim.to_html5_video())
writergif = animation.PillowWriter(fps=20)
anim.save(os.path.join(project_dir, "demo/tsgm_demo.gif"), writer=writergif)
My kernel in the notebook crashes at
sim = habitat_sim.Simulator(cfg)
I will check if your code runs through.
Btw, my kernel also kept crashing last night. It took me a long while to figure out that it was my cv2 package causing the issue, and then I switched to a Python file.
Thank you for the .py file; running that indeed solves the kernel-crashing issue. Now it successfully loads the scene, but produces an error while trying to get the objects.
Traceback (most recent call last):
  File "demo.py", line 670, in <module>
    bboxes, bbox_categories, bbox_ids, bbox_depth, bbox_pose = get_objects(semantic_array, mapping, depth_array, rgb.shape[1], rgb.shape[0])
  File "demo.py", line 365, in get_objects
    max_key = np.max(np.array(list(mapping.keys())))
  File "<__array_function__ internals>", line 200, in amax
  File "/home/tayyab/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2820, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/home/tayyab/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
Did you get a similar error?
The reason for this error seems to be that the mapping dict is empty, which in turn happens because scene_objects = sim.semantic_scene.objects
returns an empty list. So the simulator is unable to find any semantics in the scene. I have tried changing the scene, but the same issue persists.
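A minimal guard that makes this failure explicit (assuming, as above, that the root cause is an empty semantic scene rather than anything inside get_objects):

# Run right after creating the simulator, before building `mapping`
scene_objects = [o for o in sim.semantic_scene.objects if o is not None]
if not scene_objects:
    raise RuntimeError(
        "sim.semantic_scene.objects is empty, so `mapping` is empty and np.max() fails. "
        "Check that the semantic files generated by gen_gibson_semantics.sh sit next to the scene's .glb."
    )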
@Kenn3o3 I have managed to run the demo notebook and get the image encoder to work for the Allensville scan. I had to convert the original dataset via the 3DSceneGraph dataset as described here.
Thanks!
Could you share the converted data directory structure, i.e. how the generated content is placed in the scene_data directory after running the installation instructions?
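In the meantime, a quick way to see what the conversion produced for a given scene is to list the per-scene files. A minimal sketch; scene_root and the scene name are placeholders for your own layout:

import glob, os

scene_root = "data/scene_datasets/gibson"   # placeholder: wherever your converted scenes live
scan_name = "Allensville"                   # example scene
for path in sorted(glob.glob(os.path.join(scene_root, scan_name + "*"))):
    print(os.path.basename(path))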