Pointcept/SegmentAnything3D

Please clarify how to visualize the point clouds?

alik-git opened this issue · 1 comment

Hi there,

Thank you for this excellent work; your code is very clear for something this new. Congrats to the team!

I apologize if this is a lot to read; I just want to be thorough with my question.

Main Question

I ran your code on scene0000_00 from the ScanNet dataset, and the resulting segmentation looks much worse than the results in your GitHub repo. My segmentation is very messy and oversegmented, which leads me to think I am doing something wrong.

See below, where I'm plotting the original colors, the SAM3D segmentation, instance_gt, semantic_gt20, and semantic_gt200:
[Image: my_sam3d_masks]

Compared to your results:
[Image: reference results]

What's the difference in your results between the SAM3D and the SAM3D Merged output? How is each one produced?

I'm just trying to figure out what I might be doing wrong.

Extra details:

Detail 1

I had to comment out this line:

color_image = cv2.resize(color_image, (640, 480))

Otherwise, the code would crash due to a dimension mismatch error between the masks and the image. Why are you resizing the image? Could this be related to my results?
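
For context, here is a minimal sketch of how I understand the resize is meant to work, assuming the goal is to bring the color frame down to the depth resolution before generating masks (the file paths are placeholders, not the repo's actual code):

import cv2

# Hypothetical sketch: resize the RGB frame to the depth resolution *before*
# generating SAM masks, so masks, color, and depth all share the same
# (640, 480) shape when projected into 3D.
color_image = cv2.imread("color/0.jpg")        # e.g. a 1296x968 ScanNet color frame
depth_image = cv2.imread("depth/0.png", -1)    # a 640x480 depth frame

target_size = (depth_image.shape[1], depth_image.shape[0])  # (width, height) = (640, 480)
color_image = cv2.resize(color_image, target_size)

# Masks generated from the resized frame then match the depth map, which would
# avoid the dimension mismatch I hit when the masks came from the full-size image.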

Detail 2

Another thing: since I cannot download the whole ScanNet dataset, I'm downloading only individual scenes, and that doesn't download or create any intrinsic_depth.txt file. I am creating it manually from the scans/scene0000_00/scene0000_00.txt file, which has content like this:

axisAlignment = 0.945519 0.325568 0.000000 -5.384390 -0.325568 0.945519 0.000000 -2.871780 0.000000 0.000000 1.000000 -0.064350 0.000000 0.000000 0.000000 1.000000 
colorHeight = 968
colorToDepthExtrinsics = 0.999973 0.006791 0.002776 -0.037886 -0.006767 0.999942 -0.008366 -0.003410 -0.002833 0.008347 0.999961 -0.021924 -0.000000 0.000000 -0.000000 1.000000
colorWidth = 1296
depthHeight = 480
depthWidth = 640
fx_color = 1170.187988
fx_depth = 571.623718
fy_color = 1170.187988
fy_depth = 571.623718
mx_color = 647.750000
mx_depth = 319.500000
my_color = 483.750000
my_depth = 239.500000
numColorFrames = 5578
numDepthFrames = 5578
numIMUmeasurements = 11834
sceneType = Apartment
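
In case it matters, here is roughly how I am writing intrinsic_depth.txt from those values. I am assuming the expected format is just the 4x4 depth intrinsic matrix built from fx_depth, fy_depth, mx_depth, and my_depth, which may well be wrong:

import numpy as np

# Assumption: intrinsic_depth.txt holds a 4x4 pinhole intrinsic matrix
# assembled from the depth intrinsics listed in scene0000_00.txt.
fx, fy = 571.623718, 571.623718
mx, my = 319.500000, 239.500000

intrinsic_depth = np.array([
    [fx, 0.0, mx, 0.0],
    [0.0, fy, my, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

np.savetxt("intrinsic_depth.txt", intrinsic_depth)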

Detail 3

Here is the code I am using to make my plots. I noticed the repo doesn't include any plotting code, so maybe there is a mistake in mine?

import torch
import open3d as o3d
import numpy as np

# Define the file paths
pcd_filepath = '/pcd_gt_data/train/scene0000_00.pth'  # Replace with your file path
pcd_seg_filepath = '/sam_3d_out/scene0000_00.pth'  # Replace with your file path

# Load point cloud data
pcd_data = torch.load(pcd_filepath)
seg_data = torch.load(pcd_seg_filepath)

# Get the coordinates
coordinates = pcd_data['coord'].astype('float64')  # convert to float64 for o3d

# Get the unique labels in seg_data
unique_labels = np.unique(seg_data)

# Generate random colors for each label
color_map = {label: np.random.rand(3) for label in unique_labels}

# Create an array for colors using the color_map
colors = np.array([color_map[label] for label in seg_data])

# Create a PointCloud object
pcd = o3d.geometry.PointCloud()

# Assign coordinates to the PointCloud object
pcd.points = o3d.utility.Vector3dVector(coordinates)

# Assign colors to the PointCloud object
pcd.colors = o3d.utility.Vector3dVector(colors)

# Visualize the point cloud
o3d.visualization.draw_geometries([pcd])
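
For the side-by-side comparison with the original colors, I extend the script roughly like this (assuming the preprocessed .pth file stores RGB under a 'color' key in the 0-255 range, which may differ in your setup):

# Optional variant: show the original colors alongside the SAM3D segmentation
# by shifting a second copy of the cloud along the x-axis.
rgb = pcd_data['color'].astype('float64') / 255.0  # assumed key and value range

pcd_rgb = o3d.geometry.PointCloud()
pcd_rgb.points = o3d.utility.Vector3dVector(coordinates + np.array([6.0, 0.0, 0.0]))
pcd_rgb.colors = o3d.utility.Vector3dVector(rgb)

o3d.visualization.draw_geometries([pcd, pcd_rgb])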

Thank you for your time and help in advance!

Hi, thank you for your interest in our work!
The SAM3D output is directly generated by python sam3d.py.
For the SAM3D (merged) output, please refer to the linked instructions (the 4th step in the pipeline).
Detail 1: The images we use are all (640, 480), both when generating masks with SAM and when projecting into 3D, so there shouldn't be a dimension mismatch error.
Detail 2: You get the intrinsic_depth.txt file when you obtain the 2D images; it comes from the same file.
Detail 3: You can refer to the linked code for plotting.

Based on your description, I think the issue most likely affecting your results is the one in Detail 1.
Hope this is helpful to you!