Most Grasps are in Collision with the Object
c-keil opened this issue · 5 comments
[Image: Collisions with Mesh]
The above image shows the 10 highest-scored grasps generated using one of the training boxes. All but one of the grasps overlap the box mesh.
Summary
We are attempting to compare GraspNet against our methods using simulated point clouds, but we are finding that the majority of generated grasps, including very highly scored ones, are in collision with the object mesh. This occurs with novel objects as well as with cylinders and boxes from the GraspNet training set. We have experimented with simulating these grasps in PyBullet and found that the collisions typically result in grasp failure.
Implementation
Our implementation details are as follows: we use pyrender to simulate a scene with the object at the origin and the camera ~0.5 m away, pointed toward the object. A depth image is rendered and converted to a point cloud in the global coordinate frame. We then convert this point cloud to the camera-centric frame dictated by the GraspNet convention. The conversion is done by:
```python
import numpy as np

# Rotate the world-frame cloud into the camera frame.
grasp_cloud = np.dot(cam_frame[:3, :3].T, cloud.T).T
# Compute the mean of the point cloud along each camera axis.
offsets = grasp_cloud.mean(axis=0, keepdims=True)
# Shift the points so the cloud is centered on its mean.
grasp_cloud -= offsets
```
where `cam_frame` is the pose of the camera. The pyrender convention points the camera's z-axis away from its field of view, so we use a camera frame that is rotated 180 degrees about the x-axis relative to the pyrender frame. We then pass this point cloud to the grasp generator with exactly the same parameters as the default `demo.main`, which the readme says to use for comparison. This produces grasps that are mostly in collision with the object mesh.
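For reference, one way to construct such a frame looks like this (a minimal sketch, not our exact code; `pyrender_pose` is a placeholder for the 4x4 pyrender camera pose in the world):

```python
import numpy as np

# pyrender cameras look down their -z axis, so rotating 180 degrees about x
# yields a frame whose +z points from the camera toward the object.
flip_x = np.diag([1.0, -1.0, -1.0, 1.0])   # homogeneous 180-degree rotation about x
cam_frame = np.dot(pyrender_pose, flip_x)  # rotate in the camera's local frame
```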
We have tried varying a number of camera parameters (field of view, camera distance, resolution) but did not notice any dramatic changes for reasonable values. We also tried using the exact parameters used by GraspNet, without any major improvement. Since this works on a real robot, the exact camera parameters should not matter that much.
Example
To illustrate this issue I prepared a small example, which can be found here: https://drive.google.com/drive/folders/1a8sYaobKHKan_ZZixcL4FVgae1xBwDjN?usp=sharing
To run the example, download the Python and .npy files from the drive link and run them in the main GraspNet folder.
Using `draw_scene` with the gripper and object meshes enabled and a dummy point cloud (so that the object mesh is not inflated), we can see that all five of the best grasps are in collision.
This is not easy to see from the exterior, but if we look inside the object it becomes pretty obvious:
For completeness, here is the same scene with the partial point cloud:
Most of the best grasps are at least slightly in collision with the mesh, and there is no obvious way to filter them. At present, these collisions result in GraspNet achieving very low precision in our comparison. Is there some processing step that we are missing, or is the GraspNet simulator set up in such a way that these collisions do not result in grasp failure? The YouTube video seems to imply that the object and gripper are initialized at the grasp pose; is that the case? When we initialize and simulate grasps that overlap like this, the result is either unrealistic behavior or very large contact forces that fling the object away. Can you comment on this issue and indicate whether we are missing something or whether this behavior is normal?
A couple of comments:
You should not move the camera to the world coordinate frame. Just subtract the `pc_mean`, generate grasps, add the `pc_mean` back to the grasps, and then apply `cam_frame[:3,:3]` to each of the predicted grasps. If you apply any rotation after rendering, it will make the input different from the training data.
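In pseudocode, the suggested order of operations looks roughly like this (a sketch only; `generate_grasps` and `pc_camera` stand in for your own inference call and the cloud as rendered in the camera frame, and `cam_frame` is taken as the full 4x4 camera-to-world pose):

```python
import numpy as np

# Center the camera-frame cloud; the network only ever sees the centered cloud.
pc_mean = pc_camera.mean(axis=0, keepdims=True)
grasps = generate_grasps(pc_camera - pc_mean)  # list of 4x4 grasp poses

grasps_world = []
for g in grasps:
    g = g.copy()
    g[:3, 3] += pc_mean[0]    # add the mean back, still in the camera frame
    g = np.dot(cam_frame, g)  # only now transform the pose into the world frame
    grasps_world.append(g)
```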
And you are using the GAN variation, right?
In simulation, we evaluate all the grasps and we count any colliding grasp as a failure, which is not really true, but we do it for the same reasons you mentioned. With the real robot, a slight collision would just nudge the object a bit and the grasp won't fail.
In hindsight, I think the correct way of evaluating is to put the object on a table with gravity, move the gripper back 10-15 cm along its z-axis, and then approach the object. That way, a slight collision pushes the object a bit before the grasp closes. This is what happens with the real robot, but I totally understand the tricky situations you mentioned in simulation.
EDIT:
If you look at the plots in the paper, one of the reasons the success rate drops is that we reject any colliding grasp (even a slight collision).
Another, easier alternative for evaluation (sketched in code below):
- If the grasp is not in collision, simulate it to see whether it succeeds.
- If it is in collision, move it back 1 cm along the gripper's z-axis; if it is still in collision, count it as a failure.
- If the shifted grasp is collision-free, simulate the shifted one.
This is a hack to allow for slight collisions while keeping the simulation part simple and easy.
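For concreteness, the heuristic might look like the following (a sketch only; `in_collision` and `simulate_grasp` are placeholders for your own collision checker and simulator):

```python
import numpy as np

def evaluate_grasp(grasp_pose, back_off=0.01):
    """grasp_pose: 4x4 gripper pose; back_off: retreat distance in meters."""
    if not in_collision(grasp_pose):
        return simulate_grasp(grasp_pose)
    # Colliding: retreat 1 cm along the gripper's own z (approach) axis.
    shifted = grasp_pose.copy()
    shifted[:3, 3] -= back_off * grasp_pose[:3, 2]
    if in_collision(shifted):
        return False  # still colliding, count as a failure
    return simulate_grasp(shifted)
```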
Yes, we are using the GAN variation.
Our point-cloud generation code provides the cloud in the world frame, so I need to convert it back to the camera frame; the resulting point cloud should match the training data. I have checked this code for issues, but I will try rewriting it to generate the point cloud directly in the camera frame.
I agree that a slight collision could be a successful grasp in real life or in a more realistic simulation. Are a large number of slight collisions with the object normal?
Sounds good on the first two items.
> I agree that a slight collision could be a successful grasp in real life or in a more realistic simulation. Are a large number of slight collisions with the object normal?
I've seen it for wide objects, because the gripper has less leeway around them.
After examining our code carefully, I discovered that the camera orientation matrix was not a proper rotation matrix: the vector directions were correct, but they were not all unit vectors. This applied a subtle shear to the point cloud before it was sent to GraspNet, which was un-sheared when the grasps and the point cloud were converted back to world coordinates, making the issue hard to spot. With the shearing corrected, most grasps are now collision-free. Apparently pyrender is robust to orientations that are not properly normalized; otherwise we would have found this much faster.
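For anyone hitting the same symptom, a sanity check along these lines would have caught it early (a minimal sketch, assuming `cam_frame` as above):

```python
import numpy as np

def is_proper_rotation(R, tol=1e-6):
    """True if R is orthonormal with determinant +1."""
    return (np.allclose(np.dot(R.T, R), np.eye(3), atol=tol)
            and np.isclose(np.linalg.det(R), 1.0, atol=tol))

assert is_proper_rotation(cam_frame[:3, :3]), "camera orientation is sheared/scaled"
```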
Thanks for the update. Glad that it works for you now.
Could you please close this issue?