hwjiang1510/GraspTTA

Inaccurate grasp predictions

abhinavkk opened this issue · 10 comments

Hello, firstly I want to mention this is some great work!

I was using the trained model to generate grasps for my own object point clouds generated from simulation. Surprisingly, the generated hand vertices were very distant from the object. I am not sure of the reason for this. Is there any requirement on the input object point cloud's origin and axis orientation before using the network that I may have missed?

I am attaching an image of the predicted grasp for one of the input object point clouds I used:
Screenshot from 2022-04-07 19-41-42

Hi, thanks for the question.

In training, we do not apply transformation augmentation (and the ObMan dataset only covers a small range of object translations in 3D), so the model will not work on out-of-distribution object positions.

This can be solved by:

  • Moving the object into the in-distribution area, generating a hand, and translating it back. You can use [this](https://github.com/hwjiang1510/GraspTTA/blob/cecb9642e6d63670d4e954cf420d03f1a93b5a90/gen_diverse_grasp_ho3d.py#L60) translation as the reference (see the sketch below).
  • Training a new model with transformation augmentation.

BTW, the model may not be able to generate hands for incomplete point clouds.
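
A minimal sketch of the first option, assuming the network takes an (N, 3) object point cloud and returns hand vertices in the same coordinate frame. `predict_hand` is a hypothetical stand-in for whatever inference call you use, and `ref_trans` is the value from the linked line of `gen_diverse_grasp_ho3d.py`:

```python
import numpy as np

# Reference in-distribution object position, taken from the linked line of
# gen_diverse_grasp_ho3d.py.
ref_trans = np.array([-0.0793, 0.0208, -0.6924])

def grasp_out_of_distribution_object(obj_pc, predict_hand):
    """obj_pc: (N, 3) object point cloud; predict_hand: hypothetical inference callable."""
    # 1. Shift the object so its centroid sits at the in-distribution position.
    offset = ref_trans - obj_pc.mean(axis=0)
    obj_pc_shifted = obj_pc + offset

    # 2. Generate a hand for the shifted object.
    hand_verts = predict_hand(obj_pc_shifted)

    # 3. Translate the predicted hand back into the original object frame.
    return hand_verts - offset
```

This way the model only ever sees an object near the ObMan training positions, while the returned hand is expressed in your original frame.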

Thanks for your quick reply.

Could you help me understand the object distribution that you are using? Or, if it depends on the ObMan dataset used for training, how can I find out the distribution used in that dataset?

Sorry if this is a naive question, but I am a little new to this area.

The translation you suggested did help! Thanks for that, but I would still love to understand how you came up with the distribution and translation for the network.

Actually, this also happened when I tried to make use of this model.

Maybe a straightforward method is to check the range and mean of the ObMan hand translations to understand the data distribution.
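
A minimal sketch of that check, assuming you have already gathered the per-sample translations from the ObMan annotations into an (N, 3) NumPy array (loading them is dataset-specific and not shown here):

```python
import numpy as np

def summarize_translations(translations):
    """Print mean / min / max / std of an (N, 3) array of ObMan translations."""
    stats = {
        "mean": translations.mean(axis=0),
        "min": translations.min(axis=0),
        "max": translations.max(axis=0),
        "std": translations.std(axis=0),
    }
    for name, value in stats.items():
        print(f"{name}: {np.round(value, 4)}")
    return stats
```

Any object position placed inside the reported min/max range should then be in-distribution for the pretrained model.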

Thank you so much! After following your suggestion, I tried predicting for complete point clouds, and to my surprise the results were kind of unexpected and interesting (see the attached images).

  • I notice there might be a scaling step that I am missing, because currently the hand and object scales do not look coherent.
  • The predictions are strange, as the hands are quite often inside of or penetrating the object. Do you think this might require retraining/tuning, since I am using a different dataset (YCB)? If so, I would appreciate any insights on re-tuning the model.

image
image
image

Yes, you should scale the input object point cloud to roughly match the size of the hand.
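
A minimal sketch of such a scaling step, assuming metric units and that a hand-graspable object is on the order of 0.1–0.2 m across; `target_extent` is an assumed value you would tune against the training data:

```python
import numpy as np

def scale_to_hand_size(obj_pc, target_extent=0.15):
    """Scale an (N, 3) object point cloud so its largest bounding-box edge is
    roughly `target_extent` meters (assumed hand-graspable size)."""
    centroid = obj_pc.mean(axis=0)
    centered = obj_pc - centroid
    extent = (centered.max(axis=0) - centered.min(axis=0)).max()
    scale = target_extent / extent
    # Keep `scale` (and `centroid`) if you need to map the prediction back to
    # the original object size afterwards.
    return centered * scale + centroid, scale
```

If you combine this with the translation trick above, scale first and then move the scaled object to the reference position.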

Thank you so much for the help so far! I managed to scale down the input point cloud, and the results are now much better. However, I was curious what methods we could use to remove the contact gap in some of the hand predictions. Currently, a few predictions have a contact gap between the object and the hand, as shown below:
Screenshot from 2022-06-21 12-49-48
Screenshot from 2022-06-21 12-50-27
Screenshot from 2022-06-21 12-50-01

I want to know what the code at the link you gave in this answer means. I know it is used for translation, but I'm curious what the initial value `np.array([-0.0793, 0.0208, -0.6924])` means.

It is a random value sampled from the ObMan dataset translation distribution.