med-air/DEX

How to Apply Learned Policy

Homunculus314 opened this issue · 8 comments

Hello,
I didn't have any problems running the code. However, after successfully running the training algorithms (e.g., DDPG), I don't know how to actually apply the learned policy to obtain the end-effector trajectory for the NeedlePick task. By that I mean not just running the evaluation script and getting, for example, the success rate, but actually retrieving output that I can use for my robot.
I apologize if this question is too basic, but I would be grateful for any hint.

Hi,

I think this is a very helpful question, especially for real-world deployment. We have previously extracted the end-effector trajectory produced by a trained policy, as shown in the following figure.

[image: plot of the end-effector trajectory produced by the trained policy]

To do so, you need to extract the 3D position of the end-effector here at every time step, and then save the collected positions over the whole episode to form a trajectory.

How to extract the 3D position from the observation? This is simple: add one line like ee_pose = self._env._get_robot_state(idx=0). This returns the 6D pose + gripper state of PSM1 (idx=0 indicates PSM1). Note that this gives the pose in the world coordinate frame, i.e., the PyBullet coordinate frame. Refer to this function for more details.
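As a minimal sketch of collecting such a trajectory over one evaluation episode (agent.get_action is a placeholder for however your evaluation code queries the trained policy, and the first three entries of the robot state are assumed to be the x, y, z world position; calling the underscore method also runs into the private-method caveat discussed below):

```python
import numpy as np

# Minimal sketch: record the end-effector position at every step of one
# evaluation episode and save the whole episode as a trajectory.
# `agent.get_action` is a placeholder for your evaluation code's policy call;
# the first three entries of the robot state are assumed to be the x, y, z
# position in world (PyBullet) coordinates.
trajectory = []
obs = env.reset()
done = False
while not done:
    action = agent.get_action(obs)
    obs, reward, done, info = env.step(action)
    ee_pose = env._get_robot_state(idx=0)    # 6D pose + gripper state of PSM1
    trajectory.append(np.asarray(ee_pose)[:3])

np.save('needlepick_ee_trajectory.npy', np.asarray(trajectory))
```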

If you aim to get the pose in the robot coordinate frame, you may need to use some utility functions here and here. For example, get the link pose of the end-effector, then transform it into a 4x4 pose matrix if needed. That said, please tell us if you have trouble doing that, and we can discuss the details further.
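For the pose-to-matrix part, here is a generic sketch using raw PyBullet calls rather than the repo's own utils (the body and link IDs are placeholders for the PSM and its end-effector link in your scene):

```python
import numpy as np
import pybullet as p

def link_pose_to_matrix(body_id, link_index):
    """4x4 homogeneous pose of a link in PyBullet world coordinates.

    body_id / link_index are placeholders; substitute the PSM body and
    its end-effector (tip) link from your scene.
    """
    pos, quat = p.getLinkState(body_id, link_index)[:2]
    T = np.eye(4)
    T[:3, :3] = np.asarray(p.getMatrixFromQuaternion(quat)).reshape(3, 3)
    T[:3, 3] = pos
    return T

# To express the pose in the robot base frame, given the base pose as a
# 4x4 matrix T_world_base:
#   T_base_ee = np.linalg.inv(T_world_base) @ T_world_ee
```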

Hope it helps.

Hello,

Thank you very much for the quick and useful reply.
However, I am facing a problem accessing the _get_robot_state() function. I inserted the additional line of code like this:

[image: screenshot of the code with the added ee_pose = self._env._get_robot_state(idx=0) line]

When training the network with python3 train.py task=NeedlePick-v0 agent=dex use_wb=True, I get the following error:

[image: screenshot of the error traceback]

Do I have to import the function in a specific way?

Yes, there is a way to do this.

Due to the way the gym-based env wraps the task class, you cannot directly call a class method whose name starts with '_', since it is treated as a private function.

To this end, you may write a function named 'get_robot_state' in the task class (e.g., in class NeedlePick or class PsmEnv) that calls 'self._get_robot_state'.
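A minimal sketch of such a wrapper (which class to patch depends on your setup; the signature simply mirrors the private method):

```python
# Inside the task / env class (e.g., PsmEnv or NeedlePick), expose a public
# accessor so that code outside the class can query the robot state:
def get_robot_state(self, idx=0):
    """Public wrapper around the private _get_robot_state (idx=0 is PSM1)."""
    return self._get_robot_state(idx=idx)
```

From the training/evaluation side you can then call env.unwrapped.get_robot_state(idx=0) to bypass any gym wrappers in between.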

Hello,

Thank you for your help. I was able to export the trajectory and transfer it to another simulation.
However, I'm still struggling with how to use my own inputs, e.g., the start pose of the end-effector and the start pose of the needle, to generate the trajectory for real-world deployment.
Is there a function I might use for that?

Hi,

One possible solution is to do closed-loop control. In this way, you don't need to export a trajectory from SurRoL; instead, you export the RL policy itself, which I believe you have already done.

Then, you need to provide the RL policy with an observation at each time step, just as we do in the simulation. How to provide observations? Please refer to here to check the components, such as the object pose and the end-effector pose. The policy outputs an action (a delta position), which you use for closed-loop position control.

Nonetheless, in this paper we do not adopt the above approach, as we had difficulty obtaining accurate observations at every time step. That said, you are still encouraged to try this solution if you can.
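If you do want to try it, a rough sketch of the loop is below. None of the helper names here are part of DEX or SurRoL; read_real_observation and send_delta_position stand in for your own sensing and robot interface, and policy.get_action for however you query the exported policy:

```python
# Hypothetical closed-loop deployment loop; the helpers are placeholders
# for your own robot / sensing interface, not part of this repo.
def run_closed_loop(policy, max_steps=100):
    for _ in range(max_steps):
        obs = read_real_observation()      # must match the simulation's observation layout
        action = policy.get_action(obs)    # delta end-effector position (+ jaw action)
        send_delta_position(action)        # apply the increment on the real robot
```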

We actually adopt another solution based on open-loop control. We do not export the policy. Instead, we record the simulated trajectory given the starting poses of the end-effector and the needle. We then set up the same configuration in the real-world task and replay the simulated trajectory to complete it.

We would say this solution is only for demonstration purposes, far from true sim-to-real transfer, but you can try it if it serves your purpose.
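A sketch of this replay, assuming you saved the end-effector positions as in the collection sketch above; move_to_pose is a placeholder for your robot's own position command:

```python
import numpy as np

# Open-loop replay sketch: load the trajectory recorded in simulation and
# step the real robot through it. `move_to_pose` is a placeholder for your
# robot's motion command and is not part of this repo.
trajectory = np.load('needlepick_ee_trajectory.npy')   # (T, 3) world positions

for waypoint in trajectory:
    move_to_pose(waypoint)
```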

Hi,

Thank you again for your very helpful answers, and please excuse the number of questions.
I want to start by pre-generating the trajectory, like you did.
But I do not quite understand how I can provide just the start poses of the needle and the end-effector to pre-generate the trajectory without running the whole evaluation script. Since the training process uses several sub-networks, do I only need one of them to apply the learned policy and generate the trajectory? Is there a way to manually set explicit start poses (e.g., x, y, z, roll, pitch, yaw)?

You are welcome.

I guess that you intend to provide start poses (from a real task) to the RL policy in simulation, allowing you to generate a simulated trajectory to deploy in the real task. If so, you indeed need to provide the start poses.

Setting the start pose of the needle is relatively easy. You only need to manually modify this line. You can write an interface yourself, so that you can finally run evaluation.py to collect the desired trajectories.

Setting the start pose of the end-effector, however, is a little tricky. You need to check the code here. An extra transformation may be necessary, which I am not sure about.
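As a rough sketch of overriding both start poses during the task's environment setup (self.obj_id for the needle body and the PSM reset interface are assumptions about the SurRoL task class; adapt the names and example values to your version):

```python
import numpy as np
import pybullet as p

# Hypothetical override inside the task's environment setup. `self.obj_id`
# (the needle body) and the PSM reset interface are assumptions; adapt the
# names to your SurRoL version.
needle_pos = [2.55, 0.05, 3.45]                           # example x, y, z (workspace units)
needle_orn = p.getQuaternionFromEuler([0, 0, np.pi / 2])  # example roll, pitch, yaw
p.resetBasePositionAndOrientation(self.obj_id, needle_pos, needle_orn)

# End-effector start pose: compute joint angles for the desired tip pose and
# reset the arm; the exact call depends on the PSM class in your version, e.g.:
# joints = self.psm1.inverse_kinematics(desired_tip_pose)
# self.psm1.reset_joint(joints)
```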

Hope it helps.

Closing because of inactivity.