ucsdarclab/dVRL

Question about result for Pick Environment

Opened this issue · 11 comments

Hello, I have some questions about the results for the pick_and_place environment.
I used DDPG+HER to train the agent but got a bad result (success rate = 0). I read your paper, and you mention using BC (behavior cloning). Could you give some hints or a reference for getting a good result?

Hello! For the pick environment, the state space is quite large due to the small scale of the PSM gripper and the object. I generated demonstrations of grasping by driving the tool above the object, grasping it, and then moving it to the goal. These demonstrations are then used by augmenting the policy loss in DDPG with a behavioral cloning loss. This is similar to https://arxiv.org/pdf/1709.10089.pdf
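For concreteness, here is a minimal sketch of that demonstration-augmented actor loss in the spirit of the linked paper (not this repo's exact implementation; `actor`, `critic`, and the demo batch are placeholder names):

    # Sketch of a DDPG actor loss augmented with a behavior-cloning term,
    # as in Nair et al. (https://arxiv.org/abs/1709.10089). Placeholder names,
    # not the code used for dVRL. The paper additionally gates the BC term
    # with a Q-filter (clone only where Q(s, a_demo) > Q(s, pi(s))).
    import torch

    def actor_loss_with_bc(actor, critic, obs, demo_obs, demo_act, bc_weight=1.0):
        # Standard DDPG objective: maximize Q(s, pi(s)).
        ddpg_loss = -critic(obs, actor(obs)).mean()
        # Behavior cloning: push pi(s_demo) towards the demonstrated action.
        bc_loss = ((actor(demo_obs) - demo_act) ** 2).sum(dim=-1).mean()
        return ddpg_loss + bc_weight * bc_loss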

Hello! Thanks for the reply, I read the paper you mentioned. I wonder how you generated the demonstration data. Did you just guide the arm in V-REP and collect the data?
I read in that paper that they used a VR device (HTC Vive).

The grasping task can be broken into 3 steps:

  1. Move the arm above the object and orient the gripper so it is angled towards the object.
  2. Open the gripper, move it directly over the object, and close the gripper.
  3. Move the object to the target location.

Hope this helps!

I mean... from baselines#474, he generates the data with a script. Can I also get the data in a similar way?

Sorry for the delayed response. I had to look around. I found the generated data and attached it as a .zip file. The code I wrote is posted below:

    import numpy as np

    # Assumes `env_dvrk` (a dVRLPick environment) and `numEp_forData`
    # (number of demonstration episodes) are already defined.
    actions = []
    observations = []
    infos = []

    for it in range(numEp_forData):
        episodeActs = []
        episodeObs = []
        episodeInfo = []

        state = env_dvrk.reset()
        episodeObs.append(state)
        env_dvrk.render()

        step = 0

        # Phase 1: move straight down towards the object with the jaw held open.
        for i in range(13):
            a = [0, 0, -1, 1]
            state, r, _, info = env_dvrk.step(a)
            step += 1

            episodeActs.append(a)
            episodeObs.append(state)
            episodeInfo.append(info)

        # Phase 2: close the jaw to grasp the object.
        for i in range(2):
            a = [0, 0, 0, -0.5]
            state, r, _, info = env_dvrk.step(a)
            step += 1

            episodeActs.append(a)
            episodeObs.append(state)
            episodeInfo.append(info)

        # Phase 3: proportional control of the end effector towards the goal,
        # keeping the jaw closed, until the episode runs out of steps.
        while step < env_dvrk._max_episode_steps:
            goal = state['desired_goal']
            pos_ee = state['observation'][-3:]
            pos_obj = state['observation'][-4:]  # not used below
            action = np.array(goal - pos_ee)

            a = np.clip([10 * action[0], 10 * action[1], 10 * action[2], -0.5], -1, 1)
            state, r, _, info = env_dvrk.step(a)
            step += 1

            episodeActs.append(a)
            episodeObs.append(state)
            episodeInfo.append(info)

        actions.append(episodeActs)
        observations.append(episodeObs)
        infos.append(episodeInfo)

        print('Final Reward at {} is {}'.format(it, r))


dvrkPick.zip
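For reference, the lists collected by the script above can be dumped in the `.npz` layout that the baselines DDPG+HER demonstration loader reads (keys `acs`, `obs`, `info`); this is a sketch under that assumption, and the file name is arbitrary:

    # Hedged sketch: save the collected demonstrations with the acs/obs/info keys
    # used by baselines' demo-generation scripts. Because the observations are
    # dicts, they are stored as object arrays, so loading them back later needs
    # np.load(..., allow_pickle=True).
    import numpy as np

    np.savez_compressed('data_dvrk_pick_demo.npz',
                        acs=actions, obs=observations, info=infos)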

Thanks for your help!
Another question: did you run into the problem where the PSM shakes violently when the gripper grasps the object? (This happens when running the script.)

During training, I’ve only seen that when trying other reward functions.

Your discussions above helped me a lot. Thank you!
I used OpenAI/baselines/her to train an agent in the dVRLPick environment with the demos provided by @bango123, but I still got bad results (success rate = 0, just like @ljjTYJR said). I have tested the FetchPickAndPlace-v1 environment with the same algorithm and it worked well.
Do I have to modify something in the original HER algorithm in Baselines?
By the way, I commented out "vrep.simxFinish(self.clientID)" earlier ( #11 ). Only after that could I get a better result in the dVRLReach environment. Would this modification influence the training process?

I'm very confused about these problems and am looking forward to your reply.
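One note on the baselines question: vanilla HER in baselines ignores the demonstration file unless the demo-specific parameters are enabled. From memory, the relevant overrides live in `baselines/her/experiment/config.py` and look roughly like the snippet below; the exact names may differ between baselines versions, so treat them as an assumption to check against your copy:

    # Assumed/approximate demo-related parameters in baselines' HER config
    # (baselines/her/experiment/config.py); names may differ by version.
    DEMO_PARAMS = {
        'bc_loss': 1,            # add the behavior-cloning term to the actor loss
        'q_filter': 1,           # only clone where Q(s, a_demo) > Q(s, pi(s))
        'num_demo': 100,         # number of demonstration episodes to load
        'demo_batch_size': 128,  # demo samples mixed into each training batch
    }
    # The demonstration file itself is passed at launch, e.g. --demo_file=/path/to/demos.npz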

@leey127
Hello, I also ran into this problem. I suspect it may be due to the V-REP simulation: the pick environment uses two sensors to detect the grasp, and I find them a little difficult to trigger.

Do you have any suggestions about this problem? @bango123

I would confirm that the code I shared in the previous comment is able to solve the environment. Basically, hand-craft a policy that solves it to confirm there are no other issues.
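As a concrete sanity check, the `infos` collected by the demonstration script above already tell you whether the scripted policy reaches the goal (this assumes the environment reports `is_success` in `info`, as the goal-based Gym environments do):

    # Success rate of the hand-crafted policy over the collected demo episodes.
    # Assumes each step's info dict contains 'is_success', as in the Gym
    # robotics-style goal environments.
    n_success = int(sum(ep[-1]['is_success'] for ep in infos))
    print('Scripted policy success rate: {}/{}'.format(n_success, len(infos)))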