Questions about training performance
Opened this issue · 11 comments
Hi, @chernyadev! I followed the implementation at [https://github.com/robobase-org/robobase.git] and ran the Diffusion Policy/ACT training code on bigym [https://github.com/chernyadev/bigym/tree/fix_original_demos], but even after extensive debugging I still can't make things work. Here's what I've tried:
I chose the dishwasher_close and drawer_top_close tasks for training; in the original bigym paper, both tasks reach nearly 100% success rate with Diffusion Policy and ACT. The robobase repository doesn't explain how to change robobase_config.yaml to fit the imitation learning pipeline, so I changed robobase_config.yaml according to the bigym paper. Here's my robobase_config:
defaults:
  - _self_
  - env: null
  - method: null
  - intrinsic_reward_module: null
  - launch: null
  - override hydra/launcher: joblib

# Universal settings
create_train_env: true
num_train_envs: 1
replay_size_before_train: 3000
num_pretrain_steps: 1100000
num_train_frames: 1100000
eval_every_steps: 10000
num_eval_episodes: 5
update_every_steps: 2
num_explore_steps: 2000
save_snapshot: false
snapshot_every_n: 10000
batch_size: 256

# Demonstration settings
demos: 200
demo_batch_size: 256 # null # If set to > 0, introduce a separate buffer for demos
use_self_imitation: true # false # When using a separate buffer for demos, if set to True, save successful (online) trajectories into the separate demo buffer

# Observation settings
pixels: false
visual_observation_shape: [84, 84]
frame_stack: 2
frame_stack_on_channel: true
use_onehot_time_and_no_bootstrap: false

# Action settings
action_repeat: 1
action_sequence: 16 # ActionSequenceWrapper
execution_length: 8 # If execution_length < action_sequence, we use receding horizon control
temporal_ensemble: true # Temporal ensembling only applicable to action sequence > 1
temporal_ensemble_gain: 0.01
use_standardization: true # Demo-based standardization for action space
use_min_max_normalization: true # Demo-based min-max normalization for action space
min_max_margin: 0.0 # If set to > 0, introduce margin for demo-driven min-max normalization
norm_obs: true

# Replay buffer settings
replay:
  prioritization: false
  size: 1000000
  gamma: 0.99
  demo_size: 1000000
  save_dir: null
  nstep: 3
  num_workers: 4
  pin_memory: true
  alpha: 0.7 # prioritization
  beta: 0.5 # prioritization
  sequential: false
  transition_seq_len: 1 # The length of transition sequence returned from sample() call. Only applicable if sequential is True

# logging settings
wandb: # Weights and Biases
  use: true
  project: ${oc.env:USER}RoboBase
  name: null
tb: # TensorBoard
  use: false
  log_dir: /tmp/robobase_tb_logs
  name: null

# Misc
experiment_name: exp
seed: 1
num_gpus: 1
log_every: 1000
log_train_video: false
log_eval_video: true
log_pretrain_every: 100
save_csv: false

hydra:
  run:
    dir: ./exp_local/${now:%Y.%m.%d}/${now:%H%M%S}_${hydra.job.override_dirname}
  sweep:
    dir: ./exp_local/${now:%Y.%m.%d}/${now:%H%M}_${hydra.job.override_dirname}
    subdir: ${hydra.job.num}
Then I use the following command to start Diffusion Policy training:
python train.py method=act env=bigym/dishwasher_close replay.nstep=1
But nothing works: the success rate is always zero. Also, it seems there's a bug in your temporal ensemble code. In robobase/robobase/envs/wrappers/action_sequence.py, in step_sequence(), the temporal ensemble code is:
self._action_history[
    self._cur_step, self._cur_step : self._cur_step + self._sequence_length
] = action
for i, sub_action in enumerate(action):
    if self._temporal_ensemble and self._sequence_length > 1:
        # Select all predicted actions for self._cur_step. This will cover the
        # actions from [cur_step - sequence_length + 1, cur_step)
        # Note that not all actions in this range will be valid as we might have
        # execution_length > 1, which skips some of the intermediate steps.
        cur_actions = self._action_history[:, self._cur_step]
        indices = np.all(cur_actions != 0, axis=1)
        cur_actions = cur_actions[indices]
        # earlier predicted actions will have smaller weights.
        exp_weights = np.exp(-self._gain * np.arange(len(cur_actions)))
        exp_weights = (exp_weights / exp_weights.sum())[:, None]
        sub_action = (cur_actions * exp_weights).sum(axis=0)
    observation, reward, termination, truncation, info = self.env.step(
        sub_action
    )
    self._cur_step += 1
    if self.is_demo_env:
        demo_actions[i] = info.pop("demo_action")
    total_reward += reward
    action_idx_reached += 1
    if termination or truncation:
        break
    if not self.is_demo_env:
        if action_idx_reached == self._execution_length:
            break
It seems that even when temporal ensemble is enabled, the function still runs the whole 'for' loop and steps the environment with sub_action multiple times, and each time the sub_action is the same. So I changed the code to the following to avoid this:
self._action_history[
    self._cur_step, self._cur_step : self._cur_step + self._sequence_length
] = action
for i, sub_action in enumerate(action):
    if self._temporal_ensemble and self._sequence_length > 1:
        # Select all predicted actions for self._cur_step. This will cover the
        # actions from [cur_step - sequence_length + 1, cur_step)
        # Note that not all actions in this range will be valid as we might have
        # execution_length > 1, which skips some of the intermediate steps.
        cur_actions = self._action_history[:, self._cur_step]
        indices = np.all(cur_actions != 0, axis=1)
        cur_actions = cur_actions[indices]
        # earlier predicted actions will have smaller weights.
        exp_weights = np.exp(-self._gain * np.arange(len(cur_actions)))
        exp_weights = (exp_weights / exp_weights.sum())[:, None]
        sub_action = (cur_actions * exp_weights).sum(axis=0)
    observation, reward, termination, truncation, info = self.env.step(
        sub_action
    )
    self._cur_step += 1
    if self.is_demo_env:
        demo_actions[i] = info.pop("demo_action")
    total_reward += reward
    action_idx_reached += 1
    if termination or truncation:
        break
    if self._temporal_ensemble and self._sequence_length > 1:
        break
    if not self.is_demo_env:
        if action_idx_reached == self._execution_length:
            break
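For reference, here is a minimal, self-contained sketch (my own illustration, not code taken from robobase) of what the ensembling step is supposed to compute at a single control step; the zero-initialized history buffer and the "all-zeros means no prediction" check are assumptions mirroring the snippet above:

import numpy as np

def ensembled_action(history: np.ndarray, cur_step: int, gain: float) -> np.ndarray:
    """Blend every prediction previously made for `cur_step` with exponential weights."""
    candidates = history[:, cur_step]                      # (num_steps, action_dim)
    valid = np.all(candidates != 0, axis=1)                # all-zeros rows never predicted this step
    candidates = candidates[valid]
    weights = np.exp(-gain * np.arange(len(candidates)))   # index 0 = oldest surviving prediction,
    weights /= weights.sum()                               # which therefore gets the largest weight
    return (candidates * weights[:, None]).sum(axis=0)

# Toy usage: two overlapping predictions exist for step 1; with gain=0.01 the
# weights are nearly uniform, so the result is roughly their mean, ~[1.5, 1.5].
seq_len, act_dim, num_steps = 3, 2, 5
history = np.zeros((num_steps, num_steps + seq_len, act_dim))
history[0, 0:3] = [[1.0, 1.0]] * 3   # sequence predicted at step 0 covers steps 0..2
history[1, 1:4] = [[2.0, 2.0]] * 3   # sequence predicted at step 1 covers steps 1..3
print(ensembled_action(history, cur_step=1, gain=0.01))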
Then I launched another training run, but it still doesn't work.
Then I took a look at the eval videos:
It seems that the robot completes the task to a certain degree, but the overall success rate is still zero and the task reward is zero too. This is weird.
In another video, the robot fails to complete the task at all.
Besides, I noticed you use DDIM. But as far as I know, DDIM is only used to accelerate sampling from a DDPM, while training still follows the DDPM objective. I'm not a diffusion expert, so I may be wrong.
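For context, this is the usual Diffusion Policy pattern as I understand it, sketched with HuggingFace diffusers; noise_pred_net, the shapes, and the step counts are placeholders of mine, not robobase's actual model or settings. Training adds noise under a DDPM schedule and regresses the noise; DDIM only replaces the sampler at inference time so the same trained denoiser can sample in fewer steps:

import torch
from diffusers import DDPMScheduler, DDIMScheduler

noise_pred_net = torch.nn.Linear(16, 16)   # placeholder denoiser (a real one also conditions on t)
actions = torch.randn(32, 16)              # a batch of flattened action sequences

# Training: standard DDPM objective -- predict the noise that was added.
train_sched = DDPMScheduler(num_train_timesteps=100)
noise = torch.randn_like(actions)
t = torch.randint(0, train_sched.config.num_train_timesteps, (actions.shape[0],))
noisy_actions = train_sched.add_noise(actions, noise, t)
loss = torch.nn.functional.mse_loss(noise_pred_net(noisy_actions), noise)

# Inference: DDIM reuses the same trained denoiser, just with fewer denoising steps.
ddim = DDIMScheduler(num_train_timesteps=100)
ddim.set_timesteps(10)                     # e.g. 10 instead of 100 steps
sample = torch.randn(1, 16)
for step in ddim.timesteps:
    sample = ddim.step(noise_pred_net(sample), step, sample).prev_sample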
So I tried to launch ACT training instead, and it fails with:
Error executing job with overrides: ['method=act', 'env=bigym/dishwasher_close', 'replay.nstep=1']
Error in call to target 'robobase.method.act.ActBCAgent':
AttributeError("'NoneType' object has no attribute 'output_shape'")
full_key: method
I figured out that this is because the ACT code only supports image observations, whereas in the experiments above I used state observations. So I changed the pixels setting in robobase_config.yaml:
# Observation settings
pixels: true
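(Equivalently, since the config is managed by Hydra, the same change should also work as a command-line override without editing the file, e.g. python train.py method=act env=bigym/dishwasher_close pixels=true replay.nstep=1 — pixels is just a top-level key of the config above.)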
And it runs into the problem described in [Inquiry Regarding Using Repo with BiGym · Issue #3 · robobase-org/robobase (github.com)](robobase-org/robobase#3), which remains unanswered.
Then I wanted to check whether there's anything wrong with the demos, but I can't run your demo replay code since my server has no monitor or GUI.
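As a possible workaround (an assumption on my side, not something from the bigym docs): since BiGym is MuJoCo-based, an off-screen rendering backend usually works on a headless machine, and frames can be written to a video file instead of opening a viewer. How you construct the env and obtain the demo actions depends on the real replay script; only the MUJOCO_GL trick and the video dump are the point here:

import os
os.environ.setdefault("MUJOCO_GL", "egl")    # or "osmesa"; must be set before MuJoCo is imported

import imageio   # pip install imageio imageio-ffmpeg

def replay_to_video(env, actions, path="replay.mp4", fps=30):
    """Step a demo's actions through `env` and save the rendered frames to an mp4."""
    env.reset()
    frames = [env.render()]                  # assumes a gymnasium-style rgb_array render()
    for action in actions:
        env.step(action)
        frames.append(env.render())
    imageio.mimsave(path, frames, fps=fps)   # inspect the file locally, no GUI needed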
So I'm stuck here and would really appreciate your help. To be honest, I think bigym is a really amazing benchmark: it's the only one I know of that supports long-horizon mobile manipulation and comes with plenty of human demos and tasks. I believe better maintenance of this repository will definitely enlarge the impact of this work in the community, and I hope my feedback helps you develop it further.
Hi, you might want to check a recent PR (robobase-org/robobase#5), though I have only tested this with pixels. Also check code for my recent project that uses BiGym too: https://github.com/younggyoseo/CQN-AS
Wow! I was wondering how to get hold of your bigym code when I first saw your amazing CQN-AS paper a few weeks ago. I didn't expect you to personally reply to me. Thanks a lot! This bigym code will help a lot. I'd also love to try your CQN-AS code!
Hello, I encountered the same problem when applying ACT to BiGym. May I ask whether you implemented ACT on BiGym based on the author's CQN-AS code? I still can't find a running configuration for ACT in the CQN-AS code repository.
Hello, thank you for your answer and help, but I still can't find a running configuration for ACT in the CQN-AS code repository. Could you provide a clearer way to run ACT on BiGym? Thank you very much.
Can I ask why you are looking for an ACT implementation in the CQN-AS codebase? As I said, you can use the robobase repository to run ACT. Please follow the instructions in the README of the robobase repository.
Thanks for your answer! It's because you mentioned in your answer above: "Hi, you might want to check a recent PR (robobase-org/robobase#5), though I have only tested this with pixels. Also check code for my recent project that uses BiGym too: https://github.com/younggyoseo/CQN-AS", and in PR5 you mentioned "This should fix the issue of robobase failing to make ACT work on BiGym. This is the exact codebase I used for reporting ACT results on BiGym for my recent project: https://younggyo.me/cqn-as/." So I went to the CQN-AS repository to look for the ACT runs (I noticed the CQN-AS paper also includes ACT results), but I did not find anything related to ACT in CQN-AS.
Okay, it seems like there was a misunderstanding: PR5 is the code I used for reporting the ACT results, so you can follow the instructions in robobase. Please follow the README in the robobase repository and let me know if there's any problem reproducing the results with pixels (ideally in a robobase repository issue while mentioning me). Thank you!
Thank you for your answer. I will run the ACT experiments on robobase. You mentioned that I need to follow the README of robobase, but there is no running command for BiGym in that README. The only BiGym-related command I found is: python3 train.py method=act launch=act_pixel_bigym env=bigym/dishwasher_close wandb.name=act_bigym_dishwasher_close batch_size=256 demos=-1. Will you provide the running commands for the other tasks in the paper? This is very important to us. Thank you! BiGym and RoboBase are the best humanoid robot imitation learning platforms we have seen. I hope to make more contributions based on your work and hope to get more support from you. Thank you!
OK, I understand. Thank you for your prompt and responsible answer! Could you please answer my question about running ACT in robobase? Or I can ask in the robobase issue. Thank you!
I'm a bit confused; you can set the task by changing the task name in env=bigym/dishwasher_close. You can also check the task configs here. Could you please use the robobase repository if you have further questions, or if I'm misunderstanding something here? Thank you!
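(For example, presumably the same command works with the other task from this thread, assuming a matching env config exists: python3 train.py method=act launch=act_pixel_bigym env=bigym/drawer_top_close wandb.name=act_bigym_drawer_top_close batch_size=256 demos=-1)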
OK, thank you! I will try it, thank you for your patience!