Denys88/rl_games

error while running


Hi, I am getting the error below while running the code:

Traceback (most recent call last):
  File "tf14_runner.py", line 144, in <module>
    runner.run(args)
  File "tf14_runner.py", line 114, in run
    self.run_train()
  File "tf14_runner.py", line 98, in run_train
    agent = self.algo_factory.create(self.algo_name, sess=self.sess, base_name='run', observation_space=obs_space, action_space=action_space, config=self.config)  
  File "/home/anujm/Documents/rl_games/rl_games/common/object_factory.py", line 12, in create
    return builder(**kwargs)
  File "tf14_runner.py", line 25, in <lambda>
    self.algo_factory.register_builder('a2c_discrete', lambda **kwargs : a2c_discrete.A2CAgent(**kwargs)) 
  File "/home/anujm/Documents/rl_games/rl_games/algos_tf14/a2c_discrete.py", line 45, in __init__
    self.vec_env = vecenv.create_vec_env(self.env_name, self.num_actors, **self.env_config)
  File "/home/anujm/Documents/rl_games/rl_games/common/vecenv.py", line 138, in create_vec_env
    return RayVecSMACEnv(config_name, num_actors, **kwargs)
  File "/home/anujm/Documents/rl_games/rl_games/common/vecenv.py", line 101, in __init__
    self.num_agents = ray.get(res)
  File "/home/anujm/anaconda3/envs/rlgames/lib/python3.7/site-packages/ray/worker.py", line 2193, in get
    raise value
ray.exceptions.RayTaskError: ray_worker (pid=16737, host=anujm-X299-A)
  File "/home/anujm/Documents/rl_games/rl_games/common/vecenv.py", line 58, in get_number_of_agents
    return self.env.get_number_of_agents()
AttributeError: 'BatchedFrameStack' object has no attribute 'get_number_of_agents'
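
For context, this kind of AttributeError usually means a wrapper is not forwarding a custom method to the env it wraps. A minimal illustration with hypothetical class names (not the actual rl_games/gym code), assuming the SMAC env exposes get_number_of_agents() and a frame-stacking wrapper sits in front of it:

class SmacLikeEnv:
    # Stand-in for an env exposing a non-standard helper method.
    def get_number_of_agents(self):
        return 3

class PlainWrapper:
    # Does NOT delegate unknown attributes to the wrapped env.
    def __init__(self, env):
        self.env = env

class ForwardingWrapper:
    # Delegates unknown attributes to the wrapped env,
    # similar to what gym.Wrapper.__getattr__ does.
    def __init__(self, env):
        self.env = env
    def __getattr__(self, name):
        return getattr(self.env, name)

env = SmacLikeEnv()
print(ForwardingWrapper(env).get_number_of_agents())  # 3
print(PlainWrapper(env).get_number_of_agents())       # AttributeError, like the traceback above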

Hi, could you show me the config which you are trying to run?

params:  
  algo:
    name: a2c_discrete

  model:
    name: discrete_a2c

  load_checkpoint: False
  load_path: 'nn/6h_vs_8z_cnnsmac_cnn'

  network:
    name: actor_critic
    separate: True
    #normalization: layer_norm
    space: 
      discrete:
      
    cnn:
      type: conv1d
      activation: relu
      initializer:
        name: variance_scaling_initializer
        scale: 2
      regularizer:
        name: 'None'
      convs:    
        - filters: 64
          kernel_size: 3
          strides: 2
          padding: 'same'
        - filters: 128
          kernel_size: 3
          strides: 1
          padding: 'valid'
        - filters: 256
          kernel_size: 3
          strides: 1
          padding: 'valid'
    mlp:
      units: [256, 128]
      activation: relu
      initializer:
        name: variance_scaling_initializer
        scale: 2 
      regularizer:
        name:  'None'
  config:
    name: 3m_cnn
    reward_shaper:
        scale_value: 1
    normalize_advantage: True
    gamma: 0.99
    tau: 0.95
    learning_rate: 1e-4
    score_to_win: 20
    grad_norm: 0.5
    entropy_coef: 0.001
    truncate_grads: True
    env_name:  smac_cnn
    ppo: true
    e_clip: 0.2
    clip_value: True
    num_actors: 2
    steps_num: 128
    minibatch_size: 1536
    mini_epochs: 1
    critic_coef: 2
    lr_schedule:  None
    lr_threshold: 0.05
    normalize_input: False
    seq_len: 2
    use_action_masks: True
    ignore_dead_batches: False

    env_config:
      name: 3m
      frames: 4
      transpose: True
      random_invalid_step: False

I've found that I am using a much newer version of the OpenAI Gym than the one in my requirements.txt:
Name: gym
Version: 0.15.4
Could you try to update it and check if it works?
I'll update the requirements in that case.
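
If it helps, a quick way to check which versions are actually installed in the active environment, so they can be compared against requirements.txt:

# Print the installed gym and numpy versions for the current environment.
import gym
import numpy

print('gym  :', gym.__version__)
print('numpy:', numpy.__version__)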

My numpy version is also newer than the one in requirements.txt.

Thanks. After fixing the numpy version, I get this error:

(pid=22015) RequestQuit command received.
(pid=22015) Closing Application...
(pid=22015) unable to parse websocket frame.
frames per seconds:  119.09246403768397
/home/anujm/anaconda3/envs/rlgames/lib/python3.7/site-packages/numpy/core/fromnumeric.py:3257: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/home/anujm/anaconda3/envs/rlgames/lib/python3.7/site-packages/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/home/anujm/Desktop/port_rlgames/tf14_runner.py", line 144, in <module>
    runner.run(args)
  File "/home/anujm/Desktop/port_rlgames/tf14_runner.py", line 114, in run
    self.run_train()
  File "/home/anujm/Desktop/port_rlgames/tf14_runner.py", line 101, in run_train
    agent.train()
  File "/home/anujm/Desktop/port_rlgames/algos_tf14/a2c_discrete.py", line 417, in train
    self.writer.add_scalar('info/last_lr', last_lr * lr_mul, frame)
UnboundLocalError: local variable 'lr_mul' referenced before assignment

Could you decrease minibatch_size from 1536 to 1536/4 = 384?
The 1536 minibatch is half of a 3072 total batch. With 128 steps, two envs and 3 agents the total batch size is only 128 * 2 * 3 = 768, so the minibatch should be 384 here.
I am sorry for the inconvenience. I am thinking about how to make this more user friendly.
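
Put differently, a small sketch of the arithmetic with the values from the config above (3 agents assumed for the 3m map):

# Batch-size arithmetic for steps_num=128, num_actors=2 and a 3-agent map.
steps_num  = 128
num_actors = 2
num_agents = 3

total_batch = steps_num * num_actors * num_agents   # 128 * 2 * 3 = 768
minibatch   = total_batch // 2                      # 384, half of the total batch

assert total_batch % minibatch == 0                 # minibatch_size must divide the batch
print(total_batch, minibatch)                       # 768 384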

Thanks, I just commented out the logger calls involving lr_mul, and it seems to work now.
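
For anyone hitting the same UnboundLocalError: the usual cause is that lr_mul only gets assigned inside a learning-rate-scheduler branch, so with lr_schedule: None it is never bound before the logging call. A hedged sketch of the pattern and a fix (hypothetical code, not the actual rl_games source), which avoids having to comment out the logger calls:

# lr_mul is only bound inside the scheduler branch; giving it a default first
# avoids the UnboundLocalError when no lr schedule is configured.
def log_last_lr(writer, lr_schedule, last_lr, frame):
    lr_mul = 1.0                      # default when no scheduler adjusts the LR
    if lr_schedule == 'adaptive':
        lr_mul = 0.5                  # hypothetical scheduler update
    writer.add_scalar('info/last_lr', last_lr * lr_mul, frame)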