error while running
Hi, I am getting the error below while running the code:
```
Traceback (most recent call last):
  File "tf14_runner.py", line 144, in <module>
    runner.run(args)
  File "tf14_runner.py", line 114, in run
    self.run_train()
  File "tf14_runner.py", line 98, in run_train
    agent = self.algo_factory.create(self.algo_name, sess=self.sess, base_name='run', observation_space=obs_space, action_space=action_space, config=self.config)
  File "/home/anujm/Documents/rl_games/rl_games/common/object_factory.py", line 12, in create
    return builder(**kwargs)
  File "tf14_runner.py", line 25, in <lambda>
    self.algo_factory.register_builder('a2c_discrete', lambda **kwargs : a2c_discrete.A2CAgent(**kwargs))
  File "/home/anujm/Documents/rl_games/rl_games/algos_tf14/a2c_discrete.py", line 45, in __init__
    self.vec_env = vecenv.create_vec_env(self.env_name, self.num_actors, **self.env_config)
  File "/home/anujm/Documents/rl_games/rl_games/common/vecenv.py", line 138, in create_vec_env
    return RayVecSMACEnv(config_name, num_actors, **kwargs)
  File "/home/anujm/Documents/rl_games/rl_games/common/vecenv.py", line 101, in __init__
    self.num_agents = ray.get(res)
  File "/home/anujm/anaconda3/envs/rlgames/lib/python3.7/site-packages/ray/worker.py", line 2193, in get
    raise value
ray.exceptions.RayTaskError: ray_worker (pid=16737, host=anujm-X299-A)
  File "/home/anujm/Documents/rl_games/rl_games/common/vecenv.py", line 58, in get_number_of_agents
    return self.env.get_number_of_agents()
AttributeError: 'BatchedFrameStack' object has no attribute 'get_number_of_agents'
```
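For context, the AttributeError means that the frame-stacking wrapper does not expose the SMAC-specific get_number_of_agents() method of the environment it wraps. Below is a minimal sketch of the forwarding pattern that avoids this, assuming a gym-style wrapper (hypothetical class, not the actual BatchedFrameStack; newer gym versions do this kind of forwarding in gym.Wrapper itself, which is why the gym version matters later in this thread):

```python
import gym


class ForwardingWrapper(gym.Wrapper):
    """Illustrative wrapper that forwards non-standard env methods."""

    def __getattr__(self, name):
        # Only called when `name` is not found on the wrapper itself,
        # so custom methods such as get_number_of_agents() resolve
        # against the wrapped environment instead of raising.
        return getattr(self.env, name)
```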
Hi, could you show me the config you are trying to run?
```yaml
params:
  algo:
    name: a2c_discrete

  model:
    name: discrete_a2c

  load_checkpoint: False
  load_path: 'nn/6h_vs_8z_cnnsmac_cnn'

  network:
    name: actor_critic
    separate: True
    #normalization: layer_norm
    space:
      discrete:

    cnn:
      type: conv1d
      activation: relu
      initializer:
        name: variance_scaling_initializer
        scale: 2
      regularizer:
        name: 'None'
      convs:
        - filters: 64
          kernel_size: 3
          strides: 2
          padding: 'same'
        - filters: 128
          kernel_size: 3
          strides: 1
          padding: 'valid'
        - filters: 256
          kernel_size: 3
          strides: 1
          padding: 'valid'

    mlp:
      units: [256, 128]
      activation: relu
      initializer:
        name: variance_scaling_initializer
        scale: 2
      regularizer:
        name: 'None'

  config:
    name: 3m_cnn
    reward_shaper:
      scale_value: 1
    normalize_advantage: True
    gamma: 0.99
    tau: 0.95
    learning_rate: 1e-4
    score_to_win: 20
    grad_norm: 0.5
    entropy_coef: 0.001
    truncate_grads: True
    env_name: smac_cnn
    ppo: true
    e_clip: 0.2
    clip_value: True
    num_actors: 2
    steps_num: 128
    minibatch_size: 1536
    mini_epochs: 1
    critic_coef: 2
    lr_schedule: None
    lr_threshold: 0.05
    normalize_input: False
    seq_len: 2
    use_action_masks: True
    ignore_dead_batches: False

    env_config:
      name: 3m
      frames: 4
      transpose: True
      random_invalid_step: False
```
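For readers unfamiliar with the config format: the cnn section above describes a stack of three 1D convolutions that feed the MLP. A rough TensorFlow/Keras equivalent of those conv layers (illustrative only; the rl_games tf14 network builder constructs the graph from the YAML itself):

```python
import tensorflow as tf

def build_cnn(inputs):
    # Mirrors the three conv1d entries in the config above.
    x = tf.keras.layers.Conv1D(64, 3, strides=2, padding='same', activation='relu')(inputs)
    x = tf.keras.layers.Conv1D(128, 3, strides=1, padding='valid', activation='relu')(x)
    x = tf.keras.layers.Conv1D(256, 3, strides=1, padding='valid', activation='relu')(x)
    return tf.keras.layers.Flatten()(x)
```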
I've found that I am using a much newer version of OpenAI Gym than the one in my requirements.txt:

```
Name: gym
Version: 0.15.4
```
Could you try to update it and check if it works?
I'll update the requirements in that case.
My numpy version is newer as well.
Thanks, after fixing the numpy version, I get this error:
```
(pid=22015) RequestQuit command received.
(pid=22015) Closing Application...
(pid=22015) unable to parse websocket frame.
frames per seconds: 119.09246403768397
/home/anujm/anaconda3/envs/rlgames/lib/python3.7/site-packages/numpy/core/fromnumeric.py:3257: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/home/anujm/anaconda3/envs/rlgames/lib/python3.7/site-packages/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/home/anujm/Desktop/port_rlgames/tf14_runner.py", line 144, in <module>
    runner.run(args)
  File "/home/anujm/Desktop/port_rlgames/tf14_runner.py", line 114, in run
    self.run_train()
  File "/home/anujm/Desktop/port_rlgames/tf14_runner.py", line 101, in run_train
    agent.train()
  File "/home/anujm/Desktop/port_rlgames/algos_tf14/a2c_discrete.py", line 417, in train
    self.writer.add_scalar('info/last_lr', last_lr * lr_mul, frame)
UnboundLocalError: local variable 'lr_mul' referenced before assignment
```
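For context, an UnboundLocalError like this usually means the variable is only assigned inside a branch that did not run, presumably here because lr_schedule is None. A minimal sketch of the failure pattern and the usual fix (hypothetical code, not the actual rl_games train loop):

```python
def scaled_lr(last_lr, lr_schedule=None):
    # Fix pattern: bind lr_mul before any branching, so the logging
    # call can never see it unassigned.
    lr_mul = 1.0
    if lr_schedule == 'adaptive':
        lr_mul = 0.5  # stand-in for a real schedule multiplier
    return last_lr * lr_mul

print(scaled_lr(1e-4))  # 0.0001 even with no schedule configured
```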
Could you decrease minibatch_size from 1536 to 1536/4 = 384?
A minibatch of 1536 assumes a total batch size of 3072, of which it would be half. If we do 128 steps with two envs and 3 agents, the total batch size is only 128 * 2 * 3 = 768, so the minibatch size should be 384 here (see the sketch below).
I am sorry for the inconvenience. I am thinking about how to make this more user-friendly.
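The constraint can be checked directly from the config values (illustrative arithmetic, not part of rl_games):

```python
steps_num, num_actors, num_agents = 128, 2, 3  # from the config; the 3m map has 3 agents

batch_size = steps_num * num_actors * num_agents
print(batch_size)  # 768

minibatch_size = 384  # half of the batch; 1536 cannot fit into 768 samples
assert batch_size % minibatch_size == 0
```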
Thanks, I just commented out the logger calls involving lr_mul, and it worked after that.