Denys88/rl_games

error while running


Hi, I am getting the error below while running the code:

Traceback (most recent call last):
  File "tf14_runner.py", line 144, in <module>
    runner.run(args)
  File "tf14_runner.py", line 114, in run
    self.run_train()
  File "tf14_runner.py", line 98, in run_train
    agent = self.algo_factory.create(self.algo_name, sess=self.sess, base_name='run', observation_space=obs_space, action_space=action_space, config=self.config)  
  File "/home/anujm/Documents/rl_games/rl_games/common/object_factory.py", line 12, in create
    return builder(**kwargs)
  File "tf14_runner.py", line 25, in <lambda>
    self.algo_factory.register_builder('a2c_discrete', lambda **kwargs : a2c_discrete.A2CAgent(**kwargs)) 
  File "/home/anujm/Documents/rl_games/rl_games/algos_tf14/a2c_discrete.py", line 45, in __init__
    self.vec_env = vecenv.create_vec_env(self.env_name, self.num_actors, **self.env_config)
  File "/home/anujm/Documents/rl_games/rl_games/common/vecenv.py", line 138, in create_vec_env
    return RayVecSMACEnv(config_name, num_actors, **kwargs)
  File "/home/anujm/Documents/rl_games/rl_games/common/vecenv.py", line 101, in __init__
    self.num_agents = ray.get(res)
  File "/home/anujm/anaconda3/envs/rlgames/lib/python3.7/site-packages/ray/worker.py", line 2193, in get
    raise value
ray.exceptions.RayTaskError: ray_worker (pid=16737, host=anujm-X299-A)
  File "/home/anujm/Documents/rl_games/rl_games/common/vecenv.py", line 58, in get_number_of_agents
    return self.env.get_number_of_agents()
AttributeError: 'BatchedFrameStack' object has no attribute 'get_number_of_agents'
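
For context, this kind of AttributeError usually means a wrapper is not forwarding a custom method to the env it wraps. A minimal illustration with hypothetical class names (not the actual rl_games/gym code), assuming the SMAC env exposes get_number_of_agents() and a frame-stacking wrapper sits in front of it:

class SmacLikeEnv:
    # Stand-in for an env exposing a non-standard helper method.
    def get_number_of_agents(self):
        return 3

class PlainWrapper:
    # Does NOT delegate unknown attributes to the wrapped env.
    def __init__(self, env):
        self.env = env

class ForwardingWrapper:
    # Delegates unknown attributes to the wrapped env,
    # similar to what gym.Wrapper.__getattr__ does.
    def __init__(self, env):
        self.env = env
    def __getattr__(self, name):
        return getattr(self.env, name)

env = SmacLikeEnv()
print(ForwardingWrapper(env).get_number_of_agents())  # 3
print(PlainWrapper(env).get_number_of_agents())       # AttributeError, like the traceback above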

Hi, could you show me the config which you are trying to run?

params:  
  algo:
    name: a2c_discrete

  model:
    name: discrete_a2c

  load_checkpoint: False
  load_path: 'nn/6h_vs_8z_cnnsmac_cnn'

  network:
    name: actor_critic
    separate: True
    #normalization: layer_norm
    space: 
      discrete:
      
    cnn:
      type: conv1d
      activation: relu
      initializer:
        name: variance_scaling_initializer
        scale: 2
      regularizer:
        name: 'None'
      convs:    
        - filters: 64
          kernel_size: 3
          strides: 2
          padding: 'same'
        - filters: 128
          kernel_size: 3
          strides: 1
          padding: 'valid'
        - filters: 256
          kernel_size: 3
          strides: 1
          padding: 'valid'
    mlp:
      units: [256, 128]
      activation: relu
      initializer:
        name: variance_scaling_initializer
        scale: 2 
      regularizer:
        name:  'None'
  config:
    name: 3m_cnn
    reward_shaper:
        scale_value: 1
    normalize_advantage: True
    gamma: 0.99
    tau: 0.95
    learning_rate: 1e-4
    score_to_win: 20
    grad_norm: 0.5
    entropy_coef: 0.001
    truncate_grads: True
    env_name:  smac_cnn
    ppo: true
    e_clip: 0.2
    clip_value: True
    num_actors: 2
    steps_num: 128
    minibatch_size: 1536
    mini_epochs: 1
    critic_coef: 2
    lr_schedule:  None
    lr_threshold: 0.05
    normalize_input: False
    seq_len: 2
    use_action_masks: True
    ignore_dead_batches: False

    env_config:
      name: 3m
      frames: 4
      transpose: True
      random_invalid_step: False

I've found that I am using a much newer version of the OpenAI Gym than the one in my requirements.txt:
Name: gym
Version: 0.15.4
Could you try to update it and check if it works?
I'll update the requirements in that case.
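
If it helps, a quick way to check which versions are actually installed in the active environment, so they can be compared against requirements.txt:

# Print the installed gym and numpy versions for the current environment.
import gym
import numpy

print('gym  :', gym.__version__)
print('numpy:', numpy.__version__)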

My numpy version is also newer than the one in requirements.txt.

Thanks. After fixing the numpy version, I get this error:

(pid=22015) RequestQuit command received.
(pid=22015) Closing Application...
(pid=22015) unable to parse websocket frame.
frames per seconds:  119.09246403768397
/home/anujm/anaconda3/envs/rlgames/lib/python3.7/site-packages/numpy/core/fromnumeric.py:3257: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/home/anujm/anaconda3/envs/rlgames/lib/python3.7/site-packages/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/home/anujm/Desktop/port_rlgames/tf14_runner.py", line 144, in <module>
    runner.run(args)
  File "/home/anujm/Desktop/port_rlgames/tf14_runner.py", line 114, in run
    self.run_train()
  File "/home/anujm/Desktop/port_rlgames/tf14_runner.py", line 101, in run_train
    agent.train()
  File "/home/anujm/Desktop/port_rlgames/algos_tf14/a2c_discrete.py", line 417, in train
    self.writer.add_scalar('info/last_lr', last_lr * lr_mul, frame)
UnboundLocalError: local variable 'lr_mul' referenced before assignment

Could you decrease minibatch_size from 1536 to 1536/4 = 384?
The 1536 minibatch is half of a 3072 total batch. With 128 steps, two envs and 3 agents the total batch size is only 128 * 2 * 3 = 768, so the minibatch should be 384 here.
I am sorry for the inconvenience. I am thinking about how to make this more user friendly.
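
Put differently, a small sketch of the arithmetic with the values from the config above (3 agents assumed for the 3m map):

# Batch-size arithmetic for steps_num=128, num_actors=2 and a 3-agent map.
steps_num  = 128
num_actors = 2
num_agents = 3

total_batch = steps_num * num_actors * num_agents   # 128 * 2 * 3 = 768
minibatch   = total_batch // 2                      # 384, half of the total batch

assert total_batch % minibatch == 0                 # minibatch_size must divide the batch
print(total_batch, minibatch)                       # 768 384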

Thanks, I just commented out the logger calls involving lr_mul, and it seems to work now.
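
For anyone hitting the same UnboundLocalError: the usual cause is that lr_mul only gets assigned inside a learning-rate-scheduler branch, so with lr_schedule: None it is never bound before the logging call. A hedged sketch of the pattern and a fix (hypothetical code, not the actual rl_games source), which avoids having to comment out the logger calls:

# lr_mul is only bound inside the scheduler branch; giving it a default first
# avoids the UnboundLocalError when no lr schedule is configured.
def log_last_lr(writer, lr_schedule, last_lr, frame):
    lr_mul = 1.0                      # default when no scheduler adjusts the LR
    if lr_schedule == 'adaptive':
        lr_mul = 0.5                  # hypothetical scheduler update
    writer.add_scalar('info/last_lr', last_lr * lr_mul, frame)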