Sohojoe/MarathonEnvsBaselines

Error in initializing the fully-connected layer

maystroh opened this issue · 4 comments

I'm trying to use your code to run the OpenAI Baselines algorithms with a Unity3D environment. Here is the command I'm using to launch the training:

python -m baselines.run_unity --alg=ppo2 --env=./envs/env.x86_64 --num_timesteps=1e6 --save_path=./models/test

Here is the problem I'm getting:

File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/hassan/Desktop/Unity-Gym/baselines/run_unity.py", line 248, in
main()
File "/home/hassan/Desktop/Unity-Gym/baselines/run_unity.py", line 222, in main
model, env = train(args, extra_args)
File "/home/hassan/Desktop/Unity-Gym/baselines/run_unity.py", line 79, in train
**alg_kwargs
File "/home/hassan/Desktop/Unity-Gym/baselines/ppo2/ppo2.py", line 305, in learn
model = make_model()
File "/home/hassan/Desktop/Unity-Gym/baselines/ppo2/ppo2.py", line 304, in
max_grad_norm=max_grad_norm)
File "/home/hassan/Desktop/Unity-Gym/baselines/ppo2/ppo2.py", line 39, in init
act_model = policy(nbatch_act, 1, sess)
File "/home/hassan/Desktop/Unity-Gym/baselines/common/policies.py", line 142, in policy_fn
policy_latent = policy_network(encoded_x)
File "/home/hassan/Desktop/Unity-Gym/baselines/common/models.py", line 52, in network_fn
h = fc(h, 'mlp_fc{}'.format(i), nh=num_hidden, init_scale=np.sqrt(2))
File "/home/hassan/Desktop/Unity-Gym/baselines/a2c/utils.py", line 65, in fc
w = tf.get_variable("w", [nin, nh], initializer=ortho_init(init_scale))
File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1467, in get_variable
aggregation=aggregation)
File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1217, in get_variable
aggregation=aggregation)
File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 527, in get_variable
aggregation=aggregation)
File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 481, in _true_getter
aggregation=aggregation)
File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 903, in _get_single_variable
aggregation=aggregation)
File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2443, in variable
aggregation=aggregation)
File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2425, in
previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 2406, in default_variable_creator
constraint=constraint)
File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 259, in init
constraint=constraint)
File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 368, in _init_from_args
initial_value(), name="initial_value", dtype=dtype)
File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 885, in
shape.as_list(), dtype=dtype, partition_info=partition_info)
File "/home/hassan/Desktop/Unity-Gym/baselines/a2c/utils.py", line 35, in _ortho_init
u, _, v = np.linalg.svd(a, full_matrices=False)
File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/numpy/linalg/linalg.py", line 1368, in svd
_assertNoEmpty2d(a)
File "/home/hassan/anaconda3/envs/GymUnity/lib/python3.5/site-packages/numpy/linalg/linalg.py", line 226, in _assertNoEmpty2d
raise LinAlgError("Arrays cannot be empty")
numpy.linalg.linalg.LinAlgError: Arrays cannot be empty

The problem comes from the ortho_init function in a2c/utils.py: the shape requested for the weight matrix has a zero first dimension. Please see the annotated code below for details.

def ortho_init(scale=1.0):
    def _ortho_init(shape, dtype, partition_info=None):
        # lasagne ortho init for tf
        print(shape)  # prints (0, 64): nin, the layer's input width, is zero
        shape = tuple(shape)
        print(shape)  # still (0, 64)
        if len(shape) == 2:
            flat_shape = shape
        elif len(shape) == 4:  # assumes NHWC
            flat_shape = (np.prod(shape[:-1]), shape[-1])
        else:
            raise NotImplementedError
        print(flat_shape)  # (0, 64)
        a = np.random.normal(0.0, 1.0, flat_shape)  # with a zero dimension, this is an empty array
        print(a)  # prints []
        u, _, v = np.linalg.svd(a, full_matrices=False)  # SVD of the empty array raises LinAlgError
        q = u if u.shape == flat_shape else v  # pick the one with the correct shape
        q = q.reshape(shape)
        return (scale * q[:shape[0], :shape[1]]).astype(np.float32)
    return _ortho_init
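
For a minimal reproduction outside the training stack (assuming, as the printed shapes suggest, that the layer is built with nin = 0 because the environment reports an empty observation space):

import numpy as np

# The fc layer requests a weight matrix of shape (nin, nh) = (0, 64),
# so the sampled array has zero rows and the SVD rejects it.
nin, nh = 0, 64
a = np.random.normal(0.0, 1.0, (nin, nh))
print(a.shape)  # (0, 64) -- an empty array
u, _, v = np.linalg.svd(a, full_matrices=False)  # LinAlgError: Arrays cannot be empty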

I just wanted to report it here in case anyone can help. I will continue investigating and will post more details once I fix it.

I've fixed it. I was simply using the wrong method. I should use
env = make_unity_env(env_id, args.num_env or 1, args.visual_obs)
instead of env = make_vec_env(env_id, env_type, args.num_env or 1, seed, reward_scale=args.reward_scale)

def build_env(args):
    ncpu = multiprocessing.cpu_count()
    if sys.platform == 'darwin': ncpu //= 2
    nenv = args.num_env or ncpu
    alg = args.alg
    rank = MPI.COMM_WORLD.Get_rank() if MPI else 0
    seed = args.seed

    env_type, env_id = get_env_type(args.env)

    if env_type == 'atari':
        ..

    elif env_type == 'retro':
        ..

    elif env_type == 'unity':
        get_session(tf.ConfigProto(allow_soft_placement=True,
                                   intra_op_parallelism_threads=1,
                                   inter_op_parallelism_threads=1))
        # env = make_vec_env(env_id, env_type, args.num_env or 1, seed, reward_scale=args.reward_scale)
        env = make_unity_env(env_id, args.num_env or 1, args.visual_obs)
        # env = VecNormalize(env)
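
In case it helps anyone else: make_unity_env presumably wraps the Unity executable in gym-unity's UnityEnv and vectorizes it for baselines. Here is a rough sketch under that assumption (the repo's actual helper may differ, and the UnityEnv import path and signature vary between gym-unity versions):

from gym_unity.envs import UnityEnv
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv

def make_unity_env(env_path, num_env, visual_obs):
    # One thunk per worker; each concurrent Unity instance needs a
    # unique worker_id so it can open its own communication port.
    def make_env(rank):
        def _thunk():
            return UnityEnv(env_path, worker_id=rank, use_visual=visual_obs)
        return _thunk
    if num_env > 1:
        return SubprocVecEnv([make_env(i) for i in range(num_env)])
    return DummyVecEnv([make_env(0)])

Unlike make_vec_env, this wrapper exposes a proper gym observation_space for the Unity environment, which is presumably why the fc layer was previously being built with zero input features.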

Hi @maystroh - I'm glad you figured it out.

It would be good to get your feedback, as at some point it would be worth folding the baselines capabilities back into ml-agents.

For me, I was exploring baselines to see if I could speed up training over ml-agents (leverage the GPU, etc.). However, I found that:

  • GPU was slower than CPU
  • most of the baselines algorithms do not support multi-agent environments very well
  • I got a lot more performance out of ml-agents by scaling up the number of concurrent agents to 64

I've also been trying to get HER working; however, the baselines HER code is quite deeply coupled with MPI and Mujoco, so I'm thinking it may be faster to implement DDQN + HER in ml-agents.

Actually, I have the same goal: exploring baselines to double-check whether their PPO implementation is really optimized for GPU. Since I'm working with visual observations, training is expected to be faster on GPU, no? Just to verify your outcome: was GPU slower than CPU for both vector and visual observations?

So far I'm working with only one agent, but I will double-check that case once I finish what I have to accomplish. I will update this thread with whatever interesting info I find during my exploration.
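
In the meantime, a quick way to sanity-check whether a TensorFlow build can actually see the GPU (standard TF 1.x calls, matching the version in the traceback above):

import tensorflow as tf

print(tf.VERSION)                    # installed TensorFlow version
print(tf.test.is_built_with_cuda())  # True if this build was compiled with CUDA
print(tf.test.is_gpu_available())    # True if a GPU is visible at runtime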

I have not tried visual observations yet - so I'm interested to know how this works out!!

Yes, I can confirm that CPU was faster than GPU in my tests. My state/action spaces are small and my buffer size is also small(ish), so this may be contributing to why that was the case. Note: I'm using an optimized version of tensorflow - here are some links: