validate.py causes segmentation fault
raks097 opened this issue · 4 comments
@raunakbh92 @wulfebw
Hi, wonderful paper, and thank you for providing the implementations.
I was able to train the GAIL agent, but when I run validate.py I get a segmentation fault.
I am currently running Julia v1.1.0 on Ubuntu 18.04.
"""
Traceback (most recent call last):
File "/home/asyin/anaconda3/envs/rllab3/lib/python3.5/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "validate.py", line 135, in collect_trajectories
env_kwargs=dict(egoid=egoid, start=starts[egoid])
File "validate.py", line 31, in simulate
a, a_info = policy.get_action(x)
File "/home/asyin/R/rllab/hgail/hgail/policies/gaussian_latent_var_gru_policy.py", line 193, in get_action
return actions[0], {k: v[0] for k, v in agent_infos.items()}
File "/home/asyin/R/rllab/hgail/hgail/policies/gaussian_latent_var_gru_policy.py", line 193, in <dictcomp>
return actions[0], {k: v[0] for k, v in agent_infos.items()}
KeyError: 0
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "validate.py", line 381, in <module>
random_seed=run_args.random_seed
File "validate.py", line 266, in collect
random_seed=random_seed
File "validate.py", line 188, in parallel_collect_trajectories
[res.get() for res in results]
File "validate.py", line 188, in <listcomp>
[res.get() for res in results]
File "/home/asyin/anaconda3/envs/rllab3/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
KeyError: 0
signal (15): Terminated
in expression starting at no file:0
read at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
signal (15): Terminated
in expression starting at no file:0
signal (15): Terminated
in expression starting at no file:0
signal (15): Terminated
in expression starting at no file:0
_Py_read at /home/ilan/minonda/conda-bld/work/Python-3.5.2/Python/fileutils.c:1205
_PyObject_GenericGetAttrWithDict at /home/ilan/minonda/conda-bld/work/Python-3.5.2/Objects/object.c:1053
signal (15): Terminated
in expression starting at no file:0
signal (11): Segmentation fault
in expression starting at no file:0
@raunakbh92
I am not sure what is causing the segfault.
Do any changes need to be made in gaussian_latent_var_gru_policy.py?
UPDATE:
File "/home/asyin/R/rllab/hgail/hgail/policies/gaussian_latent_var_gru_policy.py", line 193, in get_action
return actions[0], {k: v[0] for k, v in agent_infos.items()}
KeyError: 0
AGENT_INFO
[('mean', array([[ 0.01187703, -0.02830665]], dtype=float32)), ('prev_action', array([[ 0., 0.]])), ('latent_info', {'latent': array([[1, 0, 0, 0]])}), ('log_std', array([[-0.67997968, -0.72506523]], dtype=float32)), ('latent', array([[1, 0, 0, 0]]))]
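I fixed the initial KeyError: 0 by changing how the dictionary is built so that it no longer includes the latent_info key. Its value is a nested dict rather than an array, so v[0] performs a lookup for the key 0 instead of taking the first row.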
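A minimal sketch of both the failure and the reported fix. Plain nested lists stand in for the numpy arrays so it runs without dependencies, and the helper names first_rows and first_rows_without_latent_info are hypothetical; only the comprehension from line 193 is taken from the traceback.

```python
# Stand-in for the AGENT_INFO printout above (lists replace numpy arrays).
agent_infos = {
    "mean": [[0.01187703, -0.02830665]],
    "prev_action": [[0.0, 0.0]],
    "latent_info": {"latent": [[1, 0, 0, 0]]},  # nested dict, not an array
    "log_std": [[-0.67997968, -0.72506523]],
    "latent": [[1, 0, 0, 0]],
}

def first_rows(infos):
    # The comprehension from line 193 of gaussian_latent_var_gru_policy.py.
    return {k: v[0] for k, v in infos.items()}

def first_rows_without_latent_info(infos):
    # Reported fix: skip 'latent_info'; in this printout 'latent' holds the same values.
    return {k: v[0] for k, v in infos.items() if k != "latent_info"}

caught = None
try:
    first_rows(agent_infos)
except KeyError as err:
    caught = err  # v[0] on the nested dict looked up the key 0
    print("KeyError:", err)  # prints "KeyError: 0"

row_infos = first_rows_without_latent_info(agent_infos)
print(sorted(row_infos))  # ['latent', 'log_std', 'mean', 'prev_action']
```

Whether any downstream consumer of agent_info actually needs latent_info is not established in this thread; the sketch only mirrors the change described above.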
However, I am still getting this error:
signal (11): Segmentation fault
in expression starting at no file:0
It is similar to the error described in https://github.com/sisl/ngsim_env/blob/c34f2c4bd6bf2e089b69bddefb4283ef6829c042/docs/usingTrainedPolicy.md
However, deleting the PyCall cache from ~/.julia/compiled/v1.1 didn't work. Should I try reverting to Julia 0.6 (and if so, how?), or are there other solutions?
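For reference, the cache step amounts to removing the precompiled PyCall directory so that Julia recompiles it the next time PyCall is loaded. A small Python sketch; the helper clear_pycall_cache is hypothetical, and it assumes the default depot ~/.julia and the compiled/v1.1/PyCall layout used by Julia 1.1. As reported above, this step alone did not resolve the segfault.

```python
import shutil
from pathlib import Path

def clear_pycall_cache(depot=None, julia_version="v1.1"):
    """Delete the precompiled PyCall cache; return True if a directory was removed.

    Assumes the layout <depot>/compiled/<julia_version>/PyCall (hypothetical helper).
    """
    depot = Path(depot) if depot is not None else Path.home() / ".julia"
    cache_dir = depot / "compiled" / julia_version / "PyCall"
    if not cache_dir.is_dir():
        return False  # nothing to delete
    shutil.rmtree(cache_dir)
    return True
```

Calling clear_pycall_cache() with no arguments targets ~/.julia; the depot parameter only exists so the sketch can be tried on a scratch directory first.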
Thanks
@raks097 Hi man, have you solved this issue? So currently you are only able to train a policy?
agent_infos: {'mean': array([[-0.36561757, -0.4166246 ]], dtype=float32), 'log_std': array([[0.01288844, 0.06403346]], dtype=float32), 'prev_action': array([[0., 0.]]), 'latent': array([[0, 1, 0, 0]]), 'latent_info': {'latent': array([[0, 1, 0, 0]])}}
Change 'latent_info': {'latent': array([[0, 1, 0, 0]])} to 'latent_info': array([[0, 1, 0, 0]])
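Instead of flattening latent_info by hand, one could index recursively so that nested dicts keep their structure. A sketch with a hypothetical helper first_row (not part of hgail or rllab), again with lists in place of numpy arrays; whether callers expect latent_info as a dict or as an array is not settled in this thread.

```python
def first_row(value):
    """Return the first row of an array-like value, recursing into nested dicts."""
    if isinstance(value, dict):
        return {k: first_row(v) for k, v in value.items()}
    return value[0]

# Stand-in for the agent_infos printout above (lists replace numpy arrays).
agent_infos = {
    "mean": [[-0.36561757, -0.4166246]],
    "log_std": [[0.01288844, 0.06403346]],
    "prev_action": [[0.0, 0.0]],
    "latent": [[0, 1, 0, 0]],
    "latent_info": {"latent": [[0, 1, 0, 0]]},
}

row_infos = {k: first_row(v) for k, v in agent_infos.items()}
print(row_infos["latent_info"])  # {'latent': [0, 1, 0, 0]}
```

The design choice here is to avoid hard-coding the key name, so a new nested entry in agent_info would not reintroduce the KeyError.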
python validate.py --n_proc 4 --exp_dir ../../data/experiments/NGSIM-gail/ --params_filename itr_1000.npz --random_seed 42
For example, with n_proc set to 4 as above, only some of the worker processes finish; one of them (say pid 2) may fail.
pid: 0 or 1 or 3 traj: 515 / 516
The worker with pid 2 never reports progress again.
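In the traceback, validate.py collects results with [res.get() for res in results] (line 188), and AsyncResult.get() re-raises a worker's exception in the parent, which is how a KeyError: 0 in one worker ended the whole run. A sketch of collecting results per worker so that one failure is recorded without discarding the others. collect_trajectories here is a hypothetical stand-in that fails for pid 2, TIMEOUT_S is an assumed budget, and multiprocessing.pool.ThreadPool (same apply_async/get API as Pool) is used only so the sketch runs without spawning processes.

```python
from multiprocessing.pool import ThreadPool

TIMEOUT_S = 600  # assumed per-worker budget in seconds; tune to the real run length

def collect_trajectories(pid):
    # Hypothetical stand-in for validate.collect_trajectories: pid 2 fails.
    if pid == 2:
        raise KeyError(0)
    return "trajectories from pid {}".format(pid)

n_proc = 4
outcomes = {}
with ThreadPool(n_proc) as pool:
    results = [pool.apply_async(collect_trajectories, (pid,)) for pid in range(n_proc)]
    for pid, res in enumerate(results):
        try:
            # get() re-raises the worker's exception (or TimeoutError) in the parent.
            outcomes[pid] = res.get(timeout=TIMEOUT_S)
        except Exception as err:
            outcomes[pid] = err
            print("pid {} failed: {!r}".format(pid, err))
```

To use real processes, replace ThreadPool with multiprocessing.Pool. With a real Pool, a worker that is killed outright (for example by a segfault) never delivers its result, so a get() without a timeout can block indefinitely; with a timeout, get() raises multiprocessing.TimeoutError, which the except clause records. This may be relevant to a worker that, like pid 2 above, stops reporting.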