dqn_main.py hangs
Closed this issue · 3 comments
Shmuma commented
Hi!
Installed CuLE with Python 3.7, PyTorch 1.1.0, and CUDA 10.0. Execution of dqn_main.py hangs after printing the following messages:
GeForce GTX 1080 Ti : 1632.500 Mhz (Ordinal 0)
28 SMs enabled. Compute Capability sm_61
FreeMem: 10,687MB TotalMem: 11,178MB 64-bit pointers.
Mem Clock: 5505.000 Mhz x 352 bits (484.4 GB/s)
ECC Disabled
Selected optimization level O0: Pure FP32 training.
Defaults for this optimization level are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
DQN(
  (conv): Sequential(
    (0): Conv2d(4, 32, kernel_size=(8, 8), stride=(4, 4))
    (1): ReLU()
    (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
    (3): ReLU()
    (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
    (5): ReLU()
  )
  (fc_a): Sequential(
    (0): Linear(in_features=3136, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=6, bias=True)
  )
)
Initializing evaluation memory with 500 entries...
After this, there is no activity on either the CPU or the GPU.
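A minimal sketch for locating the hang (assuming CPython; not part of CuLE): adding a faulthandler watchdog near the top of dqn_main.py dumps every thread's stack if the process is still stuck after a timeout.

```python
# Hypothetical debugging aid: if the process is still running after 60 s,
# dump every thread's traceback to stderr so we can see where
# "Initializing evaluation memory..." is blocked. repeat=True re-dumps
# every 60 s; exit=False keeps the process alive.
import faulthandler

faulthandler.dump_traceback_later(60, repeat=True, exit=False)
```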
Shmuma commented
The same happens with a2c_main.py:
PyTorch : 1.1.0
CUDA : 10.0.130
CUDNN : 7501
APEX : 0.1.0
GeForce GTX 1080 Ti : 1632.500 Mhz (Ordinal 0)
28 SMs enabled. Compute Capability sm_61
FreeMem: 10,687MB TotalMem: 11,178MB 64-bit pointers.
Mem Clock: 5505.000 Mhz x 352 bits (484.4 GB/s)
ECC Disabled
ifrosio commented
Looking into it. For the A2C case, can you try running it with --use-cuda-env --use-openai-test-env (see the example command after this list)? This will do two things:
- --use-cuda-env runs the CuLE environments on the GPU; otherwise CuLE envs run on the CPU;
- --use-openai-test-env uses the OpenAI Gym environments (instead of CuLE on the CPU) for testing.
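For example (assuming a2c_main.py accepts the same flags as ppo_main.py shown below):

$ python a2c_main.py --use-cuda-env --use-openai-test-env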
KyunghyunLee commented
I ran into a similar problem:
$ python ppo_main.py --use-cuda-env --use-openai-test-env
{'ale_start_steps': 400,
'alpha': 0.99,
'batch_size': 256,
'clip_epsilon': 0.1,
'conf_file': None,
'entropy_coef': 0.01,
'env_name': 'PongNoFrameskip-v4',
'episodic_life': False,
'eps': 1e-05,
'evaluation_episodes': 10,
'evaluation_interval': 1000000,
'gamma': 0.99,
'gpu': 0,
'local_rank': 0,
'log_dir': 'runs',
'loss_scale': None,
'lr': 0.00065,
'lr_scale': False,
'max_episode_length': 18000,
'max_grad_norm': 0.5,
'multiprocessing_distributed': False,
'no_cuda_train': True,
'normalize': False,
'num_ales': 16,
'num_gpus_per_node': -1,
'num_stack': 4,
'num_steps': 5,
'opt_level': 'O0',
'output_filename': None,
'plot': False,
'ppo_epoch': 3,
'profile': False,
'save_interval': 0,
'seed': 1565658549,
't_max': 50000000,
'tau': 1.0,
'use_adam': False,
'use_cuda_env': True,
'use_gae': False,
'use_openai': False,
'use_openai_test_env': True,
'value_loss_coef': 0.5,
'verbose': False}
PyTorch : 1.0.0
CUDA : 10.0.130
CUDNN : 7401
APEX : 0.1.0
GeForce GTX 1080 Ti : 0.000 Mhz (Ordinal 0)
131072 SMs enabled. Compute Capability sm_00
FreeMem: 11,019MB TotalMem: 11,178MB 64-bit pointers.
Mem Clock: 98.304 Mhz x 0 bits ( 0.0 GB/s)
ECC Enabled
GPUassert: invalid device symbol /home/lkh/Codes/cule/cule/atari/cuda/tables.hpp 43
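The bogus device properties in the log (0.000 MHz, 131072 SMs, sm_00) together with the "invalid device symbol" error usually indicate that the CUDA device code was not compiled for this GPU's architecture (a GTX 1080 Ti is sm_61). A minimal sanity check of what PyTorch itself detects, assuming a working PyTorch CUDA install:

```python
# Sanity-check sketch: print the compute capability PyTorch reports for
# device 0. A GTX 1080 Ti should report (6, 1), i.e. sm_61. If the CuLE
# binary contains no code for this architecture, copies to __constant__
# symbols (as in tables.hpp) can fail with "invalid device symbol".
import torch

assert torch.cuda.is_available()
major, minor = torch.cuda.get_device_capability(0)
print(torch.cuda.get_device_name(0), f"sm_{major}{minor}")
```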