kwai/DouZero

How can I speed up training, or how should I adjust the parameters to suit my machine, when training on 5 A100 80G GPUs?

Cyclones-Y opened this issue · 2 comments

Here are my parameter settings:
"args": {
    "actor_device_cpu": false,
    "alpha": 0.99,
    "batch_size": 32,
    "disable_checkpoint": false,
    "epsilon": 1e-05,
    "exp_epsilon": 0.01,
    "gpu_devices": "0,1,2,3,4",
    "learning_rate": 0.0001,
    "load_model": false,
    "max_grad_norm": 40.0,
    "momentum": 0,
    "num_actor_devices": 4,
    "num_actors": 100,
    "num_buffers": 50,
    "num_threads": 4,
    "objective": "adp",
    "save_interval": 30,
    "savedir": "douzero_checkpoints",
    "total_frames": 50000000000,
    "training_device": "4",
    "unroll_length": 100,
    "xpid": "douzero"
}

When I increase num_actors, the training frames figure barely changes and stays at around 2000. I hope you can point out what I am doing wrong. @daochenzha
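
For reference, here is a rough sketch of what the settings above imply in total. It rests on two assumptions that are not confirmed in this thread: that num_actors is counted per actor device, and that one learner step consumes batch_size rollouts of unroll_length frames each. If either assumption is wrong, the numbers do not apply.

```python
# Values copied from the "args" block in this issue.
args = {
    "batch_size": 32,
    "num_actor_devices": 4,
    "num_actors": 100,
    "num_threads": 4,
    "unroll_length": 100,
}

# Assumption: num_actors is the number of actor processes per actor device,
# so the total number of processes is the product of the two settings.
total_actor_processes = args["num_actor_devices"] * args["num_actors"]

# Assumption: each learner step consumes batch_size rollouts,
# each unroll_length frames long.
frames_per_learner_step = args["batch_size"] * args["unroll_length"]

print(total_actor_processes)    # 400
print(frames_per_learner_step)  # 3200
```

If the first assumption holds, 400 actor processes may already be more than the host CPU can run in parallel, which would be one possible reason why raising num_actors further does not change the frame rate.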

XliOK commented

Do you have contact details? I can take a look for you. V: _Linuxer

I have the same problem. Could someone with experience help out?