facebookresearch/ELF

training MiniRTS is extremely slow

kirk86 opened this issue · 2 comments

Training MiniRTS is extremely slow, even on high-end machines. I've noticed that it uses the CPU far more than the GPU. GPU utilization is effectively negligible: it fluctuates between 0% and 60% and never stabilizes, while CPU utilization is at full capacity. This is strange, since I included the GPU flag when training. Today is the fourth day of continuous training:

[trainer] actor count: 4750/5000
new_record: False
count: 574
best_win_rate: 0.7483104707675473
str_win_rate: [574] Win rate: 0.737 [25726/9197/34923], Best win rate: 0.748 [572]
str_acc_win_rate: Accumulated win rate: 0.688 [13556302/6158413/19714715]
 61%|█████▌   | 3057/5000 [05:31<03:30,  9.22it/s]
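
As a quick sanity check, the figures in this log are internally consistent with the "fourth day" estimate. The sketch below rests on two assumptions that the log itself does not confirm: that each bracketed triple is [wins/losses/total], and that `count` is the number of completed 5000-iteration rounds. All variable names are made up for illustration.

```python
# Figures copied from the log above.
wins, losses, total = 25726, 9197, 34923
acc_wins, acc_losses, acc_total = 13556302, 6158413, 19714715

# Assumption: each bracketed triple is [wins/losses/total].
assert wins + losses == total
assert acc_wins + acc_losses == acc_total

win_rate = wins / total              # rounds to the reported 0.737
acc_win_rate = acc_wins / acc_total  # rounds to the reported 0.688

# Progress bar: 3057 of 5000 iterations at a reported 9.22 it/s.
iters_per_round = 5000
reported_rate = 9.22                 # iterations per second
seconds_per_round = iters_per_round / reported_rate  # roughly 9 minutes

# Assumption: `count` is the number of completed rounds so far.
rounds_done = 574
elapsed_days = rounds_done * seconds_per_round / 86400  # about 3.6 days

print(f"win rate {win_rate:.3f}, accumulated {acc_win_rate:.3f}")
print(f"{seconds_per_round / 60:.1f} min per round, {elapsed_days:.1f} days in total")
```

At roughly 9 minutes per round, 574 rounds come to about 3.6 days, so the wall-clock time is explained by the per-round throughput rather than by a stall.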

I would suggest that someone look into the training time and the wasted resources, since the GPU is not used at full capacity even during training. Another point is ease of use directly from Python, which would let us easily create different scenarios and environments for training and testing.

What is the configuration of your machine, and what command line did you use?
Training is quite CPU-heavy because running the games requires it.
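
To put numbers on the CPU-versus-GPU split when reporting a configuration, one option is to record GPU utilization samples and summarize them. The sketch below assumes the samples were collected with `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`, which prints one integer percentage per GPU per line. The helper name `summarize_gpu_util` and the sample values are hypothetical; they are not measurements from this issue.

```python
from statistics import mean


def summarize_gpu_util(raw: str) -> dict:
    """Summarize utilization percentages, one integer per non-empty line."""
    samples = [int(line.strip()) for line in raw.splitlines() if line.strip()]
    if not samples:
        raise ValueError("no utilization samples found")
    return {
        "n": len(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": mean(samples),
        # Fraction of samples in which the GPU was completely idle.
        "idle_fraction": sum(s == 0 for s in samples) / len(samples),
    }


# Hypothetical sample text, standing in for captured nvidia-smi output.
sample_output = "0\n12\n60\n0\n35\n"
summary = summarize_gpu_util(sample_output)
print(summary)
```

A high `idle_fraction` alongside saturated CPU cores would suggest that the GPU is waiting on game simulation rather than being the bottleneck.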

@yuandong-tian thanks for the reply. The machine is well suited to this kind of job: two Xeons and 4 GPUs should be more than enough for this task to run quickly. Nevertheless, training ran for close to 3 days before I had to kill the process. The commands used to run the jobs are exactly the same as those given in the README.