google/next-prediction

train without activity prediction, Colab Running Time

Closed this issue · 8 comments

Hi,

I'm trying to train Next without the activity prediction module, to make a fair comparison with other models that only output trajectories. Is it enough to just remove the --add_activity argument when training?

Thanks a lot!

Yes. For both training and testing.

Hi, thanks for your response.
And may I ask how long it took to train on your local machine? I'm training on Colab Pro and it shows an implausible estimated time (thousands of hours). Since Pro provides a V100 GPU, I suspect too much time is being spent on I/O on Google Cloud.

I have not tried running it on Colab. With a 1080 Ti GPU and an i5 CPU, it takes about 36 hours to train with the default settings. I/O should not be the bottleneck: all data is packed into a .npz file, so in theory everything needed is loaded into RAM at the start. Could it be that RAM is insufficient? 24 GB should supposedly be enough.
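To illustrate why I/O should not dominate: a .npz archive is loaded once at startup and the arrays then live in RAM. A minimal NumPy sketch (the field names `obs_traj`/`pred_traj` here are hypothetical stand-ins, not the repo's actual keys):

```python
import os
import tempfile
import numpy as np

# Create a small stand-in for the prepared dataset (hypothetical field names).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "data.npz")
np.savez(path,
         obs_traj=np.random.rand(100, 8, 2),
         pred_traj=np.random.rand(100, 12, 2))

# Load the archive once at startup; each key access pulls that array into RAM.
data = np.load(path)
obs = data["obs_traj"]   # resident in memory after this line
print(obs.shape)         # → (100, 8, 2)

# Rough memory footprint of all arrays in the archive, in bytes.
total_bytes = sum(data[k].nbytes for k in data.files)
print(total_bytes)       # → 32000
```

If the total footprint of the real .npz exceeds the runtime's RAM, the OS starts swapping and the estimated time explodes, which would match the symptom above.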

Hi, I am using 25 GB of RAM, which helps, but the estimate is still thousands of hours. I have attached a screenshot with some of the initial outputs. Do they look correct? I don't think anything should differ after just the preprocessing.
[Screenshot 2021-04-24 230932]

The RAM might be limiting performance. I can't spot the problem from these outputs. Try checking what the CPU/RAM/GPU usage is during the run.
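On a Colab runtime you can check these from a notebook cell. A small stdlib-only sketch (the POSIX `sysconf` calls work on Linux, which is what Colab runs; `nvidia-smi` is queried only if present):

```python
import os
import shutil
import subprocess

# Total physical RAM via POSIX sysconf (Linux/Colab).
page_size = os.sysconf("SC_PAGE_SIZE")
total_ram_gb = page_size * os.sysconf("SC_PHYS_PAGES") / 1024 ** 3
print(f"Total RAM: {total_ram_gb:.1f} GB")

# GPU utilization and memory, if nvidia-smi is on PATH (GPU runtimes).
if shutil.which("nvidia-smi"):
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used,memory.total",
         "--format=csv"],
        capture_output=True, text=True,
    )
    print(out.stdout)
```

Low GPU utilization with high RAM usage would point at swapping; low utilization with idle RAM/CPU would point at a data-pipeline stall instead.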

It seems it settles on the right estimated time after I leave it training for a few hours. But still, thank you so much for your help!

Hi, just a follow-up question. When I try to resume training with --load, it still starts from global_step == 0.
Looking at the code, it seems to reset everything even when restoring. Shouldn't it import the .meta files first?

The code restores from the latest checkpoint in the model path. global_step == 0 only means the step counter is for the current run; nothing is actually reset. (The .meta file is only needed when re-importing the graph with import_meta_graph; here the graph is rebuilt in code and only the variable values are restored.)
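One way to confirm the restore worked is to look at the checkpoint prefix itself: TF1's `tf.train.latest_checkpoint(model_dir)` returns a prefix like `model-12000`, where the trailing number is the true global step at save time, even if the per-run loop counter restarts at 0. A stdlib-only sketch (the helper below is hypothetical, not from the repo; only the `name-step` naming pattern is the standard TensorFlow one):

```python
import re

def step_from_checkpoint(ckpt_prefix):
    """Parse the global step from a TF checkpoint prefix like 'save/model-12000'.

    tf.train.latest_checkpoint() returns such a prefix; the trailing number
    is the step at which the checkpoint was written.
    """
    m = re.search(r"-(\d+)$", ckpt_prefix)
    return int(m.group(1)) if m else 0

restored_step = step_from_checkpoint("models/next/model-12000")
print(restored_step)  # → 12000

# A per-run counter starting at 0 does not mean the restore failed:
for local_step in range(3):
    print("local step", local_step, "→ overall step", restored_step + local_step)
```

So the printed step count and the checkpoint's embedded step are two different counters; only the latter tracks total training progress.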