training is very slow (>8h already) and has no sensible progress indicator
324705 opened this issue · 7 comments
In the README.md it says that training takes less than 3 hours. I let the PC run overnight for over 8 hours and it had only reached epoch 6. Is there any way to reduce the training time (in exchange for worse accuracy)? Is there any chance you could provide the pre-trained mnist.model data?
Hi @324705
Have you set a logger instance so that you can monitor training progress? See train.php
for an example.
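For reference, attaching a logger looks roughly like the sketch below. This assumes a Rubix ML version from around the time of this thread, where the bundled PSR-3 `Screen` logger lives under `Rubix\ML\Other\Loggers`; check your installed version's namespace if it differs. The channel name "MNIST" matches the log output shown later in this thread.

```php
<?php

// Sketch only: assumes Rubix ML ~0.1.x class locations.
use Rubix\ML\Other\Loggers\Screen;

// $estimator is the estimator built in train.php
// (e.g. the MultilayerPerceptron wrapped in a PersistentModel).
$estimator->setLogger(new Screen('MNIST'));

// train() will now emit a timestamped INFO line per epoch, e.g.
// [2020-02-04 06:45:12] MNIST.INFO: Epoch 1 score=... loss=...
$estimator->train($dataset);
```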
Until we solve the problem of multi-threading, training neural networks will be slow, as they are computationally heavy. This is an active area we are currently working on.
Until we get there, you can play around with some hyper-parameters. Specifically, a higher learning rate will allow the network to train faster; however, a rate that is too high might cause the network to fail to converge. For example, instead of 0.001 you might try a learning rate of 0.005. Decreasing the size of the model will also speed up training, at the cost of flexibility.
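Concretely, both knobs (learning rate and model size) are set when the network is constructed. A hypothetical sketch, not the exact train.php code; layer and optimizer class names assume Rubix ML's neural net API:

```php
<?php

// Sketch: a smaller MLP with a raised Adam learning rate.
use Rubix\ML\Classifiers\MultilayerPerceptron;
use Rubix\ML\NeuralNet\Layers\Dense;
use Rubix\ML\NeuralNet\Layers\Activation;
use Rubix\ML\NeuralNet\ActivationFunctions\ReLU;
use Rubix\ML\NeuralNet\Optimizers\Adam;

$estimator = new MultilayerPerceptron([
    new Dense(100),               // fewer/smaller hidden layers -> faster, less flexible
    new Activation(new ReLU()),
], 200, new Adam(0.005));         // batch size 200, learning rate raised from 0.001
```

A smaller network trades representational capacity for per-epoch speed, so expect a somewhat lower final score.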
Offering pretrained models is also on our radar.
Thanks for the great question. Let me know if I can help with anything else.
Hi, I'm attaching a log for your information, executed on an AWS t3.xlarge (Ubuntu 18.04, PHP 7.3.9).
Elapsed time was about 6 hours, as shown in the file.
Note: it is important to disable Xdebug.
Enabling Xdebug slows down PHP performance in general. It might be worth noting this in the FAQ. Would you add it?
From my experiments, training took about 1 hour per epoch (i.e. 2-3x slower) with Xdebug enabled.
There is no xdebug entry in my php.ini file. Is there any other place I have to look? By the way, training took me about 10 hours with no adjustments to the original code.
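One way to check, assuming a typical PHP CLI setup (Xdebug is often enabled via a separate conf.d file rather than the main php.ini):

```shell
# List every ini file PHP actually loads; Xdebug may be enabled in
# an additional .ini under a conf.d directory, not php.ini itself.
php --ini

# Check whether the Xdebug extension is loaded at all.
php -m | grep -i xdebug || echo "Xdebug not loaded"
```

If `php -m` shows no xdebug entry, the extension is not active and there is nothing to disable.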
Hi @324705,
If you don't have Xdebug, there is no problem at all.
Training time depends on machine specs, and I think 10 hours is reasonable on a mid-range CPU, compared with the t3.xlarge, which has a Xeon Platinum 8175M.
https://www.cpubenchmark.net/high_end_cpus.html
With the Tensor extension we are completing full epochs in 3 minutes.
andrew@VOLLUTO:/mnt/c/Users/Andrew/Workspace/Rubix/MNIST$ php train.php
Loading data into memory ...
Training ...
[2020-02-04 06:42:07] MNIST.INFO: Fitted ImageVectorizer
[2020-02-04 06:42:16] MNIST.INFO: Fitted ZScaleStandardizer
[2020-02-04 06:42:20] MNIST.INFO: Learner init hidden_layers=[0=Dense 1=Activation 2=Dropout 3=Dense 4=Activation 5=Dropout 6=Dense 7=Activation 8=Dropout] batch_size=200 optimizer=Adam alpha=0.0001 epochs=1000 min_change=0.0001 window=3 hold_out=0.1 cost_fn=CrossEntropy metric=FBeta
[2020-02-04 06:45:12] MNIST.INFO: Epoch 1 score=0.94355236537826 loss=0.034297487074677
[2020-02-04 06:48:09] MNIST.INFO: Epoch 2 score=0.9568734780257 loss=0.016965537428612
[2020-02-04 06:50:59] MNIST.INFO: Epoch 3 score=0.96030954560626 loss=0.013330950531581
[2020-02-04 06:53:54] MNIST.INFO: Epoch 4 score=0.96091932388603 loss=0.01165716545718
[2020-02-04 06:56:54] MNIST.INFO: Epoch 5 score=0.96362291250936 loss=0.010479900830758
[2020-02-04 06:59:51] MNIST.INFO: Epoch 6 score=0.96605237416864 loss=0.0097338521787775
System is an i7 8650 with 16G of RAM running PHP 7.2.24
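For anyone wanting to try the Tensor extension mentioned above, it is distributed via PECL. A rough install sketch; the conf.d path is an assumption for a Debian/Ubuntu PHP 7.3 setup and will differ on other systems:

```shell
# Assumes PECL and the PHP development headers are installed.
sudo pecl install tensor

# Enable it in an ini file that the CLI actually loads
# (path is an example; check `php --ini` for your system).
echo "extension=tensor" | sudo tee /etc/php/7.3/cli/conf.d/20-tensor.ini

# Verify it is loaded.
php -m | grep -i tensor
```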