RubixML/MNIST

training is very slow (>8h already) and has no sensible progress indicator

324705 opened this issue · 7 comments

In the README.md it says that training takes less than 3 hours. I let my PC run overnight for over 8 hours and it only reached epoch 6. Is there any way to reduce the training time (in exchange for worse accuracy)? Is there any chance you could provide the pre-trained mnist.model data?

Hi @324705

Have you set a logger instance so that you can monitor training progress? See train.php for an example.

Until we solve the problem of multithreading, training neural networks will be slow, as they are quite computation-heavy.

This is an area we are actively working on.

Until we get there, you can experiment with some hyper-parameters, specifically the learning rate, which controls how quickly the network trains. A rate that is too high, however, might cause the network to fail to converge. For example, instead of 0.001 you might try a learning rate of 0.005. Decreasing the size of the model will also speed up training, at the cost of flexibility.
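To make the advice above concrete, here is a minimal sketch of how those knobs map onto the Rubix ML API. This assumes a MultilayerPerceptron setup like the one in train.php; the exact layer sizes and activation functions in your copy may differ, and the neuron counts below are illustrative, not the project's originals.

```php
<?php

use Rubix\ML\Classifiers\MultilayerPerceptron;
use Rubix\ML\NeuralNet\Layers\Dense;
use Rubix\ML\NeuralNet\Layers\Activation;
use Rubix\ML\NeuralNet\Layers\Dropout;
use Rubix\ML\NeuralNet\ActivationFunctions\LeakyReLU;
use Rubix\ML\NeuralNet\Optimizers\Adam;

// Fewer/smaller hidden layers trade accuracy for training speed, and a
// higher Adam learning rate (0.005 instead of the default 0.001) takes
// larger steps per update - at some risk of failing to converge.
$estimator = new MultilayerPerceptron([
    new Dense(100),                     // illustrative size, smaller than stock
    new Activation(new LeakyReLU()),
    new Dropout(0.2),
    new Dense(100),
    new Activation(new LeakyReLU()),
    new Dropout(0.2),
], 200, new Adam(0.005));               // batch size 200, raised learning rate
```

Raising the batch size is another lever: fewer, larger gradient updates per epoch generally reduce per-epoch overhead at some cost in gradient noise characteristics.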

Offering pre-trained models is also on our radar.

Thanks for the great question, let me know if I can help with anything else

y-uti commented

Hi, for your info I've attached a log from a run on an AWS t3.xlarge instance (Ubuntu 18.04, PHP 7.3.9).
Elapsed time was about 6 hours, as shown in the file.

Note: it is important to disable Xdebug.

train.log
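To check whether Xdebug is active for the CLI that runs train.php, a quick standalone script (plain PHP, no extra dependencies; the filename is arbitrary) can report the loaded extensions and ini files:

```php
<?php

// Report whether Xdebug is loaded in this SAPI, and which ini files
// the PHP CLI actually read - Xdebug is often enabled via a separate
// file under conf.d/ rather than in php.ini itself.
if (extension_loaded('xdebug')) {
    echo "Xdebug is loaded - expect training to run noticeably slower.\n";
} else {
    echo "Xdebug is not loaded.\n";
}

echo 'Loaded php.ini: ', php_ini_loaded_file() ?: '(none)', "\n";
echo 'Additional ini files: ', php_ini_scanned_files() ?: '(none)', "\n";
```

Run it with the same `php` binary you use for training, since web and CLI SAPIs can load different configurations.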

That output looks consistent with what I've been getting with those settings @y-uti

The network usually settles on model parameters that reach around a 0.97 F1 score.

You bring up a good point about disabling Xdebug - would you like to add something about that to our FAQ, or should I?

y-uti commented

Enabling Xdebug slows down PHP performance in general. It would be good to note this in the FAQ. Would you add it?

In my experiments it took about 1 hour per epoch (i.e. 2-3x slower) with Xdebug enabled.

There is no xdebug entry in my php.ini file. Is there any other place I have to look? By the way, training took me about 10 hours with no adjustments to the original code.

y-uti commented

Hi @324705,
If you don't have Xdebug, there is no problem at all.
Training time depends on machine specs, and I think 10 hours is reasonable if you are using a mid-range CPU, compared with the t3.xlarge, which has a Xeon Platinum 8175M.
https://www.cpubenchmark.net/high_end_cpus.html

With the Tensor extension we are completing full epochs in 3 minutes.
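If you want to confirm the Tensor extension is actually being picked up before starting a long run (assuming it was installed via PECL, e.g. `pecl install tensor`, and enabled in an ini file), a quick check might look like this:

```php
<?php

// Verify the Tensor extension is loaded; without it, the Tensor library
// falls back to its pure-PHP implementation, which is much slower.
if (extension_loaded('tensor')) {
    echo 'Tensor extension loaded, version ', phpversion('tensor'), "\n";
} else {
    echo "Tensor extension not loaded - computations will use the PHP fallback.\n";
}
```

As with Xdebug, run this check with the same `php` binary that runs train.php.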

andrew@VOLLUTO:/mnt/c/Users/Andrew/Workspace/Rubix/MNIST$ php train.php
Loading data into memory ...
Training ...
[2020-02-04 06:42:07] MNIST.INFO: Fitted ImageVectorizer
[2020-02-04 06:42:16] MNIST.INFO: Fitted ZScaleStandardizer
[2020-02-04 06:42:20] MNIST.INFO: Learner init hidden_layers=[0=Dense 1=Activation 2=Dropout 3=Dense 4=Activation 5=Dropout 6=Dense 7=Activation 8=Dropout] batch_size=200 optimizer=Adam alpha=0.0001 epochs=1000 min_change=0.0001 window=3 hold_out=0.1 cost_fn=CrossEntropy metric=FBeta
[2020-02-04 06:45:12] MNIST.INFO: Epoch 1 score=0.94355236537826 loss=0.034297487074677
[2020-02-04 06:48:09] MNIST.INFO: Epoch 2 score=0.9568734780257 loss=0.016965537428612
[2020-02-04 06:50:59] MNIST.INFO: Epoch 3 score=0.96030954560626 loss=0.013330950531581
[2020-02-04 06:53:54] MNIST.INFO: Epoch 4 score=0.96091932388603 loss=0.01165716545718
[2020-02-04 06:56:54] MNIST.INFO: Epoch 5 score=0.96362291250936 loss=0.010479900830758
[2020-02-04 06:59:51] MNIST.INFO: Epoch 6 score=0.96605237416864 loss=0.0097338521787775

System is an i7 8650 with 16 GB of RAM running PHP 7.2.24.