Early stopping in third phase of training
nfultz opened this issue · 1 comments
nfultz commented
The training trace for my current data set is below. I believe the fit would have been essentially identical if training had ended a hundred iterations earlier.
So it could be very practical to also allow a stopping condition specified as a minimum improvement in the loss (instead of only a fixed number of iterations), especially for use cases where the training function is called repeatedly for hyperparameter tuning.
Initial Loss: 221937.015625
Iteration: 10, Loss: 2350144.000000
Iteration: 20, Loss: 402674.156250
Iteration: 30, Loss: 240955.015625
Iteration: 40, Loss: 197510.812500
Iteration: 50, Loss: 153939.875000
Iteration: 60, Loss: 129958.703125
Iteration: 70, Loss: 117012.429688
Iteration: 80, Loss: 107633.703125
Iteration: 90, Loss: 98910.828125
Iteration: 100, Loss: 88535.101562
Iteration: 110, Loss: 160596.593750
Iteration: 120, Loss: 146496.093750
Iteration: 130, Loss: 138674.703125
Iteration: 140, Loss: 134762.906250
Iteration: 150, Loss: 132901.375000
Iteration: 160, Loss: 132175.421875
Iteration: 170, Loss: 132131.562500
Iteration: 180, Loss: 132340.203125
Iteration: 190, Loss: 132734.750000
Iteration: 200, Loss: 133220.187500
Iteration: 210, Loss: 60164.875000
Iteration: 220, Loss: 54855.210938
Iteration: 230, Loss: 53705.199219
Iteration: 240, Loss: 53232.484375
Iteration: 250, Loss: 53050.156250
Iteration: 260, Loss: 52986.171875
Iteration: 270, Loss: 52963.117188
Iteration: 280, Loss: 52954.531250
Iteration: 290, Loss: 52950.292969
Iteration: 300, Loss: 52948.816406
Iteration: 310, Loss: 52948.343750
Iteration: 320, Loss: 52947.710938
Iteration: 330, Loss: 52947.429688
Iteration: 340, Loss: 52947.203125
Iteration: 350, Loss: 52947.214844
Iteration: 360, Loss: 52947.093750
Iteration: 370, Loss: 52946.875000
Iteration: 380, Loss: 52946.687500
Iteration: 390, Loss: 52946.578125
Iteration: 400, Loss: 52946.523438
Iteration: 410, Loss: 52946.421875
Iteration: 420, Loss: 52946.429688
Iteration: 430, Loss: 52946.238281
Iteration: 440, Loss: 52946.078125
Iteration: 450, Loss: 52945.914062
Elapsed time: 447.48s
CPU times: user 10min 29s, sys: 17.6 s, total: 10min 46s
Wall time: 7min 46s
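A rough sketch of what such a minimum-improvement rule might look like, to make the request concrete. Everything here is hypothetical, not part of this library's API: `step` stands in for one optimization iteration, and the threshold and patience values are placeholders.

```python
def train(step, initial_loss, max_iters=450,
          min_rel_improvement=1e-5, patience=3):
    """Run up to max_iters iterations, stopping early once the relative
    loss improvement stays below min_rel_improvement for `patience`
    consecutive iterations.

    `step(it)` is a hypothetical callable that performs one iteration
    and returns the current loss.
    """
    prev_loss = initial_loss
    stall = 0
    for it in range(1, max_iters + 1):
        loss = step(it)
        # Relative improvement; negative when the loss goes up,
        # as it does in the trace above when a new phase starts.
        rel = (prev_loss - loss) / abs(prev_loss)
        stall = stall + 1 if rel < min_rel_improvement else 0
        if stall >= patience:
            return it, loss  # converged early
        prev_loss = loss
    return max_iters, prev_loss
```

One caveat visible in the trace itself: the loss jumps upward at the phase boundaries (around iterations 110 and 210), and it also plateaus briefly near iteration 170 before the final phase begins, so a naive improvement check would need to be restricted to the last phase (or use a generous patience) to avoid stopping before the final descent.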
hyhuang00 commented
This is an interesting idea. Given the way the loss is constructed, it will take some time to find a good early-stopping threshold that works across most use cases. On most datasets, 350 iterations should be sufficient, so capping the iteration count there could serve as a temporary workaround for your scenario.