YingfanWang/PaCMAP

Early stopping in third phase of training

nfultz opened this issue · 1 comment

The loss trace from my current data set is below. I believe the fit would have been essentially identical if the training had ended a hundred iterations earlier.

It would therefore be very practical to also allow a stopping condition specified as a minimum improvement in loss (instead of just a fixed number of iterations), especially for use cases where the training function is called repeatedly for hyperparameter tuning. A rough sketch of the kind of rule I have in mind follows the trace.

Initial Loss: 221937.015625
Iteration:   10, Loss: 2350144.000000
Iteration:   20, Loss: 402674.156250
Iteration:   30, Loss: 240955.015625
Iteration:   40, Loss: 197510.812500
Iteration:   50, Loss: 153939.875000
Iteration:   60, Loss: 129958.703125
Iteration:   70, Loss: 117012.429688
Iteration:   80, Loss: 107633.703125
Iteration:   90, Loss: 98910.828125
Iteration:  100, Loss: 88535.101562
Iteration:  110, Loss: 160596.593750
Iteration:  120, Loss: 146496.093750
Iteration:  130, Loss: 138674.703125
Iteration:  140, Loss: 134762.906250
Iteration:  150, Loss: 132901.375000
Iteration:  160, Loss: 132175.421875
Iteration:  170, Loss: 132131.562500
Iteration:  180, Loss: 132340.203125
Iteration:  190, Loss: 132734.750000
Iteration:  200, Loss: 133220.187500
Iteration:  210, Loss: 60164.875000
Iteration:  220, Loss: 54855.210938
Iteration:  230, Loss: 53705.199219
Iteration:  240, Loss: 53232.484375
Iteration:  250, Loss: 53050.156250
Iteration:  260, Loss: 52986.171875
Iteration:  270, Loss: 52963.117188
Iteration:  280, Loss: 52954.531250
Iteration:  290, Loss: 52950.292969
Iteration:  300, Loss: 52948.816406
Iteration:  310, Loss: 52948.343750
Iteration:  320, Loss: 52947.710938
Iteration:  330, Loss: 52947.429688
Iteration:  340, Loss: 52947.203125
Iteration:  350, Loss: 52947.214844
Iteration:  360, Loss: 52947.093750
Iteration:  370, Loss: 52946.875000
Iteration:  380, Loss: 52946.687500
Iteration:  390, Loss: 52946.578125
Iteration:  400, Loss: 52946.523438
Iteration:  410, Loss: 52946.421875
Iteration:  420, Loss: 52946.429688
Iteration:  430, Loss: 52946.238281
Iteration:  440, Loss: 52946.078125
Iteration:  450, Loss: 52945.914062
Elapsed time: 447.48s
CPU times: user 10min 29s, sys: 17.6 s, total: 10min 46s
Wall time: 7min 46s
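
For concreteness, here is the sort of stopping rule I mean. This is not PaCMAP's actual optimizer loop; step, min_improvement, and patience are hypothetical names, and the check would only make sense within the third phase, since the loss jumps when the pair weights change (around iterations 100 and 200 in the trace above).

def run_third_phase(step, max_iters=250, min_improvement=1e-4,
                    patience=3, check_every=10):
    """Run `step(it)` (one gradient update that returns the current loss)
    and stop once the relative improvement stays below `min_improvement`
    for `patience` consecutive checks. Illustrative sketch only; these
    names are not part of PaCMAP's API."""
    prev_loss = None
    stalled = 0
    loss = float("inf")
    for it in range(1, max_iters + 1):
        loss = step(it)
        if it % check_every == 0:
            if prev_loss is not None:
                # Relative improvement since the last check.
                rel_improvement = (prev_loss - loss) / max(abs(prev_loss), 1e-12)
                stalled = stalled + 1 if rel_improvement < min_improvement else 0
                if stalled >= patience:
                    print(f"Stopping early at iteration {it}, loss {loss:.6f}")
                    break
            prev_loss = loss
    return loss

Eyeballing the trace above, a relative-improvement threshold on the order of 1e-5 would have stopped this run around iteration 350 with essentially the same final loss.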

This is an interesting idea. Given the way the loss is constructed, it will take some time to find a good early-stopping threshold that works for most use cases. I think 350 iterations will be sufficient on most datasets, so lowering the iteration count could serve as a temporary workaround for your scenario.
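
For example, something along these lines should work as that workaround; I am writing the num_iters parameter name from memory, so please double-check it against the signature of pacmap.PaCMAP in the version you have installed:

import numpy as np
import pacmap

# Placeholder data; substitute your own array here.
X = np.random.default_rng(0).normal(size=(10_000, 50))

# Cap the total number of optimization iterations at 350
# instead of the default 450.
reducer = pacmap.PaCMAP(n_components=2, num_iters=350)
embedding = reducer.fit_transform(X)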