JOSS review: how to choose deltatol and niter?
I was playing around with the stochastic quadratic function, and noticed that if deltatol is smaller than the variance of the added noise in the objective, then the algorithm will never converge (since even at the optimum, the function values are fluctuating too much). In general, we don't know what the variance in our objective is, so how should one go about choosing the deltatol parameter (or analogously, for minimizeSPSA, the niter parameter)? If it is too big, the algorithm exits prematurely, while if it is too small, it never returns. I don't know if you have tips on how to get around this problem, but adding some discussion in the documentation about the importance of deltatol might be helpful.
You are right about the non-convergence if the function differences at distances of deltatol are smaller than the stochasticity. This problem is resolved if errorcontrol=True, but optimization still becomes exceedingly expensive if the variation of the function at the target accuracy is much smaller than the stochasticity. I added a comment about this to the docstring, see 0eba0e0.
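To get a feel for how expensive this can become, here is a minimal sketch in plain Python. It does not use noisyopt: the objective noisy_quadratic, the helpers estimate_noise_std and evals_to_resolve, and the factor z=2 are all hypothetical names and choices made up for illustration. The idea is to estimate the noise by repeated evaluation at a fixed point and compare it with the noise-free change of the function over a distance of deltatol.

```python
import math
import random
import statistics


def noisy_quadratic(x, sigma=0.1, rng=random):
    """Hypothetical stochastic objective: x**2 plus Gaussian noise of std sigma."""
    return x * x + rng.gauss(0.0, sigma)


def estimate_noise_std(func, x, nsamples=200):
    """Sample standard deviation of repeated evaluations at one fixed point."""
    values = [func(x) for _ in range(nsamples)]
    return statistics.stdev(values)


def evals_to_resolve(signal, noise_std, z=2.0):
    """Evaluations per point so that z standard errors of the difference of
    two averaged values stay below `signal` (illustrative rule of thumb)."""
    return max(1, math.ceil(2.0 * (z * noise_std / signal) ** 2))


if __name__ == "__main__":
    deltatol = 0.01
    noise_std = estimate_noise_std(noisy_quadratic, 0.0)
    # Noise-free change of x**2 when stepping deltatol away from the optimum.
    signal = deltatol ** 2
    print("estimated noise std:", noise_std)
    print("evaluations per point:", evals_to_resolve(signal, noise_std))
```

With a noise level of about 0.1 and deltatol = 0.01, this rule of thumb asks for millions of evaluations per point, which is the regime where the optimization is still well defined but very costly. A rough estimate like this may help decide how small deltatol can reasonably be for a given objective.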
deltatol is the target pattern size, i.e. it sets how precisely the location of the optimum is determined. It should be chosen as large as the precision you need allows. I have added a description of the termination criteria to the docstring for more clarity, see 9599387.
For the SPSA algorithm there is some discussion of suitable parameter choices in the literature (see http://www.jhuapl.edu/SPSA/PDF-SPSA/Spall_Implementation_of_the_Simultaneous.PDF), which I have now referenced in the docstring, see d3ce9bc.
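For orientation, the gain sequences in the form recommended by Spall can be sketched as below. This is a standalone illustration, not a copy of what minimizeSPSA does internally: the function name spsa_gains and the values of a, c and A are placeholders, and only the exponents 0.602 and 0.101 are the practically effective values suggested in that reference.

```python
def spsa_gains(k, a=1.0, c=1.0, A=10.0, alpha=0.602, gamma=0.101):
    """Return (a_k, c_k) for iteration k = 0, 1, 2, ...

    a_k is the step size and c_k the perturbation size; both decay as
    power laws in k. A is a stability constant that damps early steps.
    """
    a_k = a / (k + 1 + A) ** alpha
    c_k = c / (k + 1) ** gamma
    return a_k, c_k


if __name__ == "__main__":
    # Show how slowly the gains shrink as the iteration count grows.
    for k in (0, 9, 99, 999):
        a_k, c_k = spsa_gains(k)
        print(k, round(a_k, 4), round(c_k, 4))
```

Because both sequences decay slowly, the number of iterations largely determines how small the final steps get, which is why niter plays a role for SPSA similar to the one deltatol plays for the pattern search.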
Ah, interesting. Thanks for the references!