Make `diagonal=True` smarter?
Opened this issue · 8 comments
From discussion with @mdhaber
It looks like step[0] is a relative (to the value of x) initial step size and step[1] is a reduction factor for each iteration. However, when diagonal=True, step[0] effectively becomes an absolute step due to the way the wrapper function works.
@HDembinski HDembinski Jun 26, 2023
Yes, that's a speed trade-off. I am assuming here that the x-values are roughly of equal scale and don't vary a lot in magnitude. If they do, then diagonal=True should not be used. This needs to be properly documented at the very least.
The ideal solution in my view would be an algorithm that first groups x-values of similar magnitude together in blocks and then does the calculation using the same absolute step for those blocks. The speed of jacobi comes from doing array calculations as much as possible. Such an algorithm would give the same result as diagonal=True in the ideal case and would fall back to the slow "one-x value at a time" worst case if necessary.
A related idea is to always use a forward derivative for the smallest element of the x array and backward derivative for the largest element in automatic mode method = -1
. This is based on the heuristic that if the function has a compact domain, the x arrays typically cover that domain. For other poles inside the domain, we need to check the consistency of forward and backward derivatives as usual.
Re: #37 (comment), it depends on the intent of relative x
. IIUC, they should only be helpful for very large x
, when you are losing precision in the abscissae at which you evaluate the function. (E.g. if x = 1e20
, you can't take a step smaller then ~10k.)
But to avoid error due to truncation of the Taylor series, the magnitude of x
has no effect on the required step size, right? If we shift a function and shift the point at which we're taking the derivative accordingly, we still need the same step size to get the same accuracy (for a given order).
Re: #37 (comment) , that makes sense, but then you lose speed when one-sided differences aren't needed at all. I like the current approach - start with central difference, see whether function values are NaN, and if so, use one-sided. The difference is that it shouldn't be an all or nothing switch, using the same method for all elements of x
. You would look at where the NaNs occur. I'll show what I mean in mdhaber/scipy#106.
But to avoid error due to truncation of the Taylor series, the magnitude of
x
has no effect on the required step size, right? If we shift a function and shift the point at which we're taking the derivative accordingly, we still need the same step size to get the same accuracy (for a given order).
The step must be both smaller than x and like you say it should not be too small. Using a constant step size is fine if the x values are of same order of magnitude, but people may also use an x array created with numpy.geomspace so that it spans over many orders of magnitude.
I like the current approach - start with central difference, see whether function values are NaN, and if so, use one-sided.
Me, too, but when you mentioned that you want to use this, for example, with a scipy pdf, you may not necessarily get NaN for the central difference result. If the pdf just returns 0 outside of the domain, then applying the central difference formula on the edge point will not give NaN but some finite value.
In other words, just checking for NaN is probably not enough. If we keep the current scheme of computing central differences first and only do something different if that fails, we should also check whether the central difference is close to the average of the forward and backward difference estimates.
Ah, right. Personally, i think it is still enough for automatic detection, though. You are making an effort to detect an obvious case where central difference is invalid, but you can't easily detect all cases, so the line needs to be drawn somewhere. How about allowing method
to be an array?
I suppose it is a matter of philosophy - how you want to balance speed and ease of use (in extreme cases).
numpy.geomspace so that it spans over many orders of magnitude.
I see.
Instead of making the step linearly proportional to the magnitude of x
, we could at least have a dead band, maybe for abs(x) < 1e10
or so, where the step size is not adjusted based on the magnitude of the input. Beyond that, the step could be linear (affine, really) with the magnitude of x
. Another thought is that the actual step sizes taken should be measured, since (x+h)-x
does not necessarily equal h
.
A separate issue is subtractive cancellation of the output. We can measure the subtractive loss of precision by comparing the magnitude of the difference between function values to the magnitude of the function value, right? That can be detected in the termination condition, and it can either be considered in the reported error or trigger some corrective action.
Another thought is that the actual step sizes taken should be measured, since (x+h)-x does not necessarily equal h.
I know this trick and rejected this, because computing a better h from (x + h) - x is expensive. We already achieve machine-accuracy without this trick. The small error introduced by not using an h that is exactly representable is overcompensated by the polynomial extrapolation.