zawy12/difficulty-algorithms

EMA-Z Difficulty Algorithm

zawy12 opened this issue · 8 comments

[update: the e^x forms I described in this issue in 2018 are unnecessarily complicated and they give almost identical results to ASERT. The simpler way to view this content is that the primary equation presented:

next_target = prior_target * (1+t/T/N-1/N)
is just an EMA where t=solvetime of previous block, T = desired block time, N is a "filter", and t/T is the estimate of how much previous target needed to be adjusted. To show this is just an EMA, Wikipedia's EMA: is:

S[i] = A*Y[i] + (1-A)*S[i-1]
Our estimated next difficulty based on previous solvetime and previous difficulty is
Y[i] = S[i-1]*adjustment_needed
which gives:
target[i] = A*target[i-1]*t/T + (1-A)*target[i-1]
Using, A = α = 1/N, this simplifies to the 1st Target equation above.

We should have guessed (and JE did) this could be improved with replacing the implied 1+x in the 1st equation with e^x to get the later-discovered-as-possibly-ideal ASERT. Wikipedia says EMA is a discrete approximation to the exponential function. So this leads us to relative ASERT:
next_target = prior_target * e^(t/T/N-1/N)

In an upcoming issue (it is now January 2022 in case I do not come back to link to it) I'll argue adjustment_needed should not be t/T but be
adjustment_needed = 1+ error =1 + e^(-median_solvetime/T) - e^(-t/T) = 1.5 - e^(-t/T)
In words, this adjusts based on the probabilities of the solvetimes we expect. It is the ratio of the observed to the expected solvetime probability for individual solvetimes. We expect to see a median solvetime which for an exponential distribution is e^(-t/T) * T= 0.5*T, not the long term average. A simple t/T does not take into account probabilities of the t's we expect. So the proper EMA from this assumption is
target[i] = A*target[i-1]*(1.5 - e^(-t/T) ) + (1-A)*target[i-1]
which is
next_target = previous_target * (1 + 0.5/N - e^(-t/T)/N )
and if we guess this should be in an ASERT-like form we can guess this is
next_target = previous_target * e^(0.5/N - e^(-t/T))
This is just replacing the error signal "t/T -1" in ASERT with 0.5-e^(-t/T). This works lot a better than ASERT in the sense of getting the same stability as ASERT with 1/3 the value of N. It results in 1/2 as many "blocks stolen" compared to dedicated miners during on-off mining at a cost of very slightly longer delays and avg solvetime being longer during the attack.

end update

Summary: This explores the theoretical origins of EMA algorithms. The only sensible one to use that does not have problem due to integer math, negative difficulties or zero solvetimes is:

simplified EMA-TH (floating point for clarity)
next_target = previous_target * (1+t/T/N-1/N)
next_difficulty = previous_difficulty /  (1+t/T/N-1/N)
ETH does something like the above WITH the integer round off error and surprisingly does OK.

simplified EMA-TH (actual code for integer math with less chance of overflow problem in target)
k=10000;
next_target =(previous_target/ (k*N)) * (k*N+(k*t)/T-k) 
next_difficulty =( previous_difficulty *k* N )/(k*N+(k*t)/T-k)

Bit shifting can be used to replace the above division.

Notation:

  • t = a solvetime
  • T = target solvetime
  • N = stability factor for EMAs. N=10 for very fast, N=80 for slow and smooth difficulty. SMA's with N=144 are like EMAs with N=80. It might also be called "extinction coefficient" or "mean lifetime"

There are two different ideal forms of the EMA shown below that are equal within 0.02% for most solvetimes, and equal to within 0.005% on average. The differences between them all can be attributed to

e^x =~ 1+x  for small x which results in the simple forms
1+x =~ 1/(1-x) when x is small which is the distinction between JE and TH.

JE and TH refer to Jacob Eliosoff's original version, and TH refers to a precise form of Tom Harding's simple version . The simplified JE-ema can result in divide by zero or negatives for smallish N, or if a big miner keep sending "+1" second timestamps that ends with an honest timestamp that throws it negative. The more precise e^x and "accurate" JE-ema versions do not have this problem and are slightly better than TH versions at small N. But it is very difficult to get the e^x versions to not have error that cancels the benefits when working in integer math. The e^x needs to accurately handle values like e^(t/T/N) which for t=1, T=600, and N=100 is e^(0.000017). Even after using e^(1E6*t/T/N) methodology the avg solvetime the error from floating point that I could not resolve was 1% plus there was another 1% error at N=100 from being able to let t=0 in these versions due to divide by zero.

ema-JE = prev_target / ( T/t+(1-T/t)*e^(-t/T/N) )
ema-TH = prev_target * ( T/t+(1-T/t)*e^(+t/T/N) )

The bottom of this article shows how this comes from common EMA terminology about stock prices. After Jacob Eliosoff derived or maybe guessed this from experience with EMAs and some thought, I later found out his result has applications in measuring computer performance

A e^x = 1+x+x^2 substitution can be used that gets rid of the e^x version, but it introduces a much greater potential for overflow. It

k=t/T/N
accurate ema-JE = prev_target / (1+1/N+k*(k/2-1-1/2/N)) // ** The most accurate one **
// To do the above in integer math, the follow can be used but has large overflow potential:
accurate ema-JE = (prev_target  * (N*T*N*T*2)) / (2*T*N*T*N+2*T*T*N+ST*(ST-2*T*N-T))
// But a difficulty version does is not a problem i D is large:
accurate ema-JE = (prev_target  + prevD/N + (prevD*ST*ST)/(N*T*N*T*2) - prevD*ST/T/N -(prevD*ST)/(N*N*T*2)

accurate ema-TH = prev_target * (1-1/N+k*(k/2+1-1/2/N))

With the simpler substitution e^x = 1+x, another near-perfect algorithm is possible, as long as N is not smaller than about < 20. These can have zero and divide by zero problems

simplified ema-JE = prev_target * N / (N-t/T+1)   // Do not use this one.
simplified ema-TH = prev_target * (N+t/T-1) / N  //  This is the most sensible one. Simpler small error.
(just inverse for difficulty versions)

Problems

  1. The simplified JE version can result in divide by zero in the target version, or negative difficulty in either version for smallish N, or if miners keep sending "+1" second timestamps for a significant portion of N blocks. This can be used in a private mine to lower difficulty artificially low, getting unlimited blocks in finite time (if miner's hashrate is >50% to still win most-chain-work rule). But the more precise e^x JE version does not have this problem and it noticeably better than TH versions at small N. The accurate JE ema does not have this problem and should be sufficient.
  2. TH versions are not as accurate for small N.
  3. If there is an offset error "e" such that nextD = (1+e)*emaD then the error compounds so that the difficulty will be off by (1+e)^N. It might result from integer division or something else. It's something to be aware of that should be revealed in testing if it is a problem, assuming it can't be forced with an exploit.
  4. Can't be used in coins like Cryptonote clones which simply assign the completion time of the previous block as the timestamp of the current block. It results in substantial oscillations that could "blow up".

As PID Controller
The EMA might be viewable as a PID controller that takes the Poisson distribution of the solvetimes into account. A PID controller is a*P+b*I+c*D where P is present value, I is the integral (sum) of previous values, and D is the derivative (slope) of recent values. So a PID controller is taking present, past, and an estimate of the future into account. To see it possibly here, the ema-JE can be written in the form:
next_difficulty = prev_D*T/t*(1-e^(-t/T/N) + t/T*e^(-t/T/N) )
1-e^(-t/T) is the single-tail probability of a solvetime being less than t.
t/T*e^(-t/T) is the probability density function of solvetimes. It is also the derivative of the former.
The next_D (aka 1/(target hash)) keeps a running record of previous results, so in a sense it's like a summation. The summation can be seen if you recursively substitute prev_D with the equation that came before it, and expand terms. I can't see how that would lead to the PID equation, but I wanted to point out the 3 elements of a PID controller seem to be present. A similar statement can be made about the WHM. The SMA and Digishield seems to be like a PI controller since it does not take the slope into account.

The EMA can be the starting point of a fully functional PID Controller that does not work noticeably better than the above. This is because PID controllers are needed in systems that have any kind "mass" with "inertia" (which might be an economy or public opinion, not merely restricted to the physics of mass). The "inertia" leads to 2nd order differential equations that can be dealt with by PID controllers. Difficulty that needs to have faster adjustment as might be needed by a PID controller does not have any inertia. Long-term dedicated miners have inertia, but the price/difficulty motivation of big miners jumping on and off a coin (the source of a need for a controller) is a non-linear equation that invalidates the relevance of PID controllers.

I've been promoting this algorithm, but after further study, the
LWMA and Simple EMA are the best

Links to background information:
Original announcement of the algorithm
Handling bad timestamps
Determining optimal value of N
Other difficulty algorithms
All my previous inferior attempts

Note: there is a 2.3 factor to make the N of this algorithm match the speed of the WHM.

# EMA-Z Difficulty Algorithm 
# credits: Jacob Eliosoff, Tom Harding (Degenr8), Neil (kyuupichan), Scott Roberts (Zawy)
# Extensive research is behind this:
# Difficulty articles:  https://github.com/zawy12/difficulty-algorithms/issues
# https://github.com/kyuupichan/difficulty
# https://github.com/seredat/karbowanec/commit/231db5270acb2e673a641a1800be910ce345668a

# ST = solvetime, T=target solvetime, TT=adjusted target solvetime
# height - 1 = most recently solved block.  
# basic equation:  next_target=prev_target*(N+ST/T-1)/N

# Do not use MTP as the most recent block because shown method is a lot better.
# But if you do not MTP, you MUST allow negative solvetimes or an exploit is possible.

# Ideal N appears to be N= 20 to 43 for T=600 to 60 by the following formula.
# ( Note: N=20 here is like N=46 in an SMA, WHM, and the EMA-D )

# The following allows the N here to match all other articles that mention "N"
# An EMA with this N will match the WHM in response speed of the same N
N = int(45*(600/T)^0.3)
# But I need to convert that N for use in the algorithm. 
# M is the "mean life" or "1/(extinction coefficient)" used in e^(-t/M) equations.
M = int(N/2.3+0.5)

# WARNING: Do not use "if ST<1 then ST=1" or a 20% hashrate miner can
# cut your difficulty in half. See timestamp article link above.

ST = states[-1].timestamp - states[-2].timestamp;
prev_target = bits_to_target(states[-1].bits);
k = (1000*ST)/(T*M);
// integer math to prevent round off error
next_target = (1000*prev_target * (1000-1000/M+(k*k)/2000+k-k/(2*M)  )/1000)/1000;

hello,
I'd like to test your daa, but getting confused with these two lines:
ST = max(10T,min(-9T,ST))
how does -9 * t come? in cpp from my understanding, expr above always returns 10t, then why do we
need this?
prev_target * ((T
N)^2-T^2N+k(k/2+(TN)^2-T^2N/2) / (TN)^2
should the missing bracket be placed like below?
prev_target * ((T
N)^2-T^2N+k(k/2+(TN)^2-T^2N/2)) / (T*N)^2

Looking forward to your reply.

@zawy12 Loved your efforts here. Are you in San Francisco and would you be interested in giving a 1-hour presentation/talk on difficulty Algorithms?

I think for this level of awesome, we should pay you to fly out and come speak! I'll send you an email direct once I touch base with the rest of folks with a proposal.

Will there be some live streams? It will be awesome too if anyone interested can view your talks via live streams. @taariq

I should do a video that summarizes everything I've learned.

(weighted) averages tend to be sensitive to corrupt data, a problem weighted medians do not have. This should presumably give a measure of current hash rate that is less sensitive to manipulation. Have you run any experiments either with a linearly weighted moving median or an exponential moving median (if so, can you make my life easy and link the comparisons?)

Medians interject a delay that is proportional to the amount of protection. The problem is that delay is in every block instead of just the bad ones. There is very little manipulation that can be done in the newest l w m a algorithm thanks to the stricter limits. The best thing to do is to just accept all the time stamps as honest even if they go back in time. The honest timestamps immediately cancel the negatives of the bad time stamps. So the only manipulation that can occur for more than one block is only when the attacker has more than 50%. With a stricter limits maximum benefit he can get is about 5% lower difficulty for only as many blocks as his multiple of the Baseline difficulty. For example if he has 10 times the base line difficulty he has a 50% chance of getting 10 blocks at about 5% lower difficulty