Confirmation time, long averaging windows, a history
zawy12 opened this issue · 0 comments
@jtoomim has put a lot of work into difficulty algorithms this year and brought 2 "discoveries" to my way of thinking. The first is that the delay to confirmation for txs is not the same as solvetime delays. Average tx confirmation time is
avg confirmation time = sum { sti * (sti+1)/2 } / sum { sti }
where sti = solvetime for block i. The math comes from assuming 1 tx per second and using the sum of a series of integers. For different DA's experiencing different changes in hashrate, the st's will have different distributions and keep a good average st = T, but this sum for determining avg confirmation time can vary a lot (See Jonathan's comments in my previous issue).
If every st is T seconds, then the above sum is (T+1)/2 instead of T. The exponential distribution (histogram) of st's obeying rand(0 to 1) = T * e^(-st/T) is what causes the result to be T (if hashrate isn't changing and the DA is very stable). In this case, the T result is due to the memory-less property of hashing which means that if you send a tx at a random point in time, you have to wait the average block time. Memory-less works forwards and backwards in time, so the most recent block before your tx will be also be 1 block time in the past. This means the average time between blocks if you choose a random point in time is 2x the block time. To prove the avg confirmation st is T (or T+0.5) when hashrate isn't changing and the DA is very stable, use this equation to generate exponentially-random st's
sti = -ln(xi) * 2^256 / target / hashrate
where x = a random value from 0 to 1. If the target is correct for the HR to get the correct T, then this is
sti = -ln(xi) * T
Using the avg confirmation time equation above (and noting log() = ln() below) this gives
This is a tricky way of avoiding the expectation integral on a PDF that I can't derive. It's like doing the math directly on the atoms instead of trying to use incomprehensible macroscale thermodynamics.
Now that that's out of the way
The 2nd "discovery" for me is that a really slow and steady difficulty can work without fear of getting stuck. I knew my old LWMA N=60 (like ASERT-21 where 21 is half-life) was too fast and wanted coins to go to N=120 or maybe higher. But Jonathan is talking about windows 14x longer than LWMA N=60. So far I can't find an error in his testing. My testing is giving the same results.
In the beginning ... Cryptonote/Monero/Bytecoin forks for small coins were a disaster (they would eventually get stuck) because of the CN algorithm was almost as bad as trying to use BTC's difficulty algorithm. The algorithm is an SMA with a 1-day averaging period. Coins eventually got stuck unless they were really big. Stellite chart below is a prime example the problem CN coins were facing. The magenta is line is hashrate and the blue are solvetime delays. (the stolen and delay metrics are my old ones that are partly subjective)
Graft is another good example
And Karbowanec
The above charts should not be confused with BCH's oscillations. The black lines are the difficulty, not the hashrate oscillations myself and others have shown for BCH. BCH's difficulty is varying only +/- 5% whereas the above are varying > 500%. There are two reasons for the difference despite both being 1-day SMAs. The first is that the CN algorithm has a cut and lag on solvetimes that ignores about 10% of the most recent blocks in the 1-day averaging. This significantly or greatly increases the oscillations in addition to being an SMA. The 2nd difference is that BCH is a more liquid coin with a very stable BTC/BCH exchange ratio (compared to these microcoins). This could change. BCH is seeing 10x changes in hashrate (HR), so it's not clear that these small coins were also suffering from being so small. I can only confirm 20x HR changes sometimes occurred, so they did not seem that much different in this respect.
BCH's situation is currently very close to the small coins who always got stuck. The only reason to think it might be safer is the belief that CN's cut and lag was having a large effect.
Getting back to history, I did not originally know the cut and lag were having a big effect in the CN coins. I argued for a simple SMA N=17 ("Zawy v1") in an issue somewhere that Karbowanec coin saw in late 2016 and used it to fix their problem. I didn't like Digishield because I didn't understand it, but it would have probably worked a lot better because it's not so unstable. Sumokoin picked up on the fix, then Forknote (which helps to create CN clones) adopted it. This undeserved popularity over Digishield lead to @Mengerian looking into it for BCH. This may have resulted in BCH using SMA N=144. It's simpler than Digishield. I argued for N=50 at the time which very probably would have been a lot worse. But no N=17 coin got stuck. Is that only because it does not have the cut and lag or because it recovers quickly? Would a stuck coin never occur if there is no cut and lag? Jacob Eliosoff, Tom Harding, and Kyuupichan all knew from testing before the BCH fork that the SMA could "cause" oscillations. I did not know or believe SMA's "cause" oscillations. I thought slow averaging periods could make the oscillations too big. (see my pig-headed footnote: SMAs do not cause oscillations)
Here's live data for "Zawy v1" an SMA with N=17. This chart is from Karbowanec and just a few days after the chart above for Karbowanec with the CN algo. The y-axis is normalized difficulty and block times are 4 minutes so > 2x difficulty 5x a day was the norm. Compare the stolen and delay metrics to what Karbo was seeing just before this picture was taken and you can see why it was getting popular with CN coins. (block index is not height)
Sumokoin and other CN/Forknote coins saw the same thing: (BTW Dash's first "Dark Gravity Wave" is an SMA N=24 so there should be plenty of coins with something nearly as impressive in the swings)
It was not always smooth sailing. Problem started occurring in Karbowanec.
How would a coin like BCH have done? It seems reasonable that although no Zawy v1 came anywhere near to getting stuck, it could happen in BCH because it's so liquid and so many miners are sensitive to it's reward/difficulty ratio compared to BTC. If the difficulty doubles, why is anyone going to mine BCH? It's not just theory that different groups of miners (even using the same POW) can act very differently as a later chart shows.
The switch to LWMA N=60 was nice This pretty much how ASERT with half-life of 21 would have done. Most of the time Digishield looks the same. This is very typical of LWMA coins.
Now I have to switch the style of charts. Blue areas are slow blocks. Blue dots are >7x delays in a single block. Red areas are sudden increases in HR (fast blocks). The following is another typical LWMA N=60 with 1 day of results. This is Zelcash who used to use Digishield.
Bitcoin Gold has been one of the best performing LWMAs and it's only N=45. I told them this because the others are 2 to 4 minute blocks and they are 10 minute blocks. The standard deviation 0.09 in difficulty is actually lower than almost all the N=60 LWMAs and even lower than if they had constant hashrate. The reason is probably because they are more liquid so that miners are more sensitive to difficulty changes and thereby cancel the natural oscillations. This is almost identical to ASERT-16.
But LWMA N=60 has not been without problems. About 10% of coins had an ecosystem of miners that were a real hassle. This is one of the milder instances.
LWMA-2 and 3 tried to address the issue by suddenly jumping 7% per block if past 2 or 3 blocks were fast and immediately going back down to the pre-attack difficulty if there was a slow block. So attacks only got 2 blocks cheaply. The FTLs are tight, 1 to 6 minutes. Not tight enough to stop manipulation to get a few extra blocks but I never saw manipulation. These are all the same algorithm, LWMA-3 N=60 which shows how different the mining communities can be.
I should point out there was an accidental LWMA N=17 because he switched from Zawy v1.
And an accidental LWMA N=960 (a like ASERT 700, but not as good) because he switched from CN's algo without changing a the constant. All these new charts are 1 day averages, mostly with 720 blocks/day. This is 960 blocks because it is T=90 seconds.
Finally I'm getting to where I wanted this article to go. The above accident was one of several hints that maybe N=60 was too fast. It's a really small coin and it has done as well as most LWMA N=60. It's nice but not a clear win. For comparison here's BCH N=144 when it did not have any problems.
Other largish coins with 1-day SMAs windows generally do better than LWMA N=60 (as fast as a 1-hour SMA). A key seems to be they are not using CN's cut and lag. These algos are basically SMA (BRNDF and DGW with N=180 instead of 24)
But BRNDF can be a real problem:
So finally last year I got on board with longer averaging windows although still not near as long as what Jonathan is recommending. I can't find an error in his reasoning or testing but they make me uneasy. I have likely been too biased due the CN coins having a cut and lag who's consequences I was not aware of. Anyway, here's a coin that had persistent problems with LWMA N=60 that got a lot better with N=
LWMA N=60
After switching to LWMA N=144 (almost identical to ASERT-50)
Other longer-window LWMA's seemed to good, probably better than the average N=60. Here's N=120 with LWMA-3. Compare the group of LWMA-3's above. You can see the single-block spikes from it jumping.
LWMA-4 N=90. It was a milder form of LWMA-3, mostly like LWMA-1. You can see the little jumps.
[ I'll have to stop writing here for now. I will probably come back later today. ]
Footnote: SMA's do not cause oscillations
I can understand if someone thinks the following is being pig-headed. But I love physics and electrical signals (I'm an EE) and I cringe whenever I hear someone say "SMA's cause oscillations". SMAs don't over-estimate the future HR like PID controllers that can cause oscillations. SMA is a fair and accurate estimate of past HR. That can't cause an oscillation. SMAs are like a vibrating rod. If something strikes it, it can ring. It can even have harmonics. The rod does not cause the oscillations, it just doesn't dampen them out. All other difficulty algos dampen the oscillations by giving more weight to more recent blocks. A rod is symmetrical on both ends for ringing. A rod secure on 1 end has a harder time ringing. Once anything like an exchange rate or fees increase or a large miner's whim sends an SMA difficulty high and then the change reverts in the next < 144 blocks, then when that initial time rolls out the back of the window, difficulty drops too low, inviting more miners back again despite the exchange rate not being as favorable as the temporary change. This causes the ringing to continue, but it can die out. The "invitation" is the cause and it was caused by the combination of the SMA and the external change that was "forced" on the SMA "rod". An SMA can accidentally go low which is an invitation to miners to come mine, sending it into oscillations. But an "invitation" is not a force or energy, but a signal. The energy that starts the vibrations come from miner motivation which is a force.