chris-belcher/electrum-personal-server

EPS hangs on get_fee_histogram

ejose19 opened this issue · 40 comments

For the past few days, my node has been getting stuck on this every time my Electrum wallet connects to it:

{"jsonrpc": "2.0", "method": "mempool.get_fee_histogram", "id": 210}

It takes about 2 minutes to reply, which seems excessively high (running on a Raspberry Pi 3B+ with only Bitcoin Core and EPS).

It takes 2 minutes for the initial sync, but the bigger problem is that I can't send a transaction, because that command hangs the entire interface and the broadcast then times out.

Any ideas what I can do to make this run faster?

The mempool has grown quite a lot over the past few days. The default settings can be quite heavy for a Pi (if you use them). If you don't have this already, try adding this to your Bitcoin configuration file:

dbcache=100
maxorphantx=10
maxmempool=50
maxconnections=40
maxuploadtarget=5000

(from https://github.com/Stadicus/guides/blob/master/raspibolt/raspibolt_70_troubleshooting.md)

Then restart bitcoind. If you already have this, maybe try lowering maxmempool even more to rule that out as the problem.

Hi @HelgeHunding, yes, I already have that config. That's what I feared: a Raspberry Pi is ruled out as a reliable EPS node, since limiting the mempool even more could stop my pending low-fee unconfirmed transactions from being shown.

Anyways thanks for helping!

Thanks for the issue.

Limiting the mempool size might still result in your own transactions being shown, because Core might give preference to transactions in its own wallet.

I wonder if it's possible to optimize the relevant code which calculates the fee histogram so that it runs faster. Another possible solution is to precalculate the fee histogram and cache it.

That seems like a good idea indeed, allowing the user to define the time in minutes so it fits the hardware it's running on. Can you tell me where the fee histogram algorithm is, so I can check what I can contribute?

It is here

elif method == "mempool.get_fee_histogram":
    mempool = rpc.call("getrawmempool", [True])
    #algorithm copied from the relevant place in ElectrumX
    #https://github.com/kyuupichan/electrumx/blob/e92c9bd4861c1e35989ad2773d33e01219d33280/server/mempool.py
    fee_hist = defaultdict(int)
    for txid, details in mempool.items():
        fee_rate = 1e8*details["fee"] // details["size"]
        fee_hist[fee_rate] += details["size"]
    l = list(reversed(sorted(fee_hist.items())))
    out = []
    size = 0
    r = 0
    binsize = 100000
    for fee, s in l:
        size += s
        if size + r > binsize:
            out.append((fee, size))
            r += size - binsize
            size = 0
            binsize *= 1.1
    result = out
    send_response(sock, query, result)
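
For anyone who wants to experiment with the binning outside the server, here is a minimal standalone sketch of the same algorithm. The function name `fee_histogram` and the `first_bin` and `growth` parameters are made up for illustration and are not part of EPS; the input is assumed to have the same "fee" (in BTC) and "size" (in bytes) fields that the handler above reads.

```python
from collections import defaultdict


def fee_histogram(mempool, first_bin=100000, growth=1.1):
    """Build an Electrum-style fee histogram from verbose mempool data.

    mempool maps txid -> {"fee": fee in BTC, "size": size in bytes}.
    Returns a list of (fee_rate_sat_per_byte, bytes_in_bin) tuples,
    highest fee rate first.
    """
    # Sum the bytes paying each integer fee rate (sat/byte).
    bytes_at_rate = defaultdict(int)
    for details in mempool.values():
        fee_rate = int(1e8 * details["fee"] // details["size"])
        bytes_at_rate[fee_rate] += details["size"]

    histogram = []
    size = 0        # bytes accumulated in the current bin
    overshoot = 0   # running total of how far closed bins missed their target
    binsize = first_bin
    # Walk from the highest fee rate down, closing a bin once it is full.
    for fee_rate, nbytes in sorted(bytes_at_rate.items(), reverse=True):
        size += nbytes
        if size + overshoot > binsize:
            histogram.append((fee_rate, size))
            overshoot += size - binsize
            size = 0
            binsize *= growth  # each bin is 10% larger than the previous one
    return histogram
```

Note that bytes left over after the last full bin are not reported, which matches the loop in the handler.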

Though you may already be doing this, here's another suggestion: if recent versions of Electrum are run with both the --oneserver and --server arguments, the connection timeout is also increased to about 2 minutes. Then Electrum might not disconnect while waiting for the reply.

Yes, I was doing that. Believe it or not, the Raspberry Pi took even longer than that when the mempool grew big at the start of this month. Maybe another solution is adding a flag to the EPS server to omit get_fee_histogram? I'd guess users would prefer to look up the mempool on Johoe's site and set the fee manually rather than wait 2 minutes, or not be able to use the wallet at all because of that function.

I did some playing around and it seems a much more significant bottleneck is the actual call to getrawmempool rather than the calculations in python. On my laptop which runs EPS it took about 60 seconds for the call to return and right now the mempool is only 15MB, so back a few days ago when the mempool was 50MB it would've been worse.

So I'm thinking a good solution is to create another thread which every 15 minutes (configurable) will call getrawmempool and calculate the histogram. Then the main network thread would remain responsive whatever happens.

After some more playing it's clear the Python-side calculation is completely negligible compared to the waiting time for calling getrawmempool.

Did some testing as well: with about 3MB of mempool the Raspberry Pi takes between 7 and 15s, while my other node (2 CPU, 2GB) takes less than a second. So yes, the better solution would be to simply cache and run in a separate process. One question though: is it possible to return an empty array to the Electrum wallet for the histogram call, in case the user is not interested in that functionality? (I already see high mempool giving troubles to raspberry pi even if it is running on another thread, because when that call was done the rpc was not working on another tty until the histogram command finished)

(I already see high mempool giving troubles to raspberry pi even if it is running on another thread, because when that call was done the rpc was not working on another tty until the histogram command finished)

After some discussion on #bitcoin and more experimenting, I find that this happens because Bitcoin Core uses a lock, not because of any CPU bottlenecks. So even if you used multiple threads the RPC calls would still happen serially.

So the call to getrawmempool can be done at periodic intervals and then cached to improve responsiveness, but there's no need to use other threads.
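
A rough sketch of that single-threaded caching idea, under stated assumptions: the class name `HistogramCache`, its methods and the `compute` callable are all hypothetical and not EPS code. The server's existing polling loop would call `maybe_refresh()`, while the request handler only ever reads the cached value and so never waits on getrawmempool.

```python
import time


class HistogramCache:
    """Hold the last computed fee histogram and refresh it periodically."""

    def __init__(self, compute, refresh_interval=15 * 60, clock=time.monotonic):
        self._compute = compute              # hypothetical callable returning the histogram
        self._refresh_interval = refresh_interval
        self._clock = clock
        self._value = []                     # served until the first refresh completes
        self._last_refresh = None

    def maybe_refresh(self):
        """Recompute if the cached value is stale. Returns True if it did."""
        now = self._clock()
        if (self._last_refresh is not None
                and now - self._last_refresh < self._refresh_interval):
            return False
        self._value = self._compute()
        self._last_refresh = now
        return True

    def get(self):
        """Return the cached histogram without doing any RPC work."""
        return self._value
```

The refresh itself still takes as long as the RPC call does; the cache only moves that cost away from the moment Electrum asks for the histogram.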

A possibly better way to do all this is to run getrawmempool with false so it only returns TXIDs, which is much faster, and then query each TXID with getmempoolentry and calculate the fee histogram that way. That would be slightly slower overall but could be made to avoid holding the lock for a long time.

Though I think for now I'll code a feature where the fee histogram can simply be disabled. It can at least solve the pressing issue caused by the recent large mempool.

OK, I did some testing with this script I made based on your suggestions:

import time
from functools import reduce
from bitcoinrpc.authproxy import AuthServiceProxy, JSONRPCException

rpc_connection = AuthServiceProxy('http://rpcuser:rpcpass@127.0.0.1:8332')
times = {'method1': [], 'method2': [], 'method3': []}
mempoolsize = rpc_connection.getmempoolinfo()['bytes'] / 1000000

for x in range(5):
    # Test getrawmempool NV (just for reference)

    start1 = time.time()
    rawmp = rpc_connection.getrawmempool()
    end1 = time.time()
    result1 = end1-start1

    # Test getrawmempool verbose

    start2 = time.time()
    rawmp = rpc_connection.getrawmempool(True)
    end2 = time.time()
    result2 = end2-start2

    # Test getrawmempool NV querying every txid on getmempoolentry

    mempoolVArr = []
    start3 = time.time()
    rawmpv = rpc_connection.getrawmempool()

    for i in rawmpv:
        try:
            mempoolVArr.append(rpc_connection.getmempoolentry(i))
        except JSONRPCException:
            # The tx left the mempool after getrawmempool returned; skip it
            pass

    end3 = time.time()
    result3 = end3-start3

    times['method1'].append(result1)
    times['method2'].append(result2)
    times['method3'].append(result3)

avgM1 = sum(times['method1']) / len(times['method1'])
avgM2 = sum(times['method2']) / len(times['method2'])
avgM3 = sum(times['method3']) / len(times['method3'])

print('\nMempool Size (MB): %s\n\nMethod 1: %s\nMethod 2: %s\nMethod 3: %s\n' % (mempoolsize, avgM1, avgM2, avgM3))

And here are the results:

2CPU 2GB Node:

Mempool Size (MB): 6.904045

Method 1: 0.024827337265014647
Method 2: 0.9257128238677979
Method 3: 4.905688762664795

Raspberry Pi 3B+:

Mempool Size (MB): 3.640026

Method 1: 0.2059253215789795
Method 2: 5.790736103057862
Method 3: 23.43268928527832

Interesting note:

I had to wrap getmempoolentry in a try/except because I was receiving this error occasionally (way more on the Pi than on the other node):

bitcoinrpc.authproxy.JSONRPCException: -5: Transaction not in mempool

Probably because, between the getrawmempool call and querying each txid with getmempoolentry, some of them got included in a block and wiped from the mempool.

So based on those results I think the current solution is the most efficient: with that 4-5x increase from using getmempoolentry, I doubt the RPC server will be free to use. I guess the best way is to just allow the user to disable this functionality.

Another idea is to run getmempoolinfo first, check the size in bytes, and let the user define a limit for when to proceed with the calculation (say, proceed up to 10MB, else return an empty array).
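
That size check could look something like the sketch below. The function name and both callables are hypothetical; `get_mempool_bytes` stands in for reading the "bytes" field of getmempoolinfo, as the benchmark script above does, and `compute_histogram` for whatever does the expensive calculation.

```python
def histogram_or_empty(get_mempool_bytes, compute_histogram, max_bytes=10_000_000):
    """Compute the fee histogram only while the mempool is small enough.

    Returns an empty list when the mempool exceeds max_bytes, so the
    expensive calculation is skipped on hardware that cannot keep up.
    """
    if get_mempool_bytes() > max_bytes:
        return []
    return compute_histogram()
```

With the connection object from the script above, this might be called as `histogram_or_empty(lambda: rpc_connection.getmempoolinfo()["bytes"], compute)`, where `compute` is a placeholder for the histogram function.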

That way it's more flexible and does the work if it won't collapse (based on hardware)

I've written a feature for disabling the fee histogram: 4a9f39d. If you have time, you should try whether it works for you.

Thanks for the research. The 4-5x increase factor is somewhat concerning, although those calculations can be spread out over a long time so the server should hopefully always remain responsive. Of course the user could always disable that feature if they're really low on CPU. Another thing a low-CPU user should do is increase the intervals poll_interval_listening and poll_interval_connected.

Tested and it's working as expected; in the Electrum wallet it just shows 1 sat/byte for all slider positions.

If you say bitcoin core wallet gives preference to own user wallet txs on the mempool, then just limiting it to the appropriate amount is enough. I don't think "spreading" the calculation is a good idea: besides the method taking 4-5 times longer, if you spread the calls out, many transactions would be cleared by the time it finishes, making the calculation inaccurate.

Thanks.

There's an argument that even though it takes longer, it's worth it. The mempool feature is a good one and doesn't rely on third party websites or blockchain explorers. Even if a few transactions get mined the resulting fee histogram should still be mostly correct and useful. On the other hand, it takes some effort to code this, and people might have other priorities.

If you say bitcoin core wallet gives preference to own user wallet txs on the mempool, then just limiting it to the appropriate amount is enough

I'm 90% sure this is true, but it's always worth checking. I believe your own transactions end up in your own wallet.dat and not just the mempool, so your node will keep them around whatever happens.

I asked in #bitcoin and someone told me it won't keep the transactions in the mempool even if they are mine; the mempool is sorted by weight. I will do some testing, limiting the mempool to say 2MB and sending some low-fee transactions.

I think (and hope) they'll appear in listtransactions, which is what EPS polls, even if they're not in your node's mempool. This is (hopefully) because listtransactions is an RPC call querying the wallet not the mempool.

In the same way how your own wallet's transactions appear in listtransactions even if you have pruning enabled and the containing block was deleted. But yes experimenting is the easiest way to know.

EDIT: I just asked in #bitcoin and harding says they definitely should appear. And if they don't appear it would be a bug. Also he suggests testing it using regtest with mempool set to 0.00001MB and creating a few txes.

After some playing, I found out that my own low-fee transactions are not even broadcast if the node's mempool is full (the test tx was 2 sat/byte):

{"jsonrpc": "2.0", "result": "{'code': -26, 'message': 'mempool min fee not met, 282 < 14458 (code 66)'}", "id": 128}

I checked what the cause was (I didn't change the default mempoolminfee config, I just limited the mempool to 5MB, the minimum Core allows) and found this:

bitcoin/bitcoin#11955 (comment)

in which he explains that when the mempool is full, the minimum fee is raised to match the current lowest fee rate in it (getmempoolinfo gave me 0.00102111, i.e. 102 sat/B, so his reasoning is correct).

So based on all of this, it seems the latest Pi is not reliable enough to run an EPS server, as a big growth in the mempool will either block the user from sending transactions (if fees are through the roof and his own are lower than the mempool minimum), or cause delays, or require disabling the fee histogram calculation (which we agree is a useful function).
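
The numbers above can be cross-checked with a little arithmetic. This is only a sketch: it assumes the getmempoolinfo figure is a rate in BTC per 1000 bytes, and that the two numbers in the rejection message (282 and 14458) are the fee paid and the fee required, both in satoshis. Under those assumptions the required fee implies a transaction of roughly 141 bytes, and the fee paid works out to just under 2 sat/byte, which matches the test transaction.

```python
from decimal import Decimal

SATS_PER_BTC = Decimal(100_000_000)


def btc_per_kb_to_sat_per_byte(rate):
    """Convert a fee rate in BTC per 1000 bytes to satoshis per byte."""
    return Decimal(str(rate)) * SATS_PER_BTC / 1000


# Minimum fee rate reported by getmempoolinfo in the comment above.
min_rate = btc_per_kb_to_sat_per_byte("0.00102111")

# Figures from the rejection message, read as satoshis (an assumption).
paid, required = 282, 14458

# Size the node must have used to arrive at the required fee.
implied_size = Decimal(required) / min_rate

# Fee rate the test transaction actually paid.
paid_rate = Decimal(paid) / implied_size

print(f"minimum rate: {min_rate} sat/byte")
print(f"implied size: {implied_size:.1f} bytes")
print(f"rate paid:    {paid_rate:.2f} sat/byte")
```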

Currently the 2GB 2CPU node is doing excellent, I will report if that changes when the next mempool spike comes.

Yes that's what it sounds like. Thanks for helping.

Calculating the fee histogram in a cached way with getmempoolentry should be a solution.

Issue #52, about broadcasting transactions via Tor (and other methods such as SMS), is also an option, as it would provide a way to broadcast even if the hosting full node had a big minfee.

Yes, of course broadcasting it using ANOTHER full node is always a solution, but that defeats the point.

Now if we're talking about improving EPS regardless of where it's run, that's correct. It's way more efficient to cache the fee histogram results and answer from the cache, with a configurable refresh time.

Slight aside, but broadcasting via a node that isn't your own doesn't defeat the point. There are a couple of privacy attacks based on tx broadcasting which are avoided by broadcasting another way.

Fun fact that ElectrumX also considers handling the mempool to be difficult.

https://github.com/kyuupichan/electrumx/blob/master/docs/features.rst

Minimal resource usage once caught up and serving clients; tracking the transaction mempool appears to be the most expensive part.

This patch speeds up getrawmempool by about 30% bitcoin/bitcoin#14984

are we experiencing another spike in mempool right now?

Looks like it, we're at 19 vMB as I write, although that's not as big as the 50 MB mempool which caused this issue to be created.

I was having the troubles described above with my pi3+ when I posted 10h ago. Is there now a way to disable the histogram?

for the bitcoin core patch above, how do I find out which version is it implemented in please?

oh and just saw your comment, thank you

Glad there's a new raspberry pi that shouldn't have issues with this for a while (4GB version)

I would like to install EPS on my Synology NAS and have it connect to Core on the Pi 3+; then it should be OK. But I have no clue how to install it on Synology. If anyone can give some advice I'll be very thankful.

This new feature of Core looks very useful for this issue: bitcoin/bitcoin#15836 It creates a fee rate histogram that would otherwise be inefficient to calculate outside of the codebase, so the new feature could solve exactly the problem we have here.

It would be good if we checked whether the format of the histogram output can be converted to the format that Electrum expects (for example, that there isn't some missing information we need).

Also, feeling the need to update on this. I decided to give the Raspberry Pi a try again (but this time with version 4B 4GB) and it has been running very well (with a peak of about 25MB in mempool).

Will keep updated whenever another >50MB peak occurs.

These past few days we had mempools spiking to 60-70 MB. How did people's nodes deal with that?

I just pushed a commit 136b957 which implements the getrawmempool false and getmempoolentry solution. It synchronizes the mempool once when the server first starts up, then afterwards only updates it every 60 seconds or so. That is much faster than synchronizing from nothing every time. And also it keeps the server responsive, never lagging more than a second or two. I think this solution should solve the problem.
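
To illustrate the general idea (this is not the code from that commit), an incremental sync can be sketched as below. The class `MempoolTracker` and its two callables are hypothetical: `get_txids` stands in for getrawmempool false, and `get_entry` for getmempoolentry, returning None when the node reports that the transaction is no longer in the mempool. The "fee" and "size" fields are assumed to match the ones used earlier in this thread.

```python
class MempoolTracker:
    """Keep a local txid -> (fee_sat, size) map in step with the node."""

    def __init__(self, get_txids, get_entry):
        self._get_txids = get_txids   # hypothetical: returns a list of txids
        self._get_entry = get_entry   # hypothetical: returns a dict, or None if gone
        self.entries = {}

    def update(self):
        """Apply the difference since the last call. Returns the number added."""
        current = set(self._get_txids())

        # Forget transactions that were mined or evicted.
        for txid in set(self.entries) - current:
            del self.entries[txid]

        # Query only the transactions not seen before.
        added = 0
        for txid in current - set(self.entries):
            entry = self._get_entry(txid)
            if entry is None:
                continue  # left the mempool between the two calls
            self.entries[txid] = (round(entry["fee"] * 1e8), entry["size"])
            added += 1
        return added
```

After the first full pass, each update only costs one cheap txid listing plus one short call per new transaction, so no single RPC holds Core's lock for long. The histogram can then be built from `entries` without touching the node.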

Keeping this issue open in case the fee rate histogram ever gets added to Bitcoin Core.

@chris-belcher are there any plans to release a version with this fix?

Yes yes, I'll release soon. I got caught up with something for joinmarket

New release is out with the new mempool code. Did anyone try it? Does it work?

Yep, I tried the new release, and it's working fine for now.