JoinMarket-Org/joinmarket-clientserver

Cache transaction data in wallet to speed up `history` method of `wallet-tool.py`

kristapsk opened this issue · 10 comments

Currently it is really slow. Likely some optimizations are possible even without adding new data to wallet file, but ultimate solution IMHO is caching.

For wallet with around 1000 transactions in history, it takes for me around 20 minutes to execute (was even slower before #1594).

If we get this performant enough this way, it would allow to solve #420, #1358 and joinmarket-webui/jam#515.

Caching itself should be done when UTXOs are added to wallet / removed from wallet.

Related - #1349.

First attempt (on a medium used wallet, probably ~ two hundred txs) gave a moderate speedup.
My guess is that there is some other redundant calculation here, but it'll take some investigating.

real 1m23.735s
user 1m4.008s
sys 0m0.864s

new:
real 0m49.681s
user 0m25.842s
sys 0m1.018s

Patch:

From a28b6335320c5456160786862189d3a9c432a222 Mon Sep 17 00:00:00 2001
From: Adam Gibson <ekaggata@gmail.com>
Date: Fri, 29 Dec 2023 12:46:07 -0600
Subject: [PATCH] trying an optimization of CMutableTransaction calls

---
 src/jmclient/wallet_utils.py | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/jmclient/wallet_utils.py b/src/jmclient/wallet_utils.py
index 43210e1..5c2afe4 100644
--- a/src/jmclient/wallet_utils.py
+++ b/src/jmclient/wallet_utils.py
@@ -379,14 +379,14 @@ def get_tx_info(txid, tx_cache=None):
         txd: deserialized transaction object (hex-encoded data)
     """
     if tx_cache is not None and txid in tx_cache:
-        rpctx = tx_cache[txid]
+        rpctx, rpctx_deser = tx_cache[txid]
     else:
         rpctx = jm_single().bc_interface.get_transaction(txid)
         if tx_cache is not None:
-            tx_cache[txid] = rpctx
-    txhex = str(rpctx['hex'])
-    tx = btc.CMutableTransaction.deserialize(hextobin(txhex))
-    output_script_values = {x.scriptPubKey: x.nValue for x in tx.vout}
+            txhex = str(rpctx['hex'])
+            rpctx_deser = btc.CMutableTransaction.deserialize(hextobin(txhex))
+            tx_cache[txid] = (rpctx, rpctx_deser)
+    output_script_values = {x.scriptPubKey: x.nValue for x in rpctx_deser.vout}
     value_freq_list = sorted(
         Counter(output_script_values.values()).most_common(),
         key=lambda x: -x[1])
@@ -398,7 +398,7 @@ def get_tx_info(txid, tx_cache=None):
     cj_amount = value_freq_list[0][0]
     cj_n = value_freq_list[0][1]
     return is_coinjoin, cj_amount, cj_n, output_script_values,\
-        rpctx.get('blocktime', 0), tx
+        rpctx.get('blocktime', 0), rpctx_deser
 
 
 def get_imported_privkey_branch(wallet_service, m, showprivkey):
@@ -846,7 +846,6 @@ def wallet_fetch_history(wallet, options):
                 'blocktme' not in tx)
         tx_db.executemany('INSERT INTO transactions VALUES(?, ?, ?, ?);',
                 uc_tx_data)
-
     txes = tx_db.execute(
         'SELECT DISTINCT txid, blockhash, blocktime '
         'FROM transactions '
@@ -903,22 +902,23 @@ def wallet_fetch_history(wallet, options):
 
         our_output_scripts = wallet_script_set.intersection(
             output_script_values.keys())
-
         rpc_inputs = []
         for ins in txd.vin:
             if ins.prevout.hash[::-1] in tx_cache:
-                wallet_tx = tx_cache[ins.prevout.hash[::-1]]
+                wallet_tx, wallet_tx_deser = tx_cache[ins.prevout.hash[::-1]]
             else:
                 wallet_tx = jm_single().bc_interface.get_transaction(
                     ins.prevout.hash[::-1])
-                tx_cache[ins.prevout.hash[::-1]] = wallet_tx
+                if wallet_tx:
+                    wallet_tx_deser = btc.CMutableTransaction.deserialize(
+                        hextobin(wallet_tx['hex']))
+                    tx_cache[ins.prevout.hash[::-1]] = (wallet_tx,
+                                                        wallet_tx_deser)
             if wallet_tx is None:
                 continue
-            inp = btc.CMutableTransaction.deserialize(hextobin(
-                wallet_tx['hex'])).vout[ins.prevout.n]
+            inp = wallet_tx_deser.vout[ins.prevout.n]
             input_dict = {"script": inp.scriptPubKey, "value": inp.nValue}
             rpc_inputs.append(input_dict)
-
         rpc_input_scripts = set(ind['script'] for ind in rpc_inputs)
         our_input_scripts = wallet_script_set.intersection(rpc_input_scripts)
         our_input_values = [
-- 
2.25.1

@AdamISZ Thanks, will try with around 1000 tx wallet.

@AdamISZ Your patch seems great, for my ~1000 tx wallet run time of wallet-tool.py history -v 4 changed from 18m43s - 18m58 to 5m5 - 5m22 (tested 3 times without and 3 times with this patch). It's not enough for big wallets, but is already great improvement, I think you should open a PR.

18m43s - 18m58 to 5m5 - 5m22 (tested 3 times without and 3 times with this patch).

Nice, not bad.

I think you should open a PR.

Sure, will do. (edit: see #1625 ).

Re: getting it genuinely not slow: maybe the problem here might be just that there is no batch gettransaction call, like we have for listtransactions. Maybe having to make this call once per tx is a big part of the issue. To avoid that, I'm really not sure, logically the "real" solution might be to store all the metadata (edit: I guess that could just be: the full transaction object which the specific utxo relates to) along with the utxos in the utxo manager, so that we can do the logical operations directly without rpc calls. But that is all just hypothesising, this might not be the issue at all.

logically the "real" solution might be to store all the metadata (edit: I guess that could just be: the full transaction object which the specific utxo relates to) along with the utxos in the utxo manager, so that we can do the logical operations directly without rpc calls.

That is also best idea I have right now, save gettransaction data and calculated tx_type. Maker code could do that on tx confirmation. But that doesn't work with pure taker mode. Or maybe it is enough to cache gettransaction data even before tx is confirmed? But then can end up with some unnecessary data in wallet file if transaction is replaced.

Went through the code and looked what information from gettransaction is actually used and it isn't too much actually. Some of it can be cached as is, for some we need to make some changes. Used stuff are:

  • hex (raw transaction hex, for storing we should do hextobin() to save space),
  • confirmations (can't store this one, should store blockheight instead and calculate number of confirmations dynamically),
  • details (but only label field is checked),
  • blockhash.

How I see it - get_transaction() should be modified to not return everything Bitcoin RPC returns, but only this stuff we use and cache, then some changes here and there (like handling number of confirmations) and then add caching to wallet.

This will by default only work and give speedup for new UTXOs. Thinking that some script or wallet-tool method should be added that allows to rebuild wallet cache.

Alternative approach is to do something similar to what Wasabi Wallet does - create some transaction cache file in JM data dir that's independent from wallets and shared between all wallets user might have (but most of the users likely have just one wallet).

Thinking more about this, now I think it really should be outside wallet, in independent cache file, sqlite database seems the best option for us.

There are two reasons:

  1. This way transaction cache can be shared between wallets. And it can even be just copied when migrating between different Bitcoin Core instances.
  2. Independent cache file can be updated also when using read-only wallet-tool.py methods like display and history.

I plan to work on this.

Tried implementing persistent sqlite gettransaction cache, but didn't get much performance gain, as I expected. Best was getting some seconds of improvement on ~5 minutes for >1000 tx wallet, don't think it's worth. Anyway, here's the code, maybe others have ideas - kristapsk@ae16eff.