DigiByte-Core/digibyte

Transactions getting stuck, with error too-long-mempool-chain in debug.log

junytuny opened this issue · 13 comments

We use a DigiByte node to send DGB payments to users on our GPT website. The node, built from source from the old DigiByte repo, worked completely fine for 4-5 years.
I recently updated our server to Ubuntu 22, so I decided to upgrade all our coin nodes to the latest versions. Every node update was successful (BTC, DOGE, LTC, etc.) except for DGB, which built successfully but ran into issues in production.

Now transactions get stuck with the error too-long-mempool-chain in debug.log. The node retries them every day, and they eventually succeed after a couple of days or weeks.

I have been facing this issue for almost 2-3 months. I have spent many days trying everything I could think of to resolve it. I have reinstalled the node many times with the latest commits from the repo, but the issue is always the same. Unfortunately, I do not have deep knowledge of these things.
I even lost a lot of funds, because I refunded many users and then the original transactions got confirmed weeks later. I eventually had to put DGB on maintenance. I meant to open this issue earlier but got busy with other things.

Here are the details of our node setup:

Version: I am using a DigiByte node built from source from the 'develop' branch. The last version I tried was from 2-3 weeks ago; I am not sure whether the latest commits fix this.
Transaction Amount: They usually range from 2.5 to 10 USD, and there are generally 10-20 outputs in a single transaction.
Fees: Fees are set automatically by the node. I am not setting fees manually, nor have I provided any fee configuration in digibyte.conf.
How we are sending transactions: We send transactions using the sendmany command over the RPC API.

Debug.log: Here is an example log from a time when a transaction failed. After that, the node kept trying to submit it every day. I have to remove it manually using removeprunedfunds.

2024-02-16T00:00:10Z [default wallet] keypool added 1 keys (1 internal), size=2000 (1000 internal)
2024-02-16T00:00:10Z [default wallet] keypool reserve 2932
2024-02-16T00:00:10Z [default wallet] keypool keep 2932
2024-02-16T00:00:10Z [default wallet] Fee Calculation: Fee:34972 Bytes:347 Tgt:6 (requested 6) Reason:"Conservative Double Target longer horizon" Decay 0.99931: Estimation: (98128.3 - 108186) 100.00% 538.0/(538.0 0 mem 0.0 out) Fail: (0 - 98128.3) 0.00% 0.0/(0.0 0 mem 0.0 out)
2024-02-16T00:00:10Z [default wallet] Fee non-grouped = 34972, grouped = 34972, using grouped
2024-02-16T00:00:10Z [default wallet] CommitTransaction:
2024-02-16T00:00:10Z [default wallet] AddToWallet d797caacb91efdcb0d9d71fddf798877275df67dc0a6d5ac81d490f98a2a63ed  newupdate
2024-02-16T00:00:10Z [default wallet] Submitting wtx d797caacb91efdcb0d9d71fddf798877275df67dc0a6d5ac81d490f98a2a63ed to mempool for relay
2024-02-16T00:00:10Z [default wallet] CommitTransaction(): Transaction cannot be broadcast immediately, too-long-mempool-chain, too many descendants for tx f5569087d11a2f0c7a46f31b79b88860fb5ddd9a6639f483e397a5952c2e78c6 [limit: 25]

**Here's our digibyte.conf**

maxconnections=150
addnode=seed1.digibyte.io
addnode=seed2.digibyte.io
addnode=seed3.digibyte.io
addnode=seed.digibyte.io
addnode=seed.digibyteprojects.com
addnode=digihash.co
addnode=digiexplorer.info
addnode=seed.digibyteguide.com
addnode=seed-1.us.digibyteservers.io
addnode=digibyte.host
addnode=dgb-peer.nownodes.io

server=1
daemon=1

rpcuser=
rpcpassword=
rpctimeout=30
rpcport=

walletnotify=curl

I honestly didn't realize I wasn't supposed to use this version for production, silly me. I will have to be more careful next time when upgrading to the latest versions. But I am not able to downgrade anymore, probably because those old versions of DigiByte are not compatible with the latest Ubuntu.

If any other details are needed please feel free to let me know.

Have you tried 8.22.0-RC4? I believe RC4 will fix this issue, but it's only been out for a week. It would be very helpful for us to help troubleshoot this issue if you could test it. Thank you for letting us know and for using DGB!

Get RC4 here:
https://github.com/DigiByte-Core/digibyte/releases/tag/v8.22.0-rc4

Oh nice. I honestly missed this release.

Today I tried updating to v8.22.0-rc4, but the build failed with the following error:

Error when building v8.22.0-rc4
wallet/bdb.cpp: In member function ‘virtual bool BerkeleyDatabase::Backup(const string&) const’:
wallet/bdb.cpp:627:58: error: ‘fs::copy_options’ has not been declared
  627 |                     fs::copy_file(pathSrc, pathDest, fs::copy_options::overwrite_existing);
      |                                                          ^~~~~~~~~~~~
make[2]: *** [Makefile:10934: wallet/libdigibyte_wallet_a-bdb.o] Error 1
make[2]: *** Waiting for unfinished jobs....

Like I said before, I do not know much about how all these things work. I am mainly a web developer, so I couldn't understand what this error meant.
I therefore tried building the latest code from the 'develop' branch. It threw a different error, which I was able to resolve.

Error when building latest code from develop branch
checking for boostlib >= 1.84.0 (108400)... configure: We could not detect the boost libraries (version 1.84.0 or higher). If you have a staged boost library (still not installed) please specify $BOOST_ROOT in your environment and do not give a PATH to --with-boost option.  If you are sure you have boost installed, then check your version number looking in <boost/version.hpp>. See http://randspringer.de/boost for more documentation.
configure: error: Boost is not available!

I was able to fix this by building and installing the latest version of the Boost library, 1.84 (previously I had 1.71), and specifying the location of the new installation in the ./configure command. It worked this time.

So I have now updated to the latest version and removed DGB from maintenance on our site. I'll be testing this version over the coming days. If all withdrawals over the next 4-7 days get processed without any issues, it will mean the issue is resolved. I'll keep you guys updated!

Next time you build DigiByte on Ubuntu (Linux), do this:

cd digibyte/depends

make HOST=x86_64-linux-gnu -j10

cd ..

./autogen.sh

CONFIG_SITE=$PWD/depends/x86_64-linux-gnu/share/config.site ./configure

make -j10

@junytuny Did this get resolved?

@ycagel @JaredTate Sorry for the delay in getting back. It's not resolved yet. We are still facing the same issue.

As before, we got the transaction ID, but the payment didn't go through. We have processed 8 batched withdrawals on our site since the latest update, and 2 of them failed with the same issue.

I am posting all the details of one of the transactions.

Debug.log:

2024-04-18T00:00:09Z [default wallet] keypool added 1 keys (1 internal), size=2000 (1000 internal)
2024-04-18T00:00:09Z [default wallet] keypool reserve 2965
2024-04-18T00:00:09Z [default wallet] keypool keep 2965
2024-04-18T00:00:09Z [default wallet] Fee Calculation: Fee:7110000 Bytes:711 Tgt:6 (requested 6) Reason:"Minimum Required Fee" Decay 0.96200: Estimation: (108186 - 1e+99) 100.00% 13.8/(13.8 0 mem 0.0 out) Fail: (0 - 108186) 84.06% 2.1/(2.5 0 mem 0.0 out)
2024-04-18T00:00:09Z [default wallet] keypool added 1 keys (1 internal), size=2000 (1000 internal)
2024-04-18T00:00:09Z [default wallet] keypool reserve 2966
2024-04-18T00:00:09Z [default wallet] keypool keep 2966
2024-04-18T00:00:09Z [default wallet] Fee Calculation: Fee:7110000 Bytes:711 Tgt:6 (requested 6) Reason:"Minimum Required Fee" Decay 0.96200: Estimation: (108186 - 1e+99) 100.00% 13.8/(13.8 0 mem 0.0 out) Fail: (0 - 108186) 84.06% 2.1/(2.5 0 mem 0.0 out)
2024-04-18T00:00:09Z [default wallet] Fee non-grouped = 7110000, grouped = 7110000, using grouped
2024-04-18T00:00:09Z [default wallet] CommitTransaction:
2024-04-18T00:00:09Z [default wallet] AddToWallet f0fae4a6d2ebc9b51ed89c92419df7e7bf57434a9c2ab8f9234a27b816b40d0d  newupdate
2024-04-18T00:00:09Z [default wallet] Submitting wtx f0fae4a6d2ebc9b51ed89c92419df7e7bf57434a9c2ab8f9234a27b816b40d0d to mempool for relay
2024-04-18T00:00:09Z [default wallet] CommitTransaction(): Transaction cannot be broadcast immediately, too-long-mempool-chain, too many descendants for tx 39222e187750608bc2c7ac8811a40b5b6579203643777ccb3be03a31997c9613 [limit: 25]

The transaction reported as having too many descendants was not created by our wallet.

~$ digibyte-cli gettransaction "39222e187750608bc2c7ac8811a40b5b6579203643777ccb3be03a31997c9613"
error code: -5
error message:
Invalid or non-wallet transaction id

Our failed transaction (f0fae4a6d2ebc9b51ed89c92419df7e7bf57434a9c2ab8f9234a27b816b40d0d) is resubmitted every day.

~$ tail -f debug.log -n 100000 | grep "f0fae4a6d2ebc9b51ed89c92419df7e7bf57434a9c2ab8f9234a27b816b40d0d"
2024-04-18T00:00:09Z [default wallet] AddToWallet f0fae4a6d2ebc9b51ed89c92419df7e7bf57434a9c2ab8f9234a27b816b40d0d  newupdate
2024-04-18T00:00:09Z [default wallet] Submitting wtx f0fae4a6d2ebc9b51ed89c92419df7e7bf57434a9c2ab8f9234a27b816b40d0d to mempool for relay
2024-04-19T09:08:54Z [default wallet] Submitting wtx f0fae4a6d2ebc9b51ed89c92419df7e7bf57434a9c2ab8f9234a27b816b40d0d to mempool for relay
2024-04-20T10:08:07Z [default wallet] Submitting wtx f0fae4a6d2ebc9b51ed89c92419df7e7bf57434a9c2ab8f9234a27b816b40d0d to mempool for relay
2024-04-21T17:11:22Z [default wallet] Submitting wtx f0fae4a6d2ebc9b51ed89c92419df7e7bf57434a9c2ab8f9234a27b816b40d0d to mempool for relay
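
A failure like the one in the log excerpts above could be detected automatically rather than by reading debug.log by hand. The following is a minimal sketch, assuming the rejection line keeps exactly the wording shown in those excerpts; the function name `parse_rejection` and the field names are made up for illustration and are not part of DigiByte.

```python
import re

# Pattern for the wallet's rejection line as it appears in the debug.log
# excerpts in this thread. Assumption: other versions may word it differently.
REJECT_RE = re.compile(
    r"^(?P<time>\S+) \[(?P<wallet>[^\]]+)\] CommitTransaction\(\): "
    r"Transaction cannot be broadcast immediately, (?P<reason>[\w-]+), "
    r"(?P<detail>.*?)(?: for tx (?P<txid>[0-9a-fA-F]{64}))? \[limit: (?P<limit>\d+)\]$"
)


def parse_rejection(line):
    """Return a dict describing a rejected broadcast, or None if the line does not match."""
    match = REJECT_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["limit"] = int(info["limit"])  # the limit is printed as a plain integer
    return info
```

Feeding each new line of debug.log through `parse_rejection` and alerting on any non-None result would flag a stuck payout on the day it happens. The `txid` field is the transaction whose chain hit the limit, which may differ from the wallet's own transaction.
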

The transaction's outputs, i.e. the parameters passed to the `sendmany` RPC command:

[
   "",
   {
      "DPtcWTDcHnnVgNrkiXFYv2.....":212.70622056,
      "DTTSvEq42ssmHVG3NDbqeR.....":198.71081587,
      "DAc4hivF2r8rEipX4FvWBj.....":198.03102373,
      "D5HNgca93hCSifYmYFSSHu.....":199.30720575,
      "DPDUWAVbZetiuRb9ah35X8.....":366.23578585,
      "DDfcqSDF6CYWAQbSgkujtJ.....":212.82336069,
      "DHiuM7dJ45dBRuCqQKSkx8.....":248.75132289,
      "DEU18itTPzYyxiWEke9GxS.....":260.58129239,
      "DRce8dYYFQeA7Qb4B5LN5J.....":230.86559255,
      "DFDERkBDNPWTr3HLUk7qNt.....":210.16519266,
      "DB4x9CuDPysGZB3aDaGcoM.....":290.99062387,
      "SZAZbSrXZwVXPLNMqQM5hX.....":248.17421858,
      "dgb1qkfv8tjuzj8fsrvzxf.....":238.64860716,
      "dgb1qnn8p68ndejhx4xcj5.....":424.64503051,
      "DFvmE3KgqE2fZ7P8XN9nAm.....":355.39445557,
      "DCyYJ7487dF936xuH5pbce.....":222.18025999,
      "D9MPKGWGc44DHKGsqpq78B.....":493.51485703,
      "dgb1q7782kz09auhhygn42.....":240.08567252
   }
]

// Note: Total 18 outputs. Addresses of the outputs have been redacted.

Let me know if you need any other details from my side. I'd be happy to help you guys resolve this issue. I guess I am one of the lucky ones using this version for production. 😃

Thanks for your response. How many transactions are you trying to send at once?

Usually there are 10-20 transactions at once. Payments are batched and sent once a day at midnight (UTC). We never had any issues with previous DGB versions (built from the previous git repo).
I sent the specific transaction details and amounts in my previous reply. The transactions we do are usually similar to this one. If needed, I can send more examples of our other failed transactions.

The problem is getting severe now. 2 of the past 4 transactions failed. We have disabled DGB withdrawals on our site again and will have to look at third-party implementations rather than relying on our own node until this issue is resolved :(

How many unconfirmed transactions are you trying to send at once? I cannot replicate this issue. There is a hard-coded limit of 25 unconfirmed ancestors/descendants for transactions sent from the wallet, the same as in BTC. Is this happening on a mining pool, an exchange, or some other kind of service? A more detailed description of exactly how this occurs would help me try to replicate it. Thanks for your feedback.
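
One way to see how close a given unconfirmed transaction is to that limit is to look at its mempool entry. The sketch below assumes that `digibyte-cli getmempoolentry <txid>` returns `ancestorcount` and `descendantcount` fields as Bitcoin Core's RPC of the same name does, with both counts including the transaction itself. The helper name `chain_headroom` and the sample values are hypothetical.

```python
def chain_headroom(entry, ancestor_limit=25, descendant_limit=25):
    """Report how many more ancestors/descendants a mempool entry can take.

    `entry` is the decoded JSON of a getmempoolentry call (assumed field names).
    The default limits are the values quoted in this thread.
    """
    return {
        "ancestors_left": max(0, ancestor_limit - entry["ancestorcount"]),
        "descendants_left": max(0, descendant_limit - entry["descendantcount"]),
    }


# Made-up example entry: a transaction that already has a full descendant chain.
sample_entry = {"ancestorcount": 3, "descendantcount": 25}
print(chain_headroom(sample_entry))
```

A result of `descendants_left == 0` would mean that any new transaction spending from that chain is rejected until some of it confirms.
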

Unfortunately, I have not been able to find the condition that reproduces this either. It never happened with BTC, and it never happened with older versions of DGB. With this new DGB version, the issue occurs about 40% of the time when we send payments to our users.

We are not a mining pool or an exchange. We are simply a Get-Paid-To website, where users install apps and complete surveys for rewards, which they can redeem for BTC, DOGE, LTC, DGB, etc.

On average we send 10-15 transactions at a time. The issue has happened with fewer than 10 transactions as well. I have tried hard to find a common condition that causes this issue, but unfortunately I have not been able to replicate it.

If you need any other detail I'd be happy to help.

@junytuny I am looking into a way to reproduce this for you. Are you able to give me an example of what is submitted and what you are calling to create/submit the transaction? You can redact addresses, I am just looking for an example of the process, and structure you are submitting so I can see if I can generate the same behavior.

The too-long-mempool-chain error indicates that a transaction is trying to spend outputs that are part of a chain of unconfirmed transactions exceeding the allowed limit. In Bitcoin, on which DigiByte's validation logic is based, the default mempool (memory pool) rules reject a transaction if it would have more than 25 unconfirmed ancestors, or would give any of its ancestors more than 25 unconfirmed descendants.

This limit exists primarily to prevent potential issues related to transaction spam and to maintain the manageability of the node's mempool. It ensures that nodes are not overwhelmed by large chains of dependent transactions which could impact performance or lead to other network issues.

This error is coming from this code:

digibyte/src/txmempool.cpp

Lines 181 to 200 in 22528bf

size_t totalSizeWithAncestors = entry.GetTxSize();
while (!staged_ancestors.empty()) {
    const CTxMemPoolEntry& stage = staged_ancestors.begin()->get();
    txiter stageit = mapTx.iterator_to(stage);
    setAncestors.insert(stageit);
    staged_ancestors.erase(stage);
    totalSizeWithAncestors += stageit->GetTxSize();
    if (stageit->GetSizeWithDescendants() + entry.GetTxSize() > limitDescendantSize) {
        errString = strprintf("exceeds descendant size limit for tx %s [limit: %u]", stageit->GetTx().GetHash().ToString(), limitDescendantSize);
        return false;
    } else if (stageit->GetCountWithDescendants() + 1 > limitDescendantCount) {
        errString = strprintf("too many descendants for tx %s [limit: %u]", stageit->GetTx().GetHash().ToString(), limitDescendantCount);
        return false;
    } else if (totalSizeWithAncestors > limitAncestorSize) {
        errString = strprintf("exceeds ancestor size limit [limit: %u]", limitAncestorSize);
        return false;
    }

The actual limit of 25 is set here:

/** Default for -limitancestorcount, max number of in-mempool ancestors */
static const unsigned int DEFAULT_ANCESTOR_LIMIT = 25;
/** Default for -limitancestorsize, maximum kilobytes of tx + all in-mempool ancestors */
static const unsigned int DEFAULT_ANCESTOR_SIZE_LIMIT = 101;
/** Default for -limitdescendantcount, max number of in-mempool descendants */
static const unsigned int DEFAULT_DESCENDANT_LIMIT = 25;
/** Default for -limitdescendantsize, maximum kilobytes of in-mempool descendants */
static const unsigned int DEFAULT_DESCENDANT_SIZE_LIMIT = 101;
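
To make the descendant-count branch of the excerpt concrete, here is a toy model of it. This is only a sketch: the mempool is represented as a plain parent-to-children mapping rather than the real `CTxMemPool`, and every function name below is hypothetical. It reproduces just the `GetCountWithDescendants() + 1 > limitDescendantCount` comparison, not the size checks.

```python
from collections import defaultdict

DEFAULT_DESCENDANT_LIMIT = 25  # same value as the constant quoted in this thread


def count_with_descendants(children, txid):
    """Count txid plus every unconfirmed transaction that spends from it, directly or indirectly."""
    seen = {txid}
    stack = [txid]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return len(seen)


def would_exceed_descendant_limit(children, ancestor, limit=DEFAULT_DESCENDANT_LIMIT):
    """Mirror the excerpt's comparison: current count + 1 (the new tx) must not exceed the limit."""
    return count_with_descendants(children, ancestor) + 1 > limit


def build_chain(length):
    """Hypothetical helper: build tx0 -> tx1 -> ... as a parent -> children mapping."""
    children = defaultdict(list)
    for i in range(length - 1):
        children[f"tx{i}"].append(f"tx{i + 1}")
    return children
```

With `build_chain(24)` a new spend from the chain is still accepted, because the count becomes exactly 25. With `build_chain(25)` the same spend is rejected. Note that the count belongs to the ancestor, so the limit can be consumed by other parties spending from the same unconfirmed transaction.
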

Solution

What you can do is add the following to your digibyte.conf to raise these limits according to your needs. The values below are a 4x increase over the defaults.

limitancestorcount=100
limitancestorsize=404
limitdescendantcount=100
limitdescendantsize=404

So the reason you may have been experiencing intermittent errors was probably the size of the txs and not the amount per se, but it could have been both. Please let us know if this solution works for you.
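
A rough way to check which limit is likely to bind for a given transaction size is a little arithmetic. This is a simplified sketch under stated assumptions: it treats every transaction in the chain as the same size, takes the `Bytes:` value from the fee-calculation log lines as the size the mempool counts, and assumes the kilobyte limits are converted to bytes by multiplying by 1000. The function name `txs_until_limit` is made up.

```python
def txs_until_limit(tx_size_bytes, count_limit=25, size_limit_kb=101):
    """Return how many equal-sized transactions fit under each limit and which one binds first."""
    size_limit_bytes = size_limit_kb * 1000  # assumption: kB limits are applied as kB * 1000
    fit_by_size = size_limit_bytes // tx_size_bytes
    binding = "count" if count_limit <= fit_by_size else "size"
    return {"fit_by_count": count_limit, "fit_by_size": fit_by_size, "binding": binding}


# Sizes taken from the two fee-calculation log lines in this thread.
for size in (347, 711):
    print(size, txs_until_limit(size))

# The 4x settings suggested above.
print(txs_until_limit(711, count_limit=100, size_limit_kb=404))
```

Under these assumptions the count limit is reached well before the size limit for transactions of a few hundred bytes, which matches the "too many descendants ... [limit: 25]" wording in the logs. Keep in mind that these settings only change what this node's own mempool accepts; peers running default limits still apply their own.
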

Going to go ahead and close this issue since the addition to digibyte.conf should fix your issue. If for some reason it doesn't, please feel free to re-open this issue and let us know. Thanks again for bringing this to our attention.