lightningnetwork/lnd

[bug]: listunspent shows spent outputs

kallerosenbaum opened this issue · 12 comments

Background

This might be a duplicate of #5959, but not sure so I'll open a new ticket.

We have two LND nodes of the same version 0.17.2 (but later they got upgraded to 0.17.5). We call them lnd-main and lnd-routing.

Both these nodes recently reported and still report outputs they don't have. I'm focusing on lnd-routing here (leaving out all correct outputs):

# date; lncli listunspent 1|grep outpoint                                                                                                                 
Tue May 28 10:01:04 UTC 2024                                                                                                                                             
                        "outpoint": "a4c248d5344ea4aa1ac9800e46aad117a77aa27b8e6501c1f84c6c52765e392e:0",
                        "outpoint": "685fdb1818fce6fe2d4fac960044e490ed6ab6965a45e454570e2424577fca82:0",
                        "outpoint": "4e8e54a38f11576cf9b0599c6eee2e2321532f40eea584aa410e16cfefd1dee9:0",
                        "outpoint": "9e83457fc8fe53b537330dcd32d99d62588d61fc3ea7b6b2ec392fd3c7b842b5:0",
                        "outpoint": "a592d2a1ef610e084d44e102fcb844a966cbe97557ec7b92a6b555c3ea294ac5:0",
                        "outpoint": "1254868d20ad31359144b684a483a64e44afb100bab3d7402897667228305d91:0",
                        "outpoint": "575faeb98596d4ff9064323924bf90b1672a36f1737405b70a800be908fc3fb1:0",
                        "outpoint": "fbb02dfcecf7c5ca7d2afe60d5b35b884151a58cf21dace1e63aab3217853ed1:0",
                        "outpoint": "68e7ecf111f1570078547d4d38de0034c806687e7e46391f87e08c3983e78e0d:0",
                        "outpoint": "6d138e3729973d1dd9078e6ff15cc10c40707be1a390c110a6ed0ea1cb01fec4:0",

These are already spent by tx 61f8f4b41196b01a3acb3eee16a81f890540d3b624ed2b0efa730520947f272f, which was broadcast 2024-04-09 and confirmed on 2024-05-15.

Your environment

  • LND 0.17.2 when issue appeared, 0.17.5 when we discovered it.,
  • Linux lnd-main-0 6.2.0-1017-aws # 17~22.04.1-Ubuntu SMP Fri Nov 17 21:07:13 UTC 2023 x86_64 Linux
  • bitcoin core 26 when issue appeared, 27 when we discovered the issue
  • running in kubernetes

Steps to reproduce

When searching through the logs I found this line right after a restart on 2024-05-21:

2024-05-21 12:10:16.516 [INF] LNWL: 61f8f4b41196b01a3acb3eee16a81f890540d3b624ed2b0efa730520947f272f: broadcast failed because of: double spend: -25: bad-txns-inputs-missingorspent
2024-05-21 12:10:16.519 [INF] LNWL: Removed invalid transaction: (*wire.MsgTx)(0xc002faae10)({
Version: (int32) 2,
TxIn: ([]*wire.TxIn) (len=10 cap=10) {
(*wire.TxIn)(0xc003992000)({
PreviousOutPoint: (wire.OutPoint) 1254868d20ad31359144b684a483a64e44afb100bab3d7402897667228305d91:0,
SignatureScript: ([]uint8) {
},
Witness: (wire.TxWitness) (len=1 cap=1) {
([]uint8) (len=64 cap=64) {
00000000  a3 fb b2 d9 44 03 50 9b  ab 3c cd 88 b5 07 c9 e6  |....D.P..<......|
00000010  b2 bf 64 2d 46 f5 7f f9  cb 82 6f f0 7b bb 4f 63  |..d-F.....o.{.Oc|
00000020  b1 fd cd d5 28 33 0c 87  5a ea 0d d0 cd b7 4d 93  |....(3..Z.....M.|
00000030  e5 22 27 bd da ab 15 1f  54 d7 3b 50 9b b2 22 e8  |."'.....T.;P..".|
}
},
Sequence: (uint32) 4294967293
}),
(*wire.TxIn)(0xc003992060)({
PreviousOutPoint: (wire.OutPoint) 4e8e54a38f11576cf9b0599c6eee2e2321532f40eea584aa410e16cfefd1dee9:0,
SignatureScript: ([]uint8) {
},
Witness: (wire.TxWitness) (len=1 cap=1) {
([]uint8) (len=64 cap=64) {
00000000  0f d1 41 fe 8e 7a c2 19  ec 20 95 62 37 6f 0a d4  |..A..z... .b7o..|
00000010  53 00 c4 cd 18 b8 06 64  31 5c 48 96 bb 68 30 5f  |S......d1\H..h0_|
00000020  12 e1 ae 95 7c 2e 13 3c  b2 e3 87 20 80 05 87 ea  |....|..<... ....|
00000030  b6 76 d6 46 61 bd 64 f4  01 7c 90 53 fc 53 1a 9f  |.v.Fa.d..|.S.S..|
}
},
Sequence: (uint32) 4294967293
}),
(*wire.TxIn)(0xc0039920c0)({
PreviousOutPoint: (wire.OutPoint) 575faeb98596d4ff9064323924bf90b1672a36f1737405b70a800be908fc3fb1:0,
SignatureScript: ([]uint8) {
},
Witness: (wire.TxWitness) (len=1 cap=1) {
([]uint8) (len=64 cap=64) {
00000000  48 67 77 34 27 b2 a0 e2  f3 93 0d a9 44 1a 6b 67  |Hgw4'.......D.kg|
00000010  2a dc 04 a3 e5 2b da 47  43 a9 b4 b5 5b 6d c1 08  |*....+.GC...[m..|
00000020  b5 21 7e 3e d9 dd 63 e7  30 a9 65 d3 d4 4f 76 ea  |.!~>..c.0.e..Ov.|
00000030  af 2b d7 b8 26 bc ac ff  03 d7 bf 44 9f 9c 34 b8  |.+..&......D..4.|
}
},
Sequence: (uint32) 4294967293
}),
(*wire.TxIn)(0xc003992120)({
PreviousOutPoint: (wire.OutPoint) 685fdb1818fce6fe2d4fac960044e490ed6ab6965a45e454570e2424577fca82:0,
SignatureScript: ([]uint8) {
},
Witness: (wire.TxWitness) (len=1 cap=1) {
([]uint8) (len=64 cap=64) {
00000000  6a 90 cb ea ba 6a 57 55  e6 e9 45 fb c0 16 d2 00  |j....jWU..E.....|
00000010  4a 6a f1 23 f1 93 f2 b4  e4 3d b9 b5 7f 17 a8 5d  |Jj.#.....=.....]|
00000020  66 ff 45 d2 59 49 12 04  48 a0 0e cb cb 73 52 b5  |f.E.YI..H....sR.|
00000030  75 c7 9b 08 97 c5 03 51  0d 1a ef 19 89 08 05 c1  |u......Q........|
}
},
Sequence: (uint32) 4294967293
}),
(*wire.TxIn)(0xc003992180)({
PreviousOutPoint: (wire.OutPoint) 68e7ecf111f1570078547d4d38de0034c806687e7e46391f87e08c3983e78e0d:0,
SignatureScript: ([]uint8) {
},
Witness: (wire.TxWitness) (len=1 cap=1) {
([]uint8) (len=64 cap=64) {
00000000  5e df 1a cf 05 1d e4 12  5b 63 5a 62 c2 95 c8 72  |^.......[cZb...r|
00000010  e9 25 a1 c7 64 85 d9 9c  30 81 59 cb 64 00 55 dc  |.%..d...0.Y.d.U.|
00000020  99 da 6f b5 d4 b7 6b 3d  7a 1e df 37 56 12 fd 7c  |..o...k=z..7V..||
00000030  a1 da d0 0b 65 4b cd 88  77 f0 cc ea f3 30 a9 c0  |....eK..w....0..|
}
},
Sequence: (uint32) 4294967293
}),
(*wire.TxIn)(0xc0039921e0)({
PreviousOutPoint: (wire.OutPoint) 6d138e3729973d1dd9078e6ff15cc10c40707be1a390c110a6ed0ea1cb01fec4:0,
SignatureScript: ([]uint8) {
},
Witness: (wire.TxWitness) (len=1 cap=1) {
([]uint8) (len=64 cap=64) {
00000000  29 3e d1 77 60 29 21 22  21 33 34 0a 74 f6 9e 5f  |)>.w`)!"!34.t.._|
00000010  2d 1d 6b 36 12 8a 98 ba  2a 10 da 30 09 85 c2 4e  |-.k6....*..0...N|
00000020  e0 d0 c9 0a 70 f7 5d 1c  ac 93 b3 c4 83 a8 cc a3  |....p.].........|
00000030  75 59 74 4e 34 b9 eb 88  b4 ce ca 65 af 48 3c d2  |uYtN4......e.H<.|
}
},
Sequence: (uint32) 4294967293
}),
(*wire.TxIn)(0xc003992240)({
PreviousOutPoint: (wire.OutPoint) 9e83457fc8fe53b537330dcd32d99d62588d61fc3ea7b6b2ec392fd3c7b842b5:0,
SignatureScript: ([]uint8) {
},
Witness: (wire.TxWitness) (len=1 cap=1) {
([]uint8) (len=64 cap=64) {
00000000  20 f1 4e 7d ac ee 8d 3e  59 05 8c a6 fc fe 9e 08  | .N}...>Y.......|
00000010  b2 54 99 7d c9 5b 38 20  c9 d0 33 b9 3c f0 a4 71  |.T.}.[8 ..3.<..q|
00000020  89 3d 28 5a de 9b 7c b6  b2 c9 a7 86 e3 19 f2 7d  |.=(Z..|........}|
00000030  bf f1 9c bd 1c bf 3d 37  48 a5 a3 a3 35 60 7b 81  |......=7H...5`{.|
}
},
Sequence: (uint32) 4294967293
}),
(*wire.TxIn)(0xc0039922a0)({
PreviousOutPoint: (wire.OutPoint) a4c248d5344ea4aa1ac9800e46aad117a77aa27b8e6501c1f84c6c52765e392e:0,
SignatureScript: ([]uint8) {
},
Witness: (wire.TxWitness) (len=1 cap=1) {
([]uint8) (len=64 cap=64) {
00000000  64 58 99 13 57 72 36 20  9c f6 ca c0 93 6e 73 30  |dX..Wr6 .....ns0|
00000010  d3 3f 7c e3 32 b7 f1 dc  04 c3 40 cc 15 e0 5c 8e  |.?|.2.....@...\.|
00000020  6f b0 18 a1 64 85 4a bb  b7 ec 75 99 3f 1d 7b 8f  |o...d.J...u.?.{.|
00000030  f0 61 81 b0 31 ef 3c b3  71 bd 31 11 e7 68 78 56  |.a..1.<.q.1..hxV|
}
},
Sequence: (uint32) 4294967293
}),
(*wire.TxIn)(0xc003992300)({
PreviousOutPoint: (wire.OutPoint) a592d2a1ef610e084d44e102fcb844a966cbe97557ec7b92a6b555c3ea294ac5:0,
SignatureScript: ([]uint8) {
},
Witness: (wire.TxWitness) (len=1 cap=1) {
([]uint8) (len=64 cap=64) {
00000000  16 8a 55 46 39 54 2a eb  22 61 99 cf 2c 6b 12 49  |..UF9T*."a..,k.I|
00000010  c0 95 91 eb 52 e0 34 5c  47 d7 20 35 43 7d 73 eb  |....R.4\G. 5C}s.|
00000020  71 55 41 e9 2a 1b a6 3f  65 d5 97 69 f7 88 5d 2c  |qUA.*..?e..i..],|
00000030  d6 50 c3 64 2b fb 96 95  6e 45 f2 f5 8d f8 87 98  |.P.d+...nE......|
}
},
Sequence: (uint32) 4294967293
}),
(*wire.TxIn)(0xc003992360)({
PreviousOutPoint: (wire.OutPoint) fbb02dfcecf7c5ca7d2afe60d5b35b884151a58cf21dace1e63aab3217853ed1:0,
SignatureScript: ([]uint8) {
},
Witness: (wire.TxWitness) (len=1 cap=1) {
([]uint8) (len=64 cap=64) {
00000000  4b ed 3e 24 fe b2 0e 12  b5 ab 8b a9 73 a2 33 36  |K.>$........s.36|
00000010  b2 43 a2 de 55 a8 11 b1  21 77 a6 58 e6 20 db 1f  |.C..U...!w.X. ..|
00000020  be fa 27 a4 b0 15 0e af  29 bc fd b3 8c 94 ce 0a  |..'.....).......|
00000030  70 c5 c0 d6 21 f9 11 f2  8c 65 cd ac 47 15 1b 14  |p...!....e..G...|
}
},
Sequence: (uint32) 4294967293
})
},
TxOut: ([]*wire.TxOut) (len=1 cap=1) {
(*wire.TxOut)(0xc0013e8b80)({
Value: (int64) 407085,
PkScript: ([]uint8) (len=22 cap=22) {
00000000  00 14 43 29 c6 53 f3 8c  b9 ed 83 28 6c 7e cb 12  |..C).S.....(l~..|
00000010  cf 73 45 ff 58 36                                 |.sE.X6|
}
})
},
LockTime: (uint32) 0
})

2024-05-21 12:10:16.520 [INF] LNWL: Removed invalid transaction: 0200000000010a915d30287266972840d7b3ba00b1af444ea683a484b644913531ad208d8654120000000000fdffffffe9ded1efcf160e41aa84a5ee402f5321232eee6e9c59b0f96c57118fa3548e4e0000000000fdffffffb13ffc08e90b800ab7057473f1362a67b190bf2439326490ffd49685b9ae5f570000000000fdffffff82ca7f5724240e5754e4455a96b66aed90e4440096ac4f2dfee6fc1818db5f680000000000fdffffff0d8ee783398ce0871f39467e7e6806c83400de384d7d54780057f111f1ece7680000000000fdffffffc4fe01cba10eeda610c190a3e17b70400cc15cf16f8e07d91d3d9729378e136d0000000000fdffffffb542b8c7d32f39ecb2b6a73efc618d58629dd932cd0d3337b553fec87f45839e0000000000fdffffff2e395e76526c4cf8c101658e7ba27aa717d1aa460e80c91aaaa44e34d548c2a40000000000fdffffffc54a29eac355b5a6927bec5775e9cb66a944b8fc02e1444d080e61efa1d292a50000000000fdffffffd13e851732ab3ae6e1ac1df28ca55141885bb3d560fe2a7dcac5f7ecfc2db0fb0000000000fdffffff012d360600000000001600144329c653f38cb9ed83286c7ecb12cf7345ff58360140a3fbb2d94403509bab3ccd88b507c9e6b2bf642d46f57ff9cb826ff07bbb4f63b1fdcdd528330c875aea0dd0cdb74d93e52227bddaab151f54d73b509bb222e801400fd141fe8e7ac219ec209562376f0ad45300c4cd18b80664315c4896bb68305f12e1ae957c2e133cb2e38720800587eab676d64661bd64f4017c9053fc531a9f01404867773427b2a0e2f3930da9441a6b672adc04a3e52bda4743a9b4b55b6dc108b5217e3ed9dd63e730a965d3d44f76eaaf2bd7b826bcacff03d7bf449f9c34b801406a90cbeaba6a5755e6e945fbc016d2004a6af123f193f2b4e43db9b57f17a85d66ff45d25949120448a00ecbcb7352b575c79b0897c503510d1aef19890805c101405edf1acf051de4125b635a62c295c872e925a1c76485d99c308159cb640055dc99da6fb5d4b76b3d7a1edf375612fd7ca1dad00b654bcd8877f0cceaf330a9c00140293ed177602921222133340a74f69e5f2d1d6b36128a98ba2a10da300985c24ee0d0c90a70f75d1cac93b3c483a8cca37559744e34b9eb88b4ceca65af483cd2014020f14e7dacee8d3e59058ca6fcfe9e08b254997dc95b3820c9d033b93cf0a471893d285ade9b7cb6b2c9a786e319f27dbff19cbd1cbf3d3748a5a3a335607b81014064589913577236209cf6cac0936e7330d33f7ce332b7f1dc04c340cc15e05c8e6fb018a164854abbb7ec75993f1d7b8ff06181b031ef3cb371bd3111e76878560140168a554639542aeb226199cf2c6b1249c09591eb52e0345c47d72035437d73eb715541e92a1ba63f65d59769f7885d2cd650c3642bfb96956e45f2f58df8879801404bed3e24feb20e12b5ab8ba973a23336b243a2de55a811b12177a658e620db1fbefa27a4b0150eaf29bcfdb38c94ce0a70c5c0d621f911f28c65cdac47151b1400000000

This transaction is 61f8f4b41196b01a3acb3eee16a81f890540d3b624ed2b0efa730520947f272f.

The same happened on lnd-main too when it restarted about 30 minutes prior to lnd-routing, but I won't list the details about that issue here. I'll provide whatever you need upon request.

Before we discovered this we upgraded both nodes to 0.17.5 on May 23.

I'm hesitant to restart with --reset-wallet-transactions because I don't know the consequences. The wallet is very old and I'm not sure how long the rescanning would take and what limitations apply on the node while doing it.

Hi @kallerosenbaum,

Thanks for the issue report! I've been trying to investigate the issue to see if I can spot the root cause.

I have some initial questions and requests:

  1. The log line(s) you've included, i.e.:
    2024-05-21 12:10:16.516 [INF] LNWL: 61f8f4b41196b01a3acb3eee16a81f890540d3b624ed2b0efa730520947f272f: broadcast failed because of: double spend: -25: bad-txns-inputs-missingorspent
    Does this log line reoccur for every restart of lnd, or was this showed only once?
  2. Could you possibly include a few more of the log lines that are before and after the log lines you've included? I.e. the log lines right before 2024-05-21 12:10:16.516 and after 2024-05-21 12:10:16.520.

Finally, you're correct that restarting lnd with the --reset-wallet-transactions flag set will most likely solve your issue. Just like you say though, the restart will take a long time given the age of the node and and will result in limited functionality during the rescan. Unfortunately it's hard to give any real estimates of how long the rescan will take as that depends on many factors.

@ViktorTigerstrom thanks! Correction: we didn't upgrade from 0.17.2 to 0.17.5, we upgraded from 0.17.3 to 0.17.5.

  1. It seems to be only this one restart. I searched the logs for '61f8f4b41196b01a3acb3eee16a81f890540d3b624ed2b0efa730520947f272f: broadcast failed' || '0.17.' and this came out:
2024-05-21 11:42:08.185 [INF] LTND: Version: 0.17.3-beta commit=v0.17.3-beta, build=production, logging=default, debuglevel=info
2024-05-21 12:10:16.516 [INF] LNWL: 61f8f4b41196b01a3acb3eee16a81f890540d3b624ed2b0efa730520947f272f: broadcast failed because of: double spend: -25: bad-txns-inputs-missingorspent
2024-05-23 12:21:10.200 [INF] LTND: Version: 0.17.5-beta commit=v0.17.5-beta, build=production, logging=default, debuglevel=info
2024-05-23 14:30:08.538 [INF] LTND: Version: 0.17.5-beta commit=v0.17.5-beta, build=production, logging=default, debuglevel=info
  1. See attached file
    lnd-routing-2024-05-21-noonish.log

@ViktorTigerstrom I'm probably going to restart with --reset-wallet-transactions early next week, so if you want anything from me before that, please shout. Maybe useful troubleshooting data will get lost when we reset?

Apologies for the late reply @kallerosenbaum! Thank you for the extended logs, we should hopefully have what we need for the further investigation with that. You can go ahead with the restart with --reset-wallet-transactions to resolve the issue on your side. Thanks!

I should probably mention that we sweep transactions in a maybe somewhat unusual way.

        // 1. Create a template psbt with the exact set of inputs and an output of 1000 sats (small non-dust number),
        //    and 1 sat_vbyte fee.
        // 2. Call fundPsbt, whilch will fill in required signing info in the psbt inputs, but not sign it. It also
        //    potentially adds a change output.
        // 3. Call finalizePsbt to sign the psbt. Now we have a fully signed psbt that lets us figure out the vSize of
        //    that transaction.
        // 4. Then we need to copy the template psbt and delete any change output LND added and set the
        //    single output's amount to <sumofinputs> - <vsizeofnewtransaction> * feeRate
        // 5. Sign
        // 6. Make sanity checks, etc., and publish

@ViktorTigerstrom ran --reset-wallet-transactions and it's back to normal. But I have a feeling the issue will resurface, since it happened on both our nodes. I think it might have to do with our onchain sweeping method (described in the comment above) which runs on both nodes. The method should be fine, but maybe it isn't handled well by LND for some reason. Or am I doing something that I really shouldn't be doing?

Just curious, how long did it take to restart lnd with the --reset-wallet-transactions flag set?

But I have a feeling the issue will resurface, since it happened on both our nodes.

I think there's some investigation work needs to be done, think we should target this for 0.19.0? cc @saubyk

// 1. Create a template psbt with the exact set of inputs and an output of 1000 sats (small non-dust number),
// and 1 sat_vbyte fee.

As for the sweeping method, I don't see anything wrong here - is this used for UTXO aggregation purpose? How did you get this set of inputs?

I think there's some investigation work needs to be done, think we should target this for 0.19.0? cc @saubyk

Tagged for 0.19

@yyforyongyu

how long did it take to restart lnd with the --reset-wallet-transactions flag set?

2024-06-11 07:40:27.005 [INF] LNWL: Started rescan from block 0000000000000000000dc34778ca1059404b0761795e5a91e3ae84c16d855b30 (height 537669) for 25764 addresses
2024-06-11 07:40:30.480 [INF] LNWL: Rescanned through block 000000000000000000200b39910d1e9566501a63026d4a88588aa6f7f0c4eb8b (height 537678)
2024-06-11 07:45:24.161 [INF] LNWL: Rescanned through block 00000000000000000011b3e92e82e0f9939093dccc3614647686c20e5ebe3aa6 (height 540000)
... and so on ...
2024-06-12 01:43:12.018 [INF] LNWL: Finished rescan for 25764 addresses (synced to block 0000000000000000000247841cf08aff2f584491f7738e5a41df8188a72d9901, height 847548)

is this used for UTXO aggregation purpose?

Yes, we aggregate to an external wallet.

How did you get this set of inputs?

listunspent 1 over grpc. Then we remove outputs that cost more than a threshold percentage to spend and make sure to keep reserved funds in LND.

I think it might have to do with the relatively low fees we set on the transactions. The one described in this issue (61f8f4b41196b01a3acb3eee16a81f890540d3b624ed2b0efa730520947f272f) took more than 5 weeks to confirm (when it was CPFP'd).

1 sat/vb is indeed pretty low - during a low-fee environment it should be enough to get the tx in the mempool, then it can be CPFPed. During a high-fee environment, however, peers may not relay the tx (you can check peers' minimal relay feerate via bitcoind-cli getpeerinfo | jq '.[].minfeefilter').

There's the lncli sendcoins that may be more suitable in this case, as it performs a series of checks,

  • validate min relay feerate is met;
  • validate the constructed tx passes mempool constraints (testmempoolaccept);
  • actively rebroadcast the tx in case it falls out of the mempool.

Meanwhile we will look into the listunspent issue in 0.19!

1 sat/vb is indeed pretty low

We don't set 1 sat/vB. We set fee according to bitcoin core's estimatesmartfee 1008. The 1 sat/vB is just for the initial "template transaction". See step 4 in the algo above. The tx in this issue had 5.16 sat/vB.

There's the lncli sendcoins that may be more suitable in this case

I opted not to use that since it doesn't give me enough control over the UTXOs to spend.