lightningnetwork/lnd

[bug]: fees default to 253sat/kw if fees cannot be downloaded from feeurl

JaviLib opened this issue · 8 comments

Background

Continuing from bug #8675, which is also discussed in ElementsProject/lightning#7254: it seems that LND defaults to 253 sat/kw when the feeurl cannot be downloaded for some reason, such as a bad internet connection or a timeout from the feeurl server.

From that moment on, all channels get disabled because of the extremely low fees. LND nodes are able to reconnect and renegotiate once the feeurl works again, but for some reason connected CLN nodes aren't able to renegotiate the new fees.

Your environment

  • version of lnd 0.17
  • which operating system darwin arm64
  • neutrino
  • sometimes, low quality internet connection

Steps to reproduce

Make the feeurl temporarily unavailable or make it time out, either at startup or during operation, and check how the negotiated fees drop to 253 sat/kw.
Second, if there is any channel open with a CLN node, watch how renegotiation becomes impossible and the channels get disabled. If there is a pending HTLC, the channel gets force closed at timeout.

Expected behaviour

At least use the previous sane retrieved fees, or do not try to renegotiate fees with any channel: return a warning and wait until fees can be retrieved again.

Actual behaviour

LND tries to renegotiate fees at ridiculous values, and cannot renegotiate fees with CLN when they return to normal.

I think we should return the error here to be handled by the callers instead of returning the fee floor:

// If the estimator returns an error, a zero value fee rate will be
// returned. We will log the error and return the fall back fee rate
// instead.
if err != nil {
	log.Errorf("Unable to query estimator: %v", err)
}

// If the result is too low, then we'll clamp it to our current fee
// floor.
satPerKw := SatPerKVByte(feePerKb).FeePerKWeight()
if satPerKw < FeePerKwFloor {
	satPerKw = FeePerKwFloor
}

Looking at other implementations of EstimateFeePerKW, I think we should do the same: upper systems need to know there's an error in fee estimation instead of silently getting the default fee rate floor.

We should also smooth/clamp the updates as well. This way we avoid adjusting too sharply in either direction.

I wonder what the best strategy for clamping the fee rate is. A percentage of the current fee rate does not seem to be the best design when values are small.

Moreover, maybe we should unify the behaviour. For example, for the bitcoind estimator we return the fallback fee in case there is an error while fetching the data.
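One possible answer to the small-value concern, sketched below, is to combine a percentage limit with an absolute minimum step, so that a low current rate can still move by a useful amount. This is only an illustration: `clampFeeUpdate` and its parameters are hypothetical, and the 50% and 250 sat/kw values are arbitrary.

```go
package main

import "fmt"

// clampFeeUpdate limits how far a fee rate may move from current towards
// target in one update. The allowed step is maxPct percent of current, but
// never less than minStep, so small rates are not stuck with tiny steps.
func clampFeeUpdate(current, target, maxPct, minStep uint64) uint64 {
	step := current * maxPct / 100
	if step < minStep {
		step = minStep
	}

	switch {
	case target > current && target-current > step:
		return current + step
	case target < current && current-target > step:
		return current - step
	default:
		// The target is within the allowed step, so take it as is.
		return target
	}
}

func main() {
	// Small current rate: 50% of 300 is only 150, so minStep (250) applies.
	fmt.Println(clampFeeUpdate(300, 3000, 50, 250))

	// Large drop: limited to 50% of the current rate.
	fmt.Println(clampFeeUpdate(10000, 253, 50, 250))

	// Small change: passed through unchanged.
	fmt.Println(clampFeeUpdate(1000, 1100, 50, 250))
}
```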

As @yyforyongyu said, the best thing in case of error is to accept what the other party suggests. And in case the other party doesn't provide any value, just use the latest ones retrieved.

In case of sudden change but no error, yes, it is probably better to clamp it over the course of, for example, 6 blocks.

For either of the two cases, you would need to store the most recently retrieved feeurl data, probably in a small separate file.

You would also need to return an error if too much time has passed since the last retrieval. Again, I would suggest returning an error after 6 blocks without retrieving anything. An error should also be raised right at startup, exiting the app, if the first retrieval fails and the file with the stored values does not exist or is too old.

This caused all our LDK nodes that had channels with our LSP (which runs LND) to be force closed. This is a catastrophic bug; I hope a fix is prioritized.

2024-05-10 18:38:55 ERROR [lightning::ln::channelmanager:7150] Closing channel 04259a5524219241061608caa6216811cfcfcf713291fc3c839dde1d47a90a72 due to close-required error: Peer's feerate much too low. Actual: 253. Our expected lower limit: 2993
2024-05-10 18:38:55 ERROR [lightning::ln::channelmanager:8971] Force-closing channel: Peer's feerate much too low. Actual: 253. Our expected lower limit: 2993
2024-05-10 18:38:55 DEBUG [lightning::ln::channelmanager:2893] Finishing closure of channel due to Channel closed because of an exception: Peer's feerate much too low. Actual: 253. Our expected lower limit: 2993 with 0 HTLCs to fail
2024-05-10 18:38:55 INFO  [lightning::chain::channelmonitor:2796] Applying force close update to monitor 04259a5524219241061608caa6216811cfcfcf713291fc3c839dde1d47a90a72 with 1 change(s).

One other thing to mention: the JSON schema of the feeurl is relatively simple. If you need a highly dependable endpoint, I recommend running your own instead of pointing to a publicly available one that has no uptime guarantees or promises.
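For anyone planning to run their own endpoint, here is a rough sketch of decoding such a response. It assumes the payload is a JSON object with a `fee_by_block_target` map from confirmation target to fee rate in sat/kvB, which is my understanding of the format; please check it against the lnd version you run. `feeResponse` and `parseFeeResponse` are names made up for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// feeResponse models the assumed feeurl payload: a map from confirmation
// target (in blocks) to fee rate (assumed to be sat/kvB).
type feeResponse struct {
	FeeByBlockTarget map[uint32]uint32 `json:"fee_by_block_target"`
}

// parseFeeResponse decodes a feeurl body and rejects empty responses, so
// that a missing estimate is reported instead of silently read as zero.
func parseFeeResponse(body []byte) (map[uint32]uint32, error) {
	var resp feeResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode fee response: %w", err)
	}
	if len(resp.FeeByBlockTarget) == 0 {
		return nil, errors.New("fee response has no estimates")
	}
	return resp.FeeByBlockTarget, nil
}

func main() {
	body := []byte(`{"fee_by_block_target": {"2": 42000, "6": 21000}}`)

	fees, err := parseFeeResponse(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("2 blocks:", fees[2], "6 blocks:", fees[6])
}
```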

We ran into this issue too. It’s one of the reasons why I created the fee estimator project which I discuss in https://strike.me/blog/blended-bitcoin-fee-estimations/ and built it out to have multiple layers of fallback and redundancy.

You can run and host it yourself and connect it up to multiple different fee sources that you run, or a combination of ones that you run plus publicly available ones.

We run it with multiple replicas and fallbacks, connected to our own mempool instance, the public mempool.space instance, our own esplora instance, blockstream.info, and our own bitcoind node. It also has sanity checks that look up the current block height from multiple sources and prioritise sources with the most current block height, or ignore estimates that are older than some configured threshold, as well as a cache to ensure it's efficient and robust.

We often experience times where at least one or two of those sources are not available, but for all of them to be unavailable at once is extremely unlikely and has never happened to us.

see https://github.com/LN-Zap/bitcoin-blended-fee-estimator

That said, this issue will be a really good one to resolve, as it can be a disaster if you don't have those multiple layers of redundancy set up.

Fixed by #8891

This comment also still stands: #8688 (comment)