Blutgang exits with code 137 when one of many endpoints is in a pending state
barnabasbusa commented
Describe the bug
This bug seems to happen on Kubernetes only. Blutgang exits with code 137 when a specific RPC endpoint is in a pending state, for example when the node that was running the given pod has gone offline and the pod can't be rescheduled to any other node. Exit code 137 corresponds to SIGKILL (128 + 9), which on Kubernetes typically means the container was killed by the kubelet, often as an OOM kill.
To Reproduce
Steps to reproduce the behavior:
- Create a Kubernetes cluster with 2 nodes
- Schedule one RPC endpoint on each node (pinned with a node selector)
- Set up Blutgang to load-balance between those two RPC endpoints
- Shut down one of the physical nodes so that its RPC endpoint pod is left in a pending state, while the ingress is still active (a shortcut for simulating this is sketched below)
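As a shortcut, the same pending state can be simulated without powering off hardware by cordoning the node and deleting the pod, so the replacement can never be scheduled; the node name and pod label below are placeholders, not taken from the actual cluster:

```sh
# Mark the node unschedulable, then delete the endpoint pod; because the pod
# is pinned to this node via a nodeSelector, the replacement cannot be placed
# anywhere and stays in Pending indefinitely.
kubectl cordon worker-2                            # placeholder node name
kubectl delete pod -l app=mainnet-besu-lighthouse  # placeholder label
kubectl get pods -w                                # replacement stays Pending
```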
Expected behavior
I would expect this endpoint to be excluded from the list of load-balanced RPC endpoints.
Specs:
- k3s with kube version 1.27.10
- Physical nodes run Debian 12, x86
- blutgang version: 0.3.5
Blutgang options:
```toml
[blutgang]
do_clear = true
address = "0.0.0.0:3000"
ma_length = 100
sort_on_startup = true
health_check = true
header_check = true
ttl = 300
max_retries = 32
expected_block_time = 13000
health_check_ttl = 2000
supress_rpc_check = false

[admin]
enabled = true
address = "0.0.0.0:5715"
readonly = true
jwt = false
key = ""

[sled]
db_path = "/data/blutgang-cache"
mode = "HighThroughput"
cache_capacity = 1000000000
compression = false
print_profile = false
flush_every_ms = 240

[mainnet-besu-teku]
url = "http://mainnet-besu-teku:8545"
ws_url = "ws://mainnet-besu-teku:8545"
max_consecutive = 150
max_per_second = 200

# pending pod
[mainnet-besu-lighthouse]
url = "http://mainnet-besu-lighthouse:8545"
ws_url = "ws://mainnet-besu-lighthouse:8545"
max_consecutive = 150
max_per_second = 200
```
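To observe the crash from the outside, one can send any standard JSON-RPC request through Blutgang while the second endpoint is pending, then read the container's last exit code; the `blutgang` hostname and the pod label are placeholders, not taken from the actual deployment:

```sh
# Standard Ethereum JSON-RPC request routed through Blutgang (placeholder host).
curl -s -X POST http://blutgang:3000 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'

# After the crash, the last terminated exit code should read 137 (placeholder label).
kubectl get pod -l app=blutgang -o \
  jsonpath='{.items[0].status.containerStatuses[0].lastState.terminated.exitCode}'
```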