rainshowerLabs/blutgang

Request timeout when using request batching

Opened this issue · 6 comments

Describe the bug
On a fresh install/run with the default config, I get the error Request timeout for all my requests to blutgang.

To Reproduce

> curl 'http://127.0.0.1:3000' \
  -X POST \
  -H 'Content-Type: application/json' \
  --data-raw '{"jsonrpc":"2.0","id":1,"method":"eth_chainId"}'

> {code:-32001, message:"error: Request timed out! Try again later..."}
  • Find the warning messages in the logs:
Info: Checking RPC health... OK!
Info: Connection from: 127.0.0.1:55106
Info: Forwarding to: https://eth.merkle.io
Wrn: An RPC request has timed out, picking new RPC and retrying.
Info: Forwarding to: https://eth.merkle.io
Wrn: An RPC request has timed out, picking new RPC and retrying.
Info: Forwarding to: https://eth.merkle.io
Wrn: An RPC request has timed out, picking new RPC and retrying.
...
Info: Checking RPC health... OK!
  • Run
> curl 'https://eth.merkle.io' \
  -X POST \
  -H 'Content-Type: application/json' \
  --data-raw '{"jsonrpc":"2.0","id":1,"method":"eth_chainId"}'

> {"jsonrpc":"2.0","id":1,"result":"0x1"}

Expected behavior
blutgang shouldn't time out, since the upstream RPC itself didn't return an error or time out.

Specs:

  • OS: archlinux
  • Kernel: linux 6.1
  • Blutgang version: 0.3.0
  • Docker version: 25.0.2

Alright, I missed the TTL setting, which was 30ms in the default config. I have to admit I wasn't expecting a TTL to be a request timeout; it's usually a lifespan (for example, how long to keep data in a cache).

So after raising it to something like 5000ms, I don't hit the timeouts anymore.
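
For anyone else hitting this, the fix was a one-line change in conf.toml. A minimal sketch of what I mean, assuming the ttl key lives under the [blutgang] section as in the default config (adjust to wherever it sits in yours):

[blutgang]
# ttl behaves as a per-request timeout in milliseconds, not a cache lifetime.
# The default (30) is far too low for most remote RPCs.
ttl = 5000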


But then I hit another issue. Still with the default config (the only change being the TTL set to 5000ms), blutgang crashes after subscribing to newHeads without logging an error.

> docker run --rm -v $(realpath ./conf.toml):/app/config.toml --network host makemake1337/blutgang

Info: Using config file at config.toml
Sorting RPCs by latency...
https://eth.merkle.io: 129183072.13ns
Info: Starting Blutgang 0.3.0 Garreg Mach
Info: Bound to: 127.0.0.1:3000
Wrn: Reorg detected!
Removing stale entries from the cache.
Info: Adding user 1 to sink map
Info: Subscribe_user finding: ["newHeads"]

I tried with different RPC providers (free and paid). Every time the websocket transport is enabled, it crashes at startup.

Any ideas why?

Thanks in advance!

@ishfx this is weird, does blutgang just hang at Info: Subscribe_user finding: ["newHeads"]? Could you try running blutgang with the debug-verbose feature flag (add -debug to your tag, i.e. 0.3.1-debug) and posting the output?

Hey @makemake-kbo , thanks for the reply. Here are my test results:

Still default config with 5s TTL.

  • 0.3.0: crashes after Subscribe_user finding: ["newHeads"]
  • 0.3.0-debug: No crash
  • 0.3.1: No crash
  • 0.3.1-debug: No crash

So 0.3.1 fixed something related to websockets; I saw a commit about that, so maybe that was the issue.

I was using the latest docker version:

> docker run --rm -v $(realpath ./conf.toml):/app/config.toml --network host makemake1337/blutgang:latest ./blutgang --version

blutgang Blutgang 0.3.0 Garreg Mach

I think 0.3.1 could be pushed to the Docker repo.


But now I have another issue (with 0.3.1) 👀 This time I have the full log.

The test:

  • Default config with TTL set to 5000ms
  • Run docker run --rm -v $(realpath ./conf.toml):/app/config.toml -e RUST_BACKTRACE=full --network host makemake1337/blutgang:0.3.1-debug
  • Send any batch rpc request
  • Error:
Info: Connection from: 127.0.0.1:34934
Incoming request: Request { method: POST, uri: /, version: HTTP/1.1, headers: {"content-type": "application/json", "connection": "keep-alive", "user-agent": "Bun/1.0.18", "accept": "*/*", "host": "127.0.0.1:3000", "accept-encoding": "gzip, deflate", "content-length": "53"}, body: Body(Streaming) }
thread 'tokio-runtime-worker' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/serde_json-1.0.113/src/value/index.rs:102:18:
cannot access key "id" in JSON array
stack backtrace:
   0:     0x5d25dcf7a1c9 - std::backtrace_rs::backtrace::libunwind::trace::hbee8a7973eeb6c93
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5
   1:     0x5d25dcf7a1c9 - std::backtrace_rs::backtrace::trace_unsynchronized::hc8ac75eea3aa6899
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x5d25dcf7a1c9 - std::sys_common::backtrace::_print_fmt::hc7f3e3b5298b1083
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:68:5
   3:     0x5d25dcf7a1c9 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hbb235daedd7c6190
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x5d25dcdbbec0 - core::fmt::rt::Argument::fmt::h76c38a80d925a410
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/fmt/rt.rs:142:9
   5:     0x5d25dcdbbec0 - core::fmt::write::h3ed6aeaa977c8e45
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/fmt/mod.rs:1120:17
   6:     0x5d25dcf4ef0e - std::io::Write::write_fmt::h78b18af5775fedb5
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/io/mod.rs:1810:15
   7:     0x5d25dcf7bcde - std::sys_common::backtrace::_print::h5d645a07e0fcfdbb
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x5d25dcf7bcde - std::sys_common::backtrace::print::h85035a511aafe7a8
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x5d25dcf7b587 - std::panicking::default_hook::{{closure}}::hcce8cea212785a25
  10:     0x5d25dcf7c561 - std::panicking::default_hook::hf5fcb0f213fe709a
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:292:9
  11:     0x5d25dcf7c561 - std::panicking::rust_panic_with_hook::h095fccf1dc9379ee
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:779:13
  12:     0x5d25dcf7c050 - std::panicking::begin_panic_handler::{{closure}}::h032ba12139b353db
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:657:13
  13:     0x5d25dcf7bfa6 - std::sys_common::backtrace::__rust_end_short_backtrace::h9259bc2ff8fd0f76
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:171:18
  14:     0x5d25dcf7bf9f - rust_begin_unwind
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
  15:     0x5d25dcc5c594 - core::panicking::panic_fmt::h784f20a50eaab275
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14
  16:     0x5d25dcee1c7f - <str as serde_json::value::index::Index>::index_or_insert::h9a1400ab3dacb31a
  17:     0x5d25dcd0a870 - blutgang::balancer::accept_http::accept_request::{{closure}}::h588bc6fed33bc4a1
  18:     0x5d25dcd705e2 - blutgang::main::{{closure}}::{{closure}}::h5afc1f1c69f69694
  19:     0x5d25dcce0454 - tokio::runtime::task::raw::poll::h23723640dae379c4
  20:     0x5d25dcf8a6ba - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h8ba28ea99da9e29b
  21:     0x5d25dcf90fde - tokio::runtime::task::raw::poll::h9cac45a83e86c75a
  22:     0x5d25dcf8225e - std::sys_common::backtrace::__rust_begin_short_backtrace::h65d2e6ae68906034
  23:     0x5d25dcf81f2f - core::ops::function::FnOnce::call_once{{vtable.shim}}::h0b921543ffca7473
  24:     0x5d25dcf7ddb5 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h12de4fc57affb195
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
  25:     0x5d25dcf7ddb5 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h3c619f45059d5cf1
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
  26:     0x5d25dcf7ddb5 - std::sys::unix::thread::Thread::new::thread_start::hbac657605e4b7389
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys/unix/thread.rs:108:17
  27:     0x713de16f8134 - <unknown>
  28:     0x713de1777a40 - clone
  29:                0x0 - <unknown>

Thanks!


More context about the error:

It just so happens that I always batch my JSON-RPC requests, which is done by sending an array of JSON-RPC calls in a single request. I guess this is why blutgang couldn't parse the JSON correctly.
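
To illustrate, here is a minimal standalone sketch of the serde_json behaviour (not blutgang's actual code): assigning through a string key on a serde_json::Value that holds an array goes through index_or_insert, which only works on objects and panics with exactly the message from the trace above.

use serde_json::{json, Value};

fn main() {
    // A single JSON-RPC call parses into an object; key assignment works.
    let mut single: Value =
        serde_json::from_str(r#"{"jsonrpc":"2.0","id":1,"method":"eth_chainId"}"#).unwrap();
    single["id"] = json!(42); // fine: `single` is a JSON object

    // A batch parses into an array; the same assignment panics with
    // `cannot access key "id" in JSON array`.
    let mut batch: Value =
        serde_json::from_str(r#"[{"jsonrpc":"2.0","id":1,"method":"eth_chainId"}]"#).unwrap();
    // batch["id"] = json!(42); // <- would panic like the backtrace above

    // Checking the shape first avoids the panic.
    if let Some(calls) = batch.as_array_mut() {
        for call in calls {
            call["id"] = json!(42);
        }
    }
}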

To sum up:

  • Normal RPC call to blutgang: OK
> curl 'http://127.0.0.1:3000' \
  -X POST \
  -H 'Content-Type: application/json' \
  --data-raw '{"jsonrpc":"2.0","id":1,"method":"eth_blockNumber"}'

{"jsonrpc":"2.0","id":1,"result":"0x126d0da"}
  • Batch RPC call directly to the upstream RPC: OK
curl 'https://eth.merkle.io' \
  -X POST \
  -H 'Content-Type: application/json' \
  --data-raw '[{"jsonrpc":"2.0","id":1,"method":"eth_blockNumber"}]'
[{"jsonrpc":"2.0","id":1,"result":"0x126d0e1"}]
  • Batch RPC call to blutgang: NOK => CRASH, see stack trace above
curl 'http://127.0.0.1:3000' \
  -X POST \
  -H 'Content-Type: application/json' \
  --data-raw '[{"jsonrpc":"2.0","id":1,"method":"eth_blockNumber"}]'
curl: (52) Empty reply from server

So my understanding is that blutgang doesn't handle batched RPC calls (which are just an array of RPC calls sent in a single request, answered with an array of responses).
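
For what it's worth, here is a rough sketch of the kind of fan-out I mean (hypothetical code, not how blutgang is or will be structured; forward stands in for whatever sends a single call upstream): a batch is just an array of calls, and the response is an array of the individual responses in the same order.

use serde_json::{json, Value};

// Hypothetical helper: split a JSON-RPC batch into individual calls and
// collect the responses back into an array, preserving order.
fn handle_request<F>(body: Value, mut forward: F) -> Value
where
    F: FnMut(&Value) -> Value,
{
    match body {
        // Batch: an array of call objects -> an array of responses.
        Value::Array(calls) => Value::Array(calls.iter().map(|c| forward(c)).collect()),
        // Single call: forward as-is.
        single => forward(&single),
    }
}

fn main() {
    let batch: Value =
        serde_json::from_str(r#"[{"jsonrpc":"2.0","id":1,"method":"eth_blockNumber"}]"#).unwrap();
    // Dummy `forward` that echoes a canned result instead of hitting a real RPC.
    let response = handle_request(batch, |call| {
        json!({"jsonrpc": "2.0", "id": call["id"].clone(), "result": "0x0"})
    });
    println!("{response}");
}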

@ishfx latest docker container has been pushed, thanks!

blutgang doesn't do batched requests at all. this will be a thing once the request processing pipeline gets refactored in the next release. there is no ETA because it will be a major rewrite and includes major restructuring of the db as well.

there was another issue in the past related to batched requests that was closed due to lack of activity. i'll keep this one open until batched requests get merged: #17

@makemake-kbo Alright, fair enough! Thanks again for the work 😃

I'll go ahead and close this issue then.

A quick note about batches: they are crucial for performance, so I think they should become a priority after the refactor, since blutgang strives to be the best load balancer in terms of performance.


Oops, sorry: I didn't catch that last part about keeping this issue open. Reopened!

Batching is something I'd love to see implemented.