dydxprotocol/v4-chain

CPU / socket queue / response time spikes every ~8 sec

Yukigaru opened this issue · 2 comments

I send transactions from my trading app via /broadcast_tx_sync to my fully operational dydx node and measure the response round-trip time. The problem is that the metric is sometimes very high:
Min duration: ~1-2 ms. Note: the trading app runs on the same host as the node, so network latency is minimal.
Max duration: up to 1-2 seconds.
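
For reference, this is roughly how I measure the round trip: a minimal Go sketch that times a /broadcast_tx_sync call against the node's RPC. The endpoint address assumes the default Tendermint/CometBFT RPC port (26657), and the transaction bytes are a hypothetical placeholder.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	// Hypothetical hex-encoded signed transaction; in the real app this
	// comes from the trading client's signing pipeline.
	tx := "0xDEADBEEF"

	// Default Tendermint/CometBFT RPC listen address; adjust to your node.
	url := "http://localhost:26657/broadcast_tx_sync?tx=" + tx

	start := time.Now()
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Drain the body so the timing covers the full response.
	io.Copy(io.Discard, resp.Body)
	fmt.Printf("round-trip: %v\n", time.Since(start))
}
```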

I checked the dydx log and found that in such cases the "Received new short term order" message appears 1-2 sec after the order is sent, which means the node hasn't been reading the Tx off the socket for a long time. With the ss tool I found that every ~8 sec the read queue sizes grow large (up to hundreds of kilobytes) on around half of the node's network sockets and stay large for 1-2 sec, which means the goroutines aren't draining the sockets in time. A Go CPU profile (a sketch for capturing one follows the list below) shows the following notable time consumers at those moments:

  • goleveldb functions (~30%)
  • checkTx (~20%)
  • consensus functions (~40%)
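
For completeness, here is roughly how I grab that profile. This is a minimal sketch assuming the node exposes Go's standard net/http/pprof handler (in Tendermint/CometBFT configs this can be enabled via the pprof_laddr setting in the [rpc] section of config.toml; the localhost:6060 address below is an assumption, not a dydx default).

```go
package main

import (
	"io"
	"net/http"
	"os"
)

func main() {
	// Request a 10-second CPU profile from the node's pprof endpoint.
	// "localhost:6060" stands in for whatever pprof_laddr is set to.
	resp, err := http.Get("http://localhost:6060/debug/pprof/profile?seconds=10")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("cpu.pprof")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	// Save the profile for inspection with `go tool pprof cpu.pprof`.
	if _, err := io.Copy(out, resp.Body); err != nil {
		panic(err)
	}
}
```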

To summarize: the dydx node has performance spikes every ~8 sec, which can be seen in CPU utilization, in method response times, and in socket statistics.
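
The socket side is easy to watch continuously. Here is a minimal Go sketch that flags sockets with a large receive queue by parsing /proc/net/tcp (the 64 KiB threshold is arbitrary, and addresses are printed in the kernel's hex form):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

func main() {
	for {
		f, err := os.Open("/proc/net/tcp")
		if err != nil {
			panic(err)
		}
		sc := bufio.NewScanner(f)
		sc.Scan() // skip the header line
		for sc.Scan() {
			fields := strings.Fields(sc.Text())
			if len(fields) < 5 {
				continue
			}
			// fields[4] is "tx_queue:rx_queue", both in hex.
			q := strings.Split(fields[4], ":")
			if len(q) != 2 {
				continue
			}
			rx, err := strconv.ParseUint(q[1], 16, 64)
			if err != nil || rx < 64*1024 { // arbitrary 64 KiB threshold
				continue
			}
			fmt.Printf("%s local=%s remote=%s rx_queue=%d bytes\n",
				time.Now().Format("15:04:05.000"), fields[1], fields[2], rx)
		}
		f.Close()
		time.Sleep(200 * time.Millisecond)
	}
}
```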

  1. Are these spikes a known issue?

  2. Are there obvious reasons (from a code/config standpoint) why the node can take so long (1-2 sec) to respond?

  3. Can I safely use RocksDB for the DB (instead of goleveldb)? (See the config sketch after this list.)

  4. Is there typical optimization advice for configs? (I've seen the Tendermint guides, but I'm not sure they apply here.)

  5. Can [mempool] recheck be disabled? (Also covered in the sketch below.)
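
For questions 3 and 5, the knobs I mean live in the node's config.toml. A hedged sketch of what I would change (key names as in standard Tendermint/CometBFT configs; switching to RocksDB also requires a node binary built with RocksDB support compiled in, which I haven't verified for the dydx release):

```toml
# config.toml (Tendermint/CometBFT)

# Question 3: database backend. "goleveldb" is the default; "rocksdb"
# only works if the binary was built with RocksDB support.
db_backend = "rocksdb"

[mempool]
# Question 5: skip re-running CheckTx on all remaining mempool
# transactions after every committed block.
recheck = false
```

My understanding is that recheck re-validates the mempool after each commit, so disabling it may leave stale or invalid transactions in the mempool; that trade-off is part of why I'm asking whether it's safe.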

Thank you!

OS: Ubuntu 20.04, kernel 5.15.0
dydx chain release: from around January

Fixed with the new update.