openethereum/parity-ethereum

Syncing efficiency inconsistent, sometimes quickly changes Mgas/s

vbuterin opened this issue · 5 comments

Running on Ubuntu 17.10 with a 512GB SSD, --cache-size 6144.

2017-12-11 03:45:16  Syncing #4618486 7293…15bf     0 blk/s   48 tx/s   3 Mgas/s    299+  564 Qed  #4619359   15/25 peers   213 MiB chain 162 MiB db 66 MiB queue 12 MiB sync  RPC:  0 conn,  0 req/s,   0 µs
2017-12-11 03:45:26  Syncing #4618495 c0aa…1a7e     0 blk/s   80 tx/s   3 Mgas/s    250+  605 Qed  #4619359   13/25 peers   213 MiB chain 163 MiB db 69 MiB queue 12 MiB sync  RPC:  0 conn,  0 req/s,   0 µs
2017-12-11 03:45:36  Syncing #4618501 babd…79ed     0 blk/s   46 tx/s   2 Mgas/s    208+  643 Qed  #4619359   13/25 peers   213 MiB chain 163 MiB db 71 MiB queue 12 MiB sync  RPC:  0 conn,  0 req/s,   0 µs
2017-12-11 03:45:46  Syncing #4618509 e91f…7be3     0 blk/s   80 tx/s   3 Mgas/s    161+  682 Qed  #4619359   13/25 peers   213 MiB chain 163 MiB db 74 MiB queue 12 MiB sync  RPC:  0 conn,  0 req/s,   0 µs
2017-12-11 03:45:56  Syncing #4618519 cbb8…81ac     1 blk/s   65 tx/s   3 Mgas/s     91+  737 Qed  #4619359   13/25 peers   213 MiB chain 163 MiB db 77 MiB queue 12 MiB sync  RPC:  0 conn,  0 req/s,   0 µs
2017-12-11 03:46:05  Syncing #4618545 0cd3…4bad     2 blk/s  253 tx/s  11 Mgas/s      0+  813 Qed  #4619359   13/25 peers   213 MiB chain 163 MiB db 81 MiB queue 12 MiB sync  RPC:  0 conn,  0 req/s,   0 µs
2017-12-11 03:46:15  Syncing #4618617 7cd1…c65e     7 blk/s  702 tx/s  33 Mgas/s      0+  741 Qed  #4619359   14/25 peers   214 MiB chain 164 MiB db 74 MiB queue 12 MiB sync  RPC:  0 conn,  0 req/s,   0 µs
2017-12-11 03:46:26  Syncing #4618671 3aec…d67c     5 blk/s  568 tx/s  26 Mgas/s      0+  685 Qed  #4619359   14/25 peers   214 MiB chain 164 MiB db 68 MiB queue 12 MiB sync  RPC:  0 conn,  0 req/s,   0 µs
2017-12-11 03:46:35  Syncing #4618725 e1aa…920d     5 blk/s  495 tx/s  22 Mgas/s      0+  633 Qed  #4619359   15/25 peers   215 MiB chain 165 MiB db 63 MiB queue 12 MiB sync  RPC:  0 conn,  0 req/s,   0 µs
2017-12-11 03:46:45  Syncing #4618806 73da…f8d3     8 blk/s  657 tx/s  31 Mgas/s      0+  553 Qed  #4619359   17/25 peers   216 MiB chain 165 MiB db 56 MiB queue 12 MiB sync  RPC:  0 conn,  0 req/s,   0 µs
2017-12-11 03:46:55  Syncing #4618865 b1cc…7fd2     6 blk/s  678 tx/s  33 Mgas/s      0+  493 Qed  #4619359   16/25 peers   216 MiB chain 166 MiB db 49 MiB queue 12 MiB sync  RPC:  0 conn,  0 req/s,   0 µs
2017-12-11 03:47:05  Syncing #4618910 0630…d317     4 blk/s  408 tx/s  21 Mgas/s      0+  445 Qed  #4619359   16/25 peers   217 MiB chain 166 MiB db 45 MiB queue 12 MiB sync  RPC:  0 conn,  0 req/s,   0 µs

A similar thing also happened a few hours earlier, but with the Mgas/s going from ~33 to ~5-10.

The efficiency changes seem too neatly aligned with fairly long time intervals (fast for a while, slow for a while, fast for a while) to reflect differences in block contents. So what's happening? Is this some kind of memory or database compaction issue?

Basically, block import in Parity is organized as a pipeline: there are N threads that do parallel block checks (PoW verification and ecrecover for transactions), where N equals the number of logical cores minus one. Then there is a single thread that executes the transactions of pre-verified blocks. Most of the time these N threads are idle, because execution is the bottleneck. But during a full sync a batch of blocks is occasionally downloaded from the network, and that is when all N + 1 threads are busy, utilizing all of the CPU.
Unfortunately, when this happens the performance of each thread degrades significantly. First of all, logical cores are not physical cores: two threads executing on the same physical core run almost twice as slowly. On top of that, there is CPU frequency throttling under heavy parallel load.
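To make the shape of that pipeline concrete, here is a minimal, self-contained Rust sketch, not Parity's actual code: `UnverifiedBlock`, `VerifiedBlock`, `verify`, and `execute` are all hypothetical stand-ins. N verifier threads drain a shared queue of downloaded blocks and feed a single execution thread through a channel; the length of the shared queue corresponds to the first of the two queue numbers quoted below.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-ins for Parity's block types.
struct UnverifiedBlock { number: u64 }
struct VerifiedBlock { number: u64 }

// Stand-in for the CPU-heavy parallel checks (PoW, ecrecover).
fn verify(b: UnverifiedBlock) -> VerifiedBlock {
    VerifiedBlock { number: b.number }
}

// Stand-in for single-threaded transaction execution.
fn execute(b: &VerifiedBlock) {
    println!("imported block #{}", b.number);
}

fn main() {
    // N = logical cores - 1, as in the description above.
    let n = thread::available_parallelism()
        .map(|p| p.get().saturating_sub(1).max(1))
        .unwrap_or(1);

    // Shared queue of downloaded-but-unverified blocks; its length is
    // what the first queue number in the log reports.
    let unverified: Arc<Mutex<Vec<UnverifiedBlock>>> = Arc::new(Mutex::new(
        (0u64..1000).map(|i| UnverifiedBlock { number: i }).collect(),
    ));

    let (tx, rx) = mpsc::channel::<VerifiedBlock>();

    // N parallel verifier threads.
    let mut workers = Vec::new();
    for _ in 0..n {
        let unverified = Arc::clone(&unverified);
        let tx = tx.clone();
        workers.push(thread::spawn(move || loop {
            let block = unverified.lock().unwrap().pop();
            match block {
                Some(b) => tx.send(verify(b)).unwrap(),
                None => break, // queue drained
            }
        }));
    }
    drop(tx); // close the channel once all workers finish

    // The single execution thread: the usual bottleneck.
    let executor = thread::spawn(move || {
        for block in rx {
            execute(&block);
        }
    });

    for w in workers { w.join().unwrap(); }
    executor.join().unwrap();
}
```

The channel is what enforces the design described above: verification parallelizes cleanly, but execution stays on a single thread because each block's state transition depends on the previous block's result.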

299+  564 Qed  
250+  605 Qed  

The first number here is the number of blocks pending parallel verification. When it is nonzero, everything slows down.

A somewhat useful solution would be to allocate at least one physical core to the importing thread.
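One way to approximate that suggestion, sketched here with the third-party core_affinity crate (an assumption for illustration, not what Parity uses), is to pin the importing thread to its own logical core and spread the verifier threads across the remaining ones. Mapping logical cores to physical cores, so that the importer's hyperthread sibling also stays free, is OS-specific and not shown.

```rust
// Sketch only: requires `core_affinity = "1"` in Cargo.toml.
use std::thread;

fn main() {
    let core_ids = core_affinity::get_core_ids().expect("no core IDs available");
    let (import_core, worker_cores) = core_ids.split_first().expect("need at least one core");
    let import_core = *import_core;

    // Single import/execution thread pinned to a core of its own.
    let importer = thread::spawn(move || {
        core_affinity::set_for_current(import_core);
        // ... run transaction execution here ...
    });

    // Verifier threads pinned to the remaining cores, so they never
    // compete with the importer for its core.
    let workers: Vec<_> = worker_cores
        .iter()
        .copied()
        .map(|id| {
            thread::spawn(move || {
                core_affinity::set_for_current(id);
                // ... run parallel block verification here ...
            })
        })
        .collect();

    importer.join().unwrap();
    for w in workers { w.join().unwrap(); }
}
```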

allocate at least one physical core to the importing thread

Is that not covered by the fact that the workers call set_sync when they finish verifying a block, yielding the thread to the import logic?

The first number here is the number of blocks pending parallel verification. When it is nonzero, everything slows down.

And am I right in assuming that the second number is the number of blocks awaiting execution? Is there an explanation of all the columns somewhere? @arkpar

Closing, as this was answered by @arkpar.

@Tbaut Can you double-check that we have documentation explaining what the different parts of the log line mean?