Count of threads seems to be increasing unbounded over time
cburgdorf opened this issue · 4 comments
What is wrong?
Another interesting thing to observe is that disk reads skyrocket at around 4 AM. I haven't looked through the logs yet, but I bet that is around the time when I lose access to my local peer (maybe after a router disconnect, or maybe we blacklisted it because #2008 caused havoc) and am left with vampires bleeding me out.
Another interesting thing is the IO spike at around 6 AM (I think the percentage bars for that chart are wrong, btw), which may relate to what @carver said at the last standup he was seeing (IO overload after a while).
Logs
How can it be fixed?
DEBUG 2020-09-11 04:44:48,578 BeamDownloader ETHPeer (eth, 65) <Session <Node(0xcbceb7@172.33.0.2)> 18299391-c15a-4c35-b15d-22f65292556e> returned 0 state trie nodes, penalize...
DEBUG 2020-09-11 04:44:48,579 QueeningQueue Penalizing ETHPeer (eth, 65) <Session <Node(0xcbceb7@172.33.0.2)> 18299391-c15a-4c35-b15d-22f65292556e> for 2.00s, for minor infraction
Yep, at 4:44 my local peer gets penalized, which is around the time when the vampires take over...
From there we never reconnect to my local peer. I guess @gsalgado's fix in #2042 will probably help with that.
Btw, I think this must be fairly recent. The number of threads used to be very static.
Looking a bit further back, the number of threads seems to have been fairly static until some point the previous evening, around 22:00.
I gathered more logs so that we can inspect that timeframe, too.
FWIW, the penalties mentioned in these logs neither disconnect the peer nor add it to the blacklist. They just wait 2 seconds before sending GetNodeData to that peer again (sync will keep asking that peer for other types of data in the meantime).
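To make the difference between "penalize" and "disconnect" concrete, here is a minimal Python sketch of a penalty that only delays requests. `PenaltyQueue`, `penalize`, `is_penalized`, `next_peer` and `FakeClock` are hypothetical names for illustration, not Trinity's actual `QueeningQueue` API; peers are plain strings and the clock is injectable so the behaviour can be checked without sleeping. A penalized peer stays in the peer list the whole time; it is merely skipped when picking who gets the next GetNodeData request until its penalty expires.

```python
import time
from typing import Callable, Dict, List, Optional


class PenaltyQueue:
    """Hypothetical sketch: penalized peers are skipped for a while, never dropped."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._peers: List[str] = []                  # connected peers, in round-robin order
        self._penalized_until: Dict[str, float] = {}  # peer -> time its penalty ends

    def add_peer(self, peer: str) -> None:
        if peer not in self._peers:
            self._peers.append(peer)

    def penalize(self, peer: str, delay: float) -> None:
        # The peer is NOT removed from self._peers; it just becomes ineligible.
        until = self._clock() + delay
        # If the peer is penalized again, keep whichever deadline is later.
        self._penalized_until[peer] = max(until, self._penalized_until.get(peer, until))

    def is_penalized(self, peer: str) -> bool:
        until = self._penalized_until.get(peer)
        if until is None:
            return False
        if self._clock() >= until:
            del self._penalized_until[peer]  # penalty expired
            return False
        return True

    def next_peer(self) -> Optional[str]:
        # Rotate through the peers and return the first one not currently penalized.
        for _ in range(len(self._peers)):
            peer = self._peers.pop(0)
            self._peers.append(peer)
            if not self.is_penalized(peer):
                return peer
        return None  # every connected peer is serving a penalty right now


class FakeClock:
    """Manually advanced clock, so the example runs without real sleeps."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


if __name__ == "__main__":
    clock = FakeClock()
    queue = PenaltyQueue(clock)
    queue.add_peer("local")
    queue.penalize("local", 2.0)  # "for 2.00s, for minor infraction"
    print(queue.next_peer())      # None: the only peer is still penalized
    clock.now = 2.0
    print(queue.next_peer())      # local: penalty over, peer was never dropped
```

The point of the sketch is that a timed penalty alone should not make a node lose its peer for good; if the local peer never comes back after 4:44, the loss is more likely to come from something else, such as an actual disconnect, than from this kind of 2-second penalty.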