rakshasa/rtorrent

100% CPU usage and not doing anything

Closed this issue · 9 comments

The symptoms differ from #501 / #457.
rtorrent seems to stop doing anything while the UI is responsive and using 100% CPU on 2 threads.

System:
archlinux, rtorrent 0.9.6, libcurl 7.51.0, openssl 1.0.2j

top dump:

top - 04:16:00 up 39 min,  3 users,  load average: 1.96, 1.98, 1.96
Tasks: 143 total,   2 running, 141 sleeping,   0 stopped,   0 zombie
%Cpu0  :  47.3/52.7  100[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||]
%Cpu1  :   0.0/0.0     0[                                                                               ]
%Cpu2  :   0.0/0.0     0[                                                                               ]
%Cpu3  :  29.1/70.9  100[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||]
%Cpu4  :   0.0/0.0     0[                                                                               ]
%Cpu5  :   0.0/0.0     0[                                                                               ]
%Cpu6  :   0.0/0.0     0[                                                                               ]
%Cpu7  :   0.0/0.0     0[                                                                               ]
GiB Mem : 32.8/3.862    [                                                                               ]

perf top dump:

   PerfTop:   14492 irqs/sec  kernel:80.4%  exact:  0.0% [4000Hz cpu-clock],  (all, 8 CPUs)
-------------------------------------------------------------------------------

    12.80%  libtorrent.so.19.0.0  [.] torrent::thread_base::event_loop
     8.73%  [kernel]              [k] _raw_spin_lock
     6.21%  [kernel]              [k] _raw_spin_unlock_irqrestore
     5.17%  [kernel]              [k] entry_SYSCALL_64_fastpath
     4.40%  [kernel]              [k] pvclock_clocksource_read
     3.42%  [kernel]              [k] __call_rcu.constprop.62
     2.61%  [vdso]                [.] 0x0000000000000981
     2.41%  [kernel]              [k] __fget
     2.31%  libtorrent.so.19.0.0  [.] torrent::signal_bitfield::work
     1.93%  [vdso]                [.] __vdso_gettimeofday
     1.77%  libpthread-2.24.so    [.] __GI___libc_accept
     1.76%  libtorrent.so.19.0.0  [.] torrent::PollEPoll::do_poll
     1.76%  [kernel]              [k] __slab_free
     1.47%  [kernel]              [k] inode_init_always
     1.44%  libc-2.24.so          [.] 0x00000000000e8dd3
     1.28%  [kernel]              [k] select_estimate_accuracy.part.2
     1.27%  [kernel]              [k] ep_send_events_proc
     1.25%  [kernel]              [k] fput
     1.18%  [kernel]              [k] _raw_spin_unlock_irq
     1.17%  [kernel]              [k] ktime_get_ts64

I have about 4k torrents added but this seems to happen when there are about 150 torrents started.
all of the torrents that are hashing gets closed.
any ideas?

UPDATE:
I've generated a flamegraph of the system with around 150 torrents started, both threads are spending most of their CPU time in system calls.....

Main thread:
screenshot from 2017-04-19 00-17-09

Worker thread:
screenshot from 2017-04-19 00-17-36

seems to be making a lot of of accept and epoll calls...

hmm, i got the same problem with curl-7.52.1, and no problems with curl-7.51.0

hmm, i got the same problem with curl-7.52.1, and no problems with curl-7.51.0

i can confirm that, i was wondering why it couldn't connect to http trackers anymore (didn't notice the cpu usage at the beginning) and downgrading curl to 7.51.0 fixed it

Have you reported this to the curl devs? They know better what changes might be causing this.

No, I did not. Moreover rtorrent (feature-bind) does not download from some private trackers.

Duplicate of #457.

UPDATE: updating to the new build of curl on archlinuxarm seems to have fixed the issue (https tracker timeouts)
https://git.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/curl&id=bae58314ffe16b0642811fdb0187a9b8eba8d362

Updated original post with more info, any ideas?

I managed to reproduce this with a fresh install of Ubuntu 17 with a pimp-my-box installation...

@rakshasa
I don't think this is a duplicate of #457 as I don't have any trackers using HTTPS.

@rakshasa Just wanted to report that I was able to reproduce this issue with all torrents only using DHT with no trackers configured so it's probably not a duplicate of #457

System: NixOS 17.03
rTorrent: 0.9.6

I also started encountering this recently on a Debian testing amd64 box. I don't know if it coincided with an update to libcurl, but I know it coincided with an update from kernel 4.11 to 4.12.

Checking strace on rtorrent, during times of spiked CPU use, the system is pounded with epoll_wait(... {EPOLLOUT, ...}, ...) calls; the return value (number of available fds) seems to always equal the number of fds in the struct epoll_events* events, which appears to correspond to the number of 'handshake' sockets shown in the UI ([S 7/34/768]). (There have been a few occasions where handshakes is zero, and CPU use also drops.)

I'm not a kernel hacker, but I'm slightly suspicious of work on epoll that went into 4.12, like epoll: Add busy poll support to epoll with socket fds.

Edit: I should also note that the epoll_wait(...) calls appear to be happening roughly 1-5 times per second, based on the timeout value, e.g.:

epoll_wait(7, [{EPOLLOUT, {u32=152, u64=152}}, ..., 1024, 781) = 15
epoll_wait(7, [{EPOLLOUT, {u32=152, u64=152}}, ..., 1024, 780) = 15
epoll_wait(7, [{EPOLLOUT, {u32=152, u64=152}}, ..., 1024, 779) = 15
epoll_wait(7, [{EPOLLOUT, {u32=152, u64=152}}, ..., 1024, 779) = 15
epoll_wait(7, [{EPOLLOUT, {u32=152, u64=152}}, ..., 1024, 779) = 15
epoll_wait(7, [{EPOLLOUT, {u32=152, u64=152}}, ..., 1024, 779) = 15
epoll_wait(7, [{EPOLLOUT, {u32=152, u64=152}}, ..., 1024, 779) = 15
epoll_wait(7, [{EPOLLOUT, {u32=152, u64=152}}, ..., 1024, 778) = 15
epoll_wait(7, [{EPOLLOUT, {u32=152, u64=152}}, ..., 1024, 778) = 15