High CPU occupancy
kimw opened this issue · 14 comments
- With no connections, gq-server behaves well.
- With some HTTP connections, gq-server still behaves well.
- With any shadowsocks traffic (2-5 seconds after a successful transfer), gq-server eats all the CPU. We can also see a number of gq-server processes.
As far as I know, there are two kinds of redirection:
- Web browser <==> gq-server <==> HTTPD server
  I call this HTTP REDIR. It was fixed in #13.
- ss-local <==> gq-client <=> gq-server <=> ss-server
  I call this SS REDIR. Maybe it needs some kind of fix?
Part of a perf report is shown below. It shows that main.(*ssPair).serverToRemote -> io.ReadAtLeast uses 90.19% of the CPU.
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cpu-clock'
# Event count (approx.): 16060605900
#
# Children Self Samples Command Shared Object Symbol
# ........ ........ ............ ......... ................. ...................................................
#
95.53% 0.00% 0 gq-server gq-server [.] runtime.goexit
|
---runtime.goexit
|
--95.47%--main.(*ssPair).serverToRemote
|
|--90.19%--io.ReadAtLeast
| |
| |--85.66%--net.(*conn).Read
| | |
| | |--82.39%--net.(*netFD).Read
| | | |
| | | |--69.25%--internal/poll.(*FD).Read
| | | | |
| | | | |--54.78%--syscall.Read
| | | | | |
| | | | | |--51.57%--syscall.Syscall
| | | | | | |
| | | | | | |--23.21%--entry_SYSCALL_64_fastpath
| | | | | | | |
| | | | | | | --16.29%--sys_read
| | | | | | | |
| | | | | | | |--14.65%--vfs_read
| | | | | | | | |
| | | | | | | | |--9.18%--__vfs_read
| | | | | | | | | |
| | | | | | | | | |--7.74%--new_sync_read
| | | | | | | | | | |
| | | | | | | | | | --7.04%--sock_read_iter
| | | | | | | | | | |
| | | | | | | | | | |--5.66%--sock_recvmsg
| | | | | | | | | | | |
| | | | | | | | | | | |--3.96%--inet_recvmsg
| | | | | | | | | | | | |
| | | | | | | | | | | | --2.96%--tcp_recvmsg
| | | | | | | | | | | | |
| | | | | | | | | | | | --0.50%--tcp_release_cb
| | | | | | | | | | | |
| | | | | | | | | | | --1.13%--security_socket_recvmsg
| | | | | | | | | | | |
| | | | | | | | | | | --0.63%--apparmor_socket_recvmsg
| | | | | | | | | | | aa_sock_msg_perm
| | | | | | | | | | |
| | | | | | | | | | --0.63%--inet_recvmsg
| | | | | | | | | |
| | | | | | | | | |--0.69%--sock_read_iter
| | | | | | | | | |
| | | | | | | | | --0.50%--iov_iter_init
| | | | | | | | |
| | | | | | | | --4.78%--rw_verify_area
| | | | | | | | |
| | | | | | | | --3.71%--security_file_permission
| | | | | | | | |
| | | | | | | | --2.26%--apparmor_file_permission
| | | | | | | | |
| | | | | | | | --2.01%--common_file_perm
| | | | | | | | |
| | | | | | | | --0.50%--aa_file_perm
| | | | | | | |
| | | | | | | |--0.88%--__fdget_pos
| | | | | | | | |
| | | | | | | | --0.82%--__fget_light
| | | | | | | | __fget
| | | | | | | |
| | | | | | | --0.50%--__fget_light
| | | | | | |
| | | | | | |--7.61%--runtime.exitsyscall
| | | | | | | |
| | | | | | | |--1.89%--syscall.Syscall
| | | | | | | | |
| | | | | | | | --1.82%--entry_SYSCALL_64_fastpath
| | | | | | | | |
| | | | | | | | --1.26%--sys_read
| | | | | | | | vfs_read
| | | | | | | | |
| | | | | | | | --0.94%--__vfs_read
| | | | | | | | |
| | | | | | | | --0.63%--new_sync_read
| | | | | | | | |
| | | | | | | | --0.50%--sock_read_iter
| | | | | | | |
| | | | | | | --1.32%--runtime.exitsyscallfast
| | | | | | |
| | | | | | |--5.85%--syscall.Syscall
| | | | | | | |
| | | | | | | --5.41%--entry_SYSCALL_64_fastpath
| | | | | | | |
| | | | | | | --3.84%--sys_read
| | | | | | | |
| | | | | | | --3.21%--vfs_read
| | | | | | | |
| | | | | | | |--2.33%--__vfs_read
| | | | | | | | |
| | | | | | | | --2.14%--new_sync_read
| | | | | | | | |
| | | | | | | | --2.08%--sock_read_iter
| | | | | | | | |
| | | | | | | | --1.95%--sock_recvmsg
| | | | | | | | |
| | | | | | | | |--1.01%--inet_recvmsg
| | | | | | | | | |
| | | | | | | | | --0.88%--tcp_recvmsg
| | | | | | | | |
| | | | | | | | --0.50%--tcp_recvmsg
| | | | | | | |
| | | | | | | --0.75%--rw_verify_area
| | | | | | | |
| | | | | | | --0.63%--security_file_permission
| | | | | | |
| | | | | | |--3.02%--runtime.entersyscall
| | | | | | | |
| | | | | | | |--1.89%--runtime.reentersyscall
| | | | | | | | |
| | | | | | | | |--0.63%--syscall.Syscall
| | | | | | | | | entry_SYSCALL_64_fastpath
| | | | | | | | | |
| | | | | | | | | --0.50%--sys_read
| | | | | | | | | vfs_read
| | | | | | | | |
| | | | | | | | --0.50%--runtime.casgstatus
| | | | | | | |
| | | | | | | --1.01%--syscall.Syscall
| | | | | | | |
| | | | | | | --0.94%--entry_SYSCALL_64_fastpath
| | | | | | | |
| | | | | | | --0.82%--sys_read
| | | | | | | vfs_read
| | | | | | | |
| | | | | | | --0.50%--__vfs_read
| | | | | | |
| | | | | | |--0.94%--runtime.casgstatus
| | | | | | |
| | | | | | --0.50%--sys_read
| | | | | |
| | | | | |--2.01%--syscall.read
| | | | | |
| | | | | --0.75%--runtime.exitsyscall
| | | | |
| | | | |--5.41%--syscall.Syscall
| | | | | |
| | | | | --5.22%--entry_SYSCALL_64_fastpath
| | | | | |
| | | | | --3.71%--sys_read
| | | | | |
| | | | | --3.46%--vfs_read
| | | | | |
| | | | | |--2.20%--__vfs_read
| | | | | | |
| | | | | | --1.82%--new_sync_read
| | | | | | |
| | | | | | --1.57%--sock_read_iter
| | | | | | |
| | | | | | --1.38%--sock_recvmsg
| | | | | | |
| | | | | | --0.94%--inet_recvmsg
| | | | | | |
| | | | | | --0.63%--tcp_recvmsg
| | | | | |
| | | | | --1.01%--rw_verify_area
| | | | | |
| | | | | --0.75%--security_file_permission
| | | | |
| | | | |--1.51%--internal/poll.(*pollDesc).prepareRead
| | | | | |
| | | | | --1.45%--internal/poll.runtime_pollReset
| | | | |
| | | | |--1.19%--runtime.deferreturn
| | | | |
| | | | |--0.94%--internal/poll.(*fdMutex).rwlock
| | | | |
| | | | |--0.88%--runtime.deferproc
| | | | | |
| | | | | --0.63%--runtime.newdefer
| | | | |
| | | | |--0.75%--internal/poll.(*FD).readLock
| | | | | |
| | | | | --0.57%--internal/poll.(*fdMutex).rwlock
| | | | |
| | | | |--0.69%--runtime.jmpdefer
| | | | |
| | | | --0.57%--runtime.return0
| | | |
| | | |--8.05%--syscall.Syscall
| | | | |
| | | | --7.67%--entry_SYSCALL_64_fastpath
| | | | |
| | | | --4.84%--sys_read
| | | | |
| | | | --3.84%--vfs_read
| | | | |
| | | | |--2.52%--__vfs_read
| | | | | |
| | | | | --2.01%--new_sync_read
| | | | | |
| | | | | --1.70%--sock_read_iter
| | | | | |
| | | | | --1.51%--sock_recvmsg
| | | | | |
| | | | | --1.45%--inet_recvmsg
| | | | | |
| | | | | --1.07%--tcp_recvmsg
| | | | |
| | | | --1.19%--rw_verify_area
| | | | |
| | | | --0.75%--security_file_permission
| | | | |
| | | | --0.50%--apparmor_file_permission
| | | |
| | | |--1.19%--runtime.deferreturn
| | | |
| | | |--0.82%--runtime.deferproc
| | | |
| | | |--0.75%--internal/poll.(*FD).readUnlock
| | | |
| | | --0.75%--runtime.jmpdefer
| | |
| | --1.89%--syscall.Syscall
| | |
| | --1.82%--entry_SYSCALL_64_fastpath
| | |
| | --1.51%--sys_read
| | vfs_read
| | |
| | --1.07%--__vfs_read
| | |
| | --0.94%--new_sync_read
| | sock_read_iter
| | |
| | --0.69%--sock_recvmsg
| | |
| | --0.57%--inet_recvmsg
| | |
| | --0.50%--tcp_recvmsg
| |
| --2.20%--syscall.Syscall
| |
| --2.14%--entry_SYSCALL_64_fastpath
| |
| --1.45%--sys_read
| |
| --1.38%--vfs_read
| |
| --0.75%--__vfs_read
| |
| --0.63%--new_sync_read
| |
| --0.57%--sock_read_iter
|
|--4.40%--syscall.Syscall
| |
| --4.15%--entry_SYSCALL_64_fastpath
| |
| --3.27%--sys_read
| |
| --3.08%--vfs_read
| |
| |--2.01%--__vfs_read
| | |
| | --1.76%--new_sync_read
| | |
| | --1.70%--sock_read_iter
| | |
| | --1.38%--sock_recvmsg
| | |
| | --0.88%--inet_recvmsg
| | |
| | --0.75%--tcp_recvmsg
| |
| --0.82%--rw_verify_area
| |
| --0.69%--security_file_permission
|
--0.88%--net.(*TCPConn).Read
It occurs when ss-server reports a TCP connection timeout.
I finally found the solution. A patch is on the way within 24 hours (after I finish learning golang, OMG).
I have also found that this issue happens frequently, so I went back to the old version 464a11e.
It's weird.
- go 1.8 - works fine (1.8.7 is tested)
- go 1.9 - works fine (1.9.3, 1.9.4 & 1.9.5 are tested)
- go 1.10 - high sys load (1.10 & 1.10.1 are tested)
@cbeuw Are you using go 1.9?
@linusyang Are you using go 1.10?
@kimw Yes, I was using go 1.10. The issue only occurs in versions after TFO support was added.
Edit: go 1.9 works perfectly. I think the issue might be related to the compatibility of the TFO library with go 1.10.
@linusyang Yes, it's caused by Go TFO. Since Go doesn't support TFO at all (it has kept slipping since 1.8, 1.9, ... and may finally land in 1.11), all the TFO patches live on the dark side, and the official Go libraries keep changing without any regard for them. The patch cat said she's tired :)
The upstream, Go TFO, supports go 1.8 & 1.9 only. I'm trying to do some work on go 1.10, but you know I'm a golang newbie working from the manual... so don't trust me.
Anyway, the solution I mentioned earlier is a dirty hack: a connection timeout in shadowsocksToRemote(). It's a double-edged sword. The timeout saves the CPU but forcibly cuts off connections. Sigh!
@kimw
Yes I'm using go 1.9 and I couldn't reproduce this.
gotfo already has code variations between go1.8 and 1.9. There was probably some change in golang's internal/poll that caused the issue. I'm trying to figure it out.
I really hope the go team can incorporate TFO ASAP. I may use syscall for all socket operations if this becomes too inconvenient.
- Sure, it's definitely caused by changes in internal/poll.
- Don't waste time on the go team; let's DIY. The post begins with "Over the past 4 years..."
- I've noticed this:
  "The Conn and Listener implementations in this package now guarantee that when Close returns, the underlying file descriptor has been closed. (In earlier releases, if the Close stopped pending I/O in other goroutines, the closing of the file descriptor could happen in one of those goroutines shortly after Close returned.)"
  See the Go 1.10 Release Notes, https://golang.org/doc/go1.10#library
  The high load issue may be caused by waiting on a closed connection. I mean, it may be looping on a connection that no longer exists.
Anyway, it does work on go 1.9 now. We can release it under this condition. go 1.10 will be supported in the next release, or never ;)
EDIT:
@cbeuw BTW, can you manage the release versioning? To be honest, it personally feels like kind of a mess to me. Sorry for the words.
@kimw
Yes, my release management is pretty messy. This is my first major open source project and I need to learn the best practices.
For now, the binaries in v1.1.2 were compiled with go1.9, and I've been using them for all my browsing over the last two days with no issues. I'll just add a build note and some build constraints to prevent people from using go1.10.
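The build-constraint idea could be sketched like this (the tag placement is standard Go, but the exact file this would go in is my guess, not the actual commit). Putting a `// +build !go1.10` line at the top of the gotfo-dependent source files makes the go1.10+ toolchain exclude them entirely, so a go1.10 build stops early with "build constraints exclude all Go files" instead of producing a binary that busy-loops:

```go
// +build !go1.10

// The constraint above must appear before the package clause, followed
// by a blank line. Release tags are cumulative, so "go1.10" is set by
// the go1.10 toolchain and every later one; "!go1.10" therefore means
// "only build this file with go1.9 or older".
package main
```

A companion note in the README pointing people at go1.9 would make the resulting build error less mysterious.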
- You did a great job.
- About releases, shadowsocks-libev is a good example to follow.
- I have also been running gq-server on go 1.9 for days. It works great. (BTW, the Android plugin doesn't work for me.)
- I'm still working on TFO for go 1.10, stepping through the core code. I think I've fallen into a big hole. As discussed in https://stackoverflow.com/questions/26999615/go-tcp-read-is-non-blocking#27003111, maybe a timeout (ReadDeadline) is the only solution. Whatever, I'm trying hard to code it the "normal way". If there's no way out, the timeout is the last shot.
Closed cbeuw/gotfo@9f0986e
@cbeuw I tried to step into setDeadlineImpl(). I even added a log.Println() at the first line of the func, but nothing is printed.
@kimw
Anything called after https://github.com/cbeuw/gotfo/blob/master/util.go#L65 or https://github.com/cbeuw/gotfo/blob/master/util.go#L74 will be irrelevant to gotfo.