xtaci/kcp-go

Memory leak in kcp.(*UDPSession).update?

Closed this issue · 19 comments

During sustained communication, memory usage keeps growing until the process eventually runs out of memory (OOM).

`top` output:
[image]

`list` output:
[image]

Largest consumer:

    1.78GB     2.14GB (flat, cum) 49.89% of Total
         .          .    583:func (s *UDPSession) update() {
         .          .    584:	select {
         .          .    585:	case <-s.die:
         .          .    586:	default:
         .          .    587:		s.mu.Lock()
         .    37.51MB    588:		interval := s.kcp.flush(false)
         .          .    589:		waitsnd := s.kcp.WaitSnd()
         .          .    590:		if waitsnd < int(s.kcp.snd_wnd) && waitsnd < int(s.kcp.rmt_wnd) {
         .          .    591:			s.notifyWriteEvent()
         .          .    592:		}
         .   318.03MB    593:		s.uncork()
         .          .    594:		s.mu.Unlock()
         .          .    595:		// self-synchronized timed scheduling
    1.78GB     1.79GB    596:		SystemTimedSched.Put(s.update, time.Now().Add(time.Duration(interval)*time.Millisecond))
         .          .    597:	}
         .          .    598:}
         .          .    599:
         .          .    600:// GetConv gets conversation id of a session
         .          .    601:func (s *UDPSession) GetConv() uint32 { return s.kcp.conv }
xtaci commented

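This is probably the GC not keeping up. Is this a low-end CPU?
You can set an environment variable at startup, for example GOGC=20 ./client_xxxxx, to make collection more frequent.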

xtaci commented

There are two environment variables you can set.

The GOGC variable sets the initial garbage collection target percentage. A collection is triggered when the ratio of freshly allocated data to live data remaining after the previous collection reaches this percentage. The default is GOGC=100. Setting GOGC=off disables the garbage collector entirely. runtime/debug.SetGCPercent allows changing this percentage at run time.

The GOMEMLIMIT variable sets a soft memory limit for the runtime. This memory limit includes the Go heap and all other memory managed by the runtime, and excludes external memory sources such as mappings of the binary itself, memory managed in other languages, and memory held by the operating system on behalf of the Go program. GOMEMLIMIT is a numeric value in bytes with an optional unit suffix. The supported suffixes include B, KiB, MiB, GiB, and TiB. These suffixes represent quantities of bytes as defined by the IEC 80000-13 standard. That is, they are based on powers of two: KiB means 2^10 bytes, MiB means 2^20 bytes, and so on. The default setting is math.MaxInt64, which effectively disables the memory limit. runtime/debug.SetMemoryLimit allows changing this limit at run time.
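Both settings can also be applied from inside the process through the `runtime/debug` functions named in the quoted documentation. The following is a minimal sketch, not part of kcp-go; the helper name `applyGCTuning` and the values 20 and 4 GiB are illustrative assumptions, and `SetMemoryLimit` requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// applyGCTuning (hypothetical helper) sets the GC target percentage and the
// soft memory limit at run time and returns the values that were in effect
// before the call.
func applyGCTuning(gcPercent int, memLimitBytes int64) (prevPercent int, prevLimit int64) {
	prevPercent = debug.SetGCPercent(gcPercent)
	prevLimit = debug.SetMemoryLimit(memLimitBytes)
	return prevPercent, prevLimit
}

func main() {
	// Roughly what GOGC=20 GOMEMLIMIT=4GiB would do when set in the environment.
	prevPercent, prevLimit := applyGCTuning(20, 4<<30)
	fmt.Printf("previous GC percent: %d, previous memory limit: %d bytes\n", prevPercent, prevLimit)
}
```

Setting the values in code removes any doubt about whether the launcher passed the environment through, at the cost of a rebuild for every change.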

That still doesn't seem to help. Memory still runs away, on a cloud machine with 4 cores and 8 GB of RAM.
GOGC=50 GOMEMLIMIT=5368709120 nohup ./linux_server &
It still ran away...

xtaci commented

What about running it without nohup? I'm not sure whether nohup inherits environment variables. You could use tmux instead.

xtaci commented

Make GOGC smaller: GOGC=10

xtaci commented

My own 1-core, 2 GB machine runs without problems.

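Running GOGC=10 GOMEMLIMIT=4368709120 ./linux_server directly also crashed... Maybe my requests are too frequent and the traffic is too heavy?
nohup does inherit environment variables.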
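One way to settle the inheritance question for a given launcher is to have the server log what it actually received at startup. A small sketch follows; `reportEnv` is a hypothetical helper, not something kcp-go provides.

```go
package main

import (
	"fmt"
	"os"
)

// reportEnv (hypothetical helper) returns one line per key describing whether
// the variable is set. lookup has the same signature as os.LookupEnv, so a
// fake can be substituted in tests.
func reportEnv(lookup func(string) (string, bool), keys ...string) []string {
	lines := make([]string, 0, len(keys))
	for _, k := range keys {
		if v, ok := lookup(k); ok {
			lines = append(lines, k+"="+v)
		} else {
			lines = append(lines, k+" is unset")
		}
	}
	return lines
}

func main() {
	// Print the GC-related variables as seen by this process.
	for _, line := range reportEnv(os.LookupEnv, "GOGC", "GOMEMLIMIT") {
		fmt.Println(line)
	}
}
```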

xtaci commented

Is the CPU maxed out?

CPU stays at 200%+.

xtaci commented

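Then something is consuming all of the CPU, leaving no spare time for GC. Either it is a misconfiguration, or the traffic really is sustained and extremely high, in which case the program genuinely needs that much memory and CPU.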

What do you mean by misconfiguration?

xtaci commented

For example, communication parameters such as interval, fec, and fastack.

I don't think I set any of those. The only KCP-related settings are the following:

			conn, err := uln.AcceptKCP()
			if err != nil {
				log.Error().Msg(err.Error())
				continue
			}
			conn.SetStreamMode(true)
			conn.SetWriteDelay(false)
			conn.SetNoDelay(1, 10, 2, 1)
			conn.SetWindowSize(128, 128)
			conn.SetACKNoDelay(true)

xtaci commented

conn.SetACKNoDelay(true) does not need to be set.

I only added it this afternoon, copying from your kcptun, haha.

Apart from this option, do you have any suggestions if I modify sess.go directly?

xtaci commented

No. The main thing is to find out why the CPU is saturated.

It is probably caused by the sheer volume of traffic.
I changed the following two settings, and it seems stable now.

			conn.SetNoDelay(0, 20, 0, 0)
			conn.SetWindowSize(256, 256)
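The second argument of SetNoDelay is the flush interval in milliseconds, and the profile above shows update() rescheduling itself through SystemTimedSched.Put after each flush. A back-of-the-envelope estimate of how often that happens is sketched below. It assumes every session reschedules exactly once per configured interval (the value returned by flush may be shorter in practice), and the session count of 1000 is a made-up figure, not one taken from this report.

```go
package main

import "fmt"

// timerPutsPerSecond (hypothetical helper) estimates how many times per second
// update() reschedules itself across all sessions, assuming one reschedule
// per session per interval.
func timerPutsPerSecond(sessions, intervalMs int) int {
	if sessions <= 0 || intervalMs <= 0 {
		return 0
	}
	return sessions * 1000 / intervalMs
}

func main() {
	const sessions = 1000 // assumed session count, for illustration only
	fmt.Println("interval 10 ms:", timerPutsPerSecond(sessions, 10), "reschedules/s")
	fmt.Println("interval 20 ms:", timerPutsPerSecond(sessions, 20), "reschedules/s")
}
```

Under these assumptions, doubling the interval from 10 ms to 20 ms halves the number of timer entries created per second, which is consistent with the lower load seen after the change.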

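I think I understand now: I probably need to add a check where the UDP data is received, to determine whether a packet is actually KCP data. Otherwise every packet goes in, and at high volume memory cannot be released fast enough.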
Closing the issue. Thanks a lot!