go-python/cpy3

Memory leaks while running concurrent goroutines with a Python file

jshiwam opened this issue · 13 comments

Describe what happened:
Hi Team,

I ran a simple Go program with a Python file that only contains a simple hello-world program. When I run the program without goroutines there are no memory leaks, but when I use goroutines I see random spikes in memory. I will add the logs which I use to track the memory below.

Describe what you expected:
Logs without goroutines:
First Run
Second Run
Third Run
The above logs are produced by the same program, and the memory is consistent across all of them.

Logs with goroutines:
First Run
Second Run
Third Run

The above logs are produced by the same program; for some reason the memory leak is random.

Steps to reproduce the issue:
Go ENV:

GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/u1/shijaisw/.cache/go-build"
GOENV="/u1/shijaisw/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/u1/shijaisw/go/pkg/mod"
GOPATH="/u1/shijaisw/go"
GOROOT="/u1/tags/dev/devtools/golang/1.16.15"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/u1/tags/dev/devtools/golang/1.16.15/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.16.15"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/u1/shijaisw/DQE/memolk/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1853794904=/tmp/go-build -gno-record-gcc-switches"

Python and Go code used to reproduce the memory leak

Put all the files in the same directory for simplicity

main.go
foo.py
bar.py

Commands used

  1. go mod tidy

     If this doesn't get you the packages, then run
     go get <github_repo>@master
     for example: go get github.com/go-python/cpy3@master

  2. go build
  3. Run the binary and see the logs

Please let me know why such a random memory leak is happening in the first place, and how it can be resolved.
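For context, here is a minimal sketch of what a main.go along these lines might look like (a hypothetical reconstruction, assuming foo.py sits in the working directory and defines a hello() function; the actual files are the ones linked above):

package main

import (
	"runtime"
	"sync"

	python3 "github.com/go-python/cpy3"
)

func main() {
	python3.Py_Initialize()
	defer python3.Py_Finalize()

	// hypothetical: make the current directory importable so foo.py is found
	python3.PyRun_SimpleString("import sys; sys.path.insert(0, '.')")

	// release the GIL from the main thread so goroutines can acquire it
	state := python3.PyEval_SaveThread()

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			runtime.LockOSThread()
			gstate := python3.PyGILState_Ensure()
			defer python3.PyGILState_Release(gstate)

			// import foo.py and call its (assumed) hello() function
			module := python3.PyImport_ImportModule("foo")
			if module == nil {
				python3.PyErr_Print()
				return
			}
			defer module.DecRef()

			fn := module.GetAttrString("hello")
			if fn == nil {
				python3.PyErr_Print()
				return
			}
			defer fn.DecRef()

			args := python3.PyTuple_New(0)
			defer args.DecRef()
			if ret := fn.CallObject(args); ret != nil {
				ret.DecRef()
			} else {
				python3.PyErr_Print()
			}
		}()
	}
	wg.Wait()

	// reacquire the GIL before Py_Finalize runs
	python3.PyEval_RestoreThread(state)
}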

Hi Team,

I tried using valgrind to see what the memory leak looks like at the C level. I ran it for a serial Go program and a concurrent one. Here are the logs:

Log of Single threaded go program with datadog
Log of Multi threaded go program with datadog

If you compare these logs you will see that at the end the heap summaries for both of them are really close, and the memory leak at the C level is only around 0.5 MB approximately. But, as I mentioned before, the memory leak is random when I run the program with goroutines.

sorry for the late response. I haven’t read all the information you’ve provided but just to make sure there’s no misunderstanding: This project is just a relatively thin wrapper around Python's C API. So basically all the limitations of the Python C API apply. One limitation is that Python has a global interpreter lock (GIL). You can't do multiple things in a Python interpreter at the same time (unless you use Python's concurrency features in the Python code that you're executing).

The easiest way to make embedded Python work with goroutines is to make sure that only one goroutine is actually "doing" Python at a time and all the others have to wait. Here's an excellent blog post and related conference talk that cover this and a few other options.

Does this answer your question?

Apologies for my previous answer; I've now properly read your issue and can see you're already aware of all that.
I still haven't had a look at your code (writing this from my phone), but here's an even simpler example I found in my private gists.

It's an endless loop spawning two Python "hello world" goroutines per iteration, and it appears to have no obvious memory leak. Note that I'm calling runtime.GC() and debug.FreeOSMemory() at the end of every iteration to dial down Go's opinionated memory management (which can sometimes look like a memory leak). I wouldn't do that in production code though, as it has a cost.

Below is a plot of the memory usage over a few minutes (which looks pretty constant). I'm not sure which OS and Go version this was on. (I believe it was Windows?)

Maybe this helps you? (I will respond again once I've had the chance to look at your code.)

package main

import (
	"runtime"
	"runtime/debug"
	"sync"

	python3 "github.com/go-python/cpy3"
)

func main() {

	python3.Py_Initialize()
	var wg sync.WaitGroup

	for {
		wg.Add(2)
		// the main thread releases the GIL so the goroutines can acquire it
		state := python3.PyEval_SaveThread()

		go func() {
			runtime.LockOSThread()
			_gstate := python3.PyGILState_Ensure()
			python3.PyRun_SimpleString("print('hello from python in goroutine 0')")
			python3.PyGILState_Release(_gstate)
			wg.Done()
		}()

		go func() {

			runtime.LockOSThread()
			_gstate := python3.PyGILState_Ensure()
			python3.PyRun_SimpleString("print('hello from python in goroutine 1')")
			python3.PyGILState_Release(_gstate)
			wg.Done()
		}()

		wg.Wait()
		// reacquire the GIL on the main thread before the next iteration
		python3.PyEval_RestoreThread(state)
		runtime.GC()
		debug.FreeOSMemory()
	}

	python3.Py_Finalize()

}

[plot: memory usage over a few minutes, roughly constant]

Another thing I notice (I haven't run your code yet, so I'm not sure about the impact):
You're using a chan bool as a stop channel. I don't think that's recommended. My (limited) understanding is that a bool still occupies memory (a byte), and when it's used in a channel the compiler can't optimize it away. Afaik it's recommended to either use the context package or a chan struct{}: an empty anonymous struct takes zero memory (see the sketch below).
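For illustration, a minimal sketch of such a struct{}-based stop channel (the worker and timings are hypothetical, just to show the pattern):

package main

import (
	"fmt"
	"time"
)

func main() {
	// struct{} has zero size, so the channel carries pure signal, no data
	stop := make(chan struct{})

	go func() {
		for {
			select {
			case <-stop:
				fmt.Println("worker: stop signal received, exiting")
				return
			default:
				time.Sleep(10 * time.Millisecond) // placeholder for real work
			}
		}
	}()

	time.Sleep(100 * time.Millisecond)
	close(stop) // closing broadcasts the signal to every receiver
	time.Sleep(10 * time.Millisecond) // crude wait so the worker can print (sketch only)
}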

Hi @christian-korneck, I just tried running the code you sent. Here is the memory log as I ran it, and it shows approximately 300 MB of memory piling up after the start.

log

Here are the OS details at my end:
NAME="Red Hat Enterprise Linux Server"
VERSION="7.9 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.9"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.9 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.9:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.9"

Also, we use this library for memory profiling:

procfs

It would be really helpful if you could run the same code that I have and see the results at your end. I changed the bool channel to an empty struct channel, since I was only using it for signalling; even that didn't show any improvement in memory usage.

I ran the code for some time, and after a while its memory becomes constant, so we cannot consider it a memory leak. I am not sure whether importing a Python file and running it is causing the issue, then?

Yeah, we have to be careful not to mistake "expected" Go runtime behaviour for memory leaks. Even with runtime.GC() and debug.FreeOSMemory() there are automagical things going on; Go isn't C. (The same goes for "expected" Python runtime behaviour, but the Go side seems to cause more surprises.) There are also other "variables" involved in your example and mine (e.g. an fd handle is needed to print to the shell, which might or might not have an impact).

Also, just from looking at your example (I might well be mistaken): could it be that the problem is that you're starting new goroutines at a much faster rate than finishing a goroutine takes? As the program doesn't wait for the goroutines to finish, you might be constantly adding more goroutines up until the end, and each of them uses memory while doing things. What happens if you wait for all goroutines to finish at the end of the program (after the limit for starting new goroutines is reached), e.g. with sync.WaitGroup? Will the memory usage drop once all goroutines are finished?

could it be that the problem is that you're spawning new goroutines at a much faster rate than finishing a goroutine takes

OK, I think that assumption at least isn't entirely incorrect. When I strip out all Python-related parts of your code, use Go instead to print the "hello world" from the goroutines, and increase the limit/number of goroutines to spawn from 100 to 1M (for i := 1; i < 1000000; i++ {), the program runs for a while, but at some point I get panic: too many concurrent operations on a single file or socket (max 1048575), related to file descriptors (= too many goroutines try to print to the terminal at the same time). We're starting as many goroutines as fast as possible, but they don't get a chance to finish equally fast, so we have a constantly growing number of goroutines. I'm also at 10K OS threads before it panics.

I can't (immediately) run your benchmarks as I'm on the wrong OS. What happens if you change the code to wait for the goroutines to finish? Will memory usage go down as expected?
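For illustration, a minimal sketch of what I mean (the limit value and the plain-Go "work" are hypothetical): wait for every goroutine with sync.WaitGroup and cap how many are in flight with a buffered channel:

package main

import (
	"fmt"
	"sync"
)

func main() {
	const limit = 8 // hypothetical cap on in-flight goroutines
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup

	for i := 0; i < 1000000; i++ { // same count as in the test above
		sem <- struct{}{} // blocks once `limit` goroutines are in flight
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // free a slot when done
			fmt.Println("hello from goroutine", i)
		}(i)
	}

	wg.Wait() // wait for all goroutines before measuring memory / exiting
}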

As the program doesn't wait for the goroutines to finish, you might be constantly adding more goroutines up until the end, and each of them uses memory while doing things. What happens if you wait for all goroutines to finish at the end of the program (after the limit for spawning new goroutines is reached), e.g. with sync.WaitGroup? Will the memory usage drop once all goroutines are finished?

Hi @christian-korneck, I think you are correct on this one: the "memory leak" increases when I spawn more threads, so it might just be that the threads are using memory. When I use a WaitGroup instead of channels, I see that memory increases initially, but after some point in time it reaches a "constant number", and this "constant number" is directly proportional to the number of threads that I spawn.

Here is the code that I wrote to test it.
code

If no==50, the memory reaches a constant state at approximately 3 GB.
If no==10, the memory reaches a constant state at approximately 2 GB.

I have been trying these things in the mock project I created to test the memory leak issue. Let me make changes in the production code and get back to you on this.

Your insights are highly appreciated, thanks a lot.

If no==50, the memory reaches a constant state at approximately 3 GB.
If no==10, the memory reaches a constant state at approximately 2 GB.

Are you sure that's gigabytes? That sounds WAY too much for your hello-world stuff. It should be in the area of megabytes: at least ~5 MB (that's just an initialized Python), and with all the goroutines probably a low multiple of that. But it definitely shouldn't be thousands of megabytes.

Are you sure that's gigabytes?

Yes, it is. You can check the logs here:
log

and this is the code that gives us memory usage:

// VmSizeOld and RmSizeOld are package-level variables holding the previous
// sample, so deltas can be computed between calls.
var VmSizeOld uint
var RmSizeOld int

func MemUsage(id int) (uint, int, int) {
	p, err := procfs.Self()
	if err != nil {
		log.Fatalf("could not get process: %s", err)
	}

	stat, err := p.Stat()
	if err != nil {
		log.Fatalf("could not get process stat: %s", err)
	}

	// convert bytes to GB/MB for readability
	vmmGB := stat.VirtualMemory() / (1024 * 1024 * 1024)
	vmmMB := stat.VirtualMemory() / (1024 * 1024)
	rssMB := stat.ResidentMemory() / (1024 * 1024)

	var vmDelta uint
	var rmDelta int
	if VmSizeOld != 0 && RmSizeOld != 0 {
		vmDelta = vmmMB - VmSizeOld
		rmDelta = rssMB - RmSizeOld
	}
	VmSizeOld = vmmMB
	RmSizeOld = rssMB

	fmt.Printf("MemUsage: id:%d vmsize:%d GB %d MB\trss:%d MB | MemDelta: vmdelta:%d MB\trmdelta:%d MB\n", id, vmmGB, vmmMB, rssMB, vmDelta, rmDelta)
	return vmmMB, rssMB, rmDelta
}

Ah, I see: you're looking at vmsize, but rss in your log is actually just 6 MB and 9 MB for the two examples. I'd argue rss is more appropriate here (it's roughly what top and the Windows Task Manager display, afaik, and also what my plots are showing).

(I believe it's not abnormal for Go programs that vmsize is much higher than rss, though I don't understand the details.)

So all good.