libgit2/pygit2

Remote objects are never destroyed, which means Repository is never destroyed

mduggan opened this issue · 5 comments

This is a similar issue to #321 #326 #328, but the cause is different

When I do this:

import pygit2, sys
repo = pygit2.Repository('libgit2')
print "refcount before: %d" % sys.getrefcount(repo)
for i in range(1,100):
    len(repo.remotes)

print "refcount after: %d" % sys.getrefcount(repo)

I get this:

~/src$ python testgit.py 
refcount before: 2
refcount after: 101

The root cause is the self._self_handle in the Remote object. Because it's an opaque reference through ffi, it seems that the python garbage collector can't detect the circular reference of the Remote to itself. That means it never gets cleaned up, and it leaks a reference to the repo in self._repo, so that never gets cleaned up either.

A side-note.. the memory leak needs fixing either way, but does the Remote actually need the _repo member? It's not referenced anywhere..

A git_remote needs its git_repository to outlive it. The only way to express this in python is via this reference to the Repository which spawned it.

The fix looks like it would be to keep the callback payload as None until we perform a fetch and clear it out afterwards.

Ah yes, having the reference makes sense.

Since a few functions need the callbacks and there are various exceptions etc, maybe have a decorator like @needs_callbacks that can do the setup/cleanup of the payload? Unless that sounds like a terrible idea I'll try and do a PR with something like that.

Those callbacks are only for fetch, so we shouldn't need anything so complicated. The only thing the reference really needs to outlive is the C.git_remote_fetch() call, which is where it's going to be used to call the callbacks.

There's a way to retrieve the set callbacks, but the simplest thing to do is probably to move the callback setting to the beginning of fetch() and then unset the callbacks structure and unreference the handle.