matplotlib/matplotlib

Closed figures linger in memory

cknd opened this issue · 28 comments

cknd commented

tl;dr: When I repeatedly create a large figure, save it, and close it, memory usage keeps growing.

Over at this discussion about when MPL should trigger garbage collection, @efiring had some lingering doubts about the chosen solution:

It would certainly be good to have a clearer understanding of when, if ever in practice, it would lead to troublesome increases in memory consumption

I ran into such a case today, when my batch job filled up 60 GB of RAM overnight.

I repeatedly create a large figure, save it, then close it. If I don't manually call gc.collect() after closing each figure, memory consumption saturates at around 10x what an individual figure needs. In my case, with several fairly complex figures, this was enough to fill a big machine.

Since this is not obvious from the docs, I think there should be an official way to go back to more aggressive GC for cases like this where the trade-off discussed at #3045 fails. Maybe close(force_gc=True)?

Code for reproduction

[Figure: memory usage over time, comparing the default behaviour with calling gc.collect() after each close]

from memory_profiler import profile # https://pypi.python.org/pypi/memory_profiler
from memory_profiler import memory_usage

import matplotlib.pyplot as plt
import numpy as np
import gc

N = 80

@profile
def do_plots():
    fig = plt.figure()
    plt.plot(np.random.rand(50000))
    plt.savefig('/tmp/bla.png')
    plt.close(fig)

def default():
    for k in range(N):
        print(k)
        do_plots()

def manual_gc():
    for k in range(N):
        print(k)
        do_plots()
        gc.collect()


mem_manual_gc = memory_usage((manual_gc, [], {}))
mem_default = memory_usage((default, [], {}))


plt.plot(mem_manual_gc, label='gc.collect() after close')
plt.plot(mem_default, label='default behaviour')
plt.ylabel('MB')
plt.xlabel('time (in s * 0.1)')  # `memory_usage` logs every 100ms
plt.legend()
plt.title('memory usage')
plt.show()

Matplotlib version

  • Operating System: Ubuntu 16.10
  • Matplotlib Version: 2.0.0
  • Python Version: 3.5.2

The underlying issue is that there are circular references between the figure and the axes.

We could make some of those weak refs (and patch over it with properties), but it would be a pretty big effort to find all of the references back and forth. And even if we did, we would put a bunch of extra work on the user to make sure things did not get erroneously garbage collected.
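
For illustration (my sketch, not code from the discussion), the cycle is easy to observe:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
assert ax.figure is fig  # the axes hold a reference to their figure...
assert ax in fig.axes    # ...and the figure holds references back: a cycle
plt.close(fig)

Because of this cycle, a closed figure can only be reclaimed by the cyclic garbage collector, not immediately by reference counting when the last user reference disappears.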

I am not sure that adding a kwarg to close is any clearer to users than having to do the gc themselves.

cknd commented

Right, that does sound like a tangled ball of wool.

I am not sure that adding a kwarg to close is any clearer to users than having to do the gc themselves.

I appreciate the messiness of a kwarg (or rcParam) like that. On the other hand, it would have saved me an afternoon of poking around with memory_profiler if there were something that signaled "here be dragons, and here's the one official way to hack around them". Should I write a paragraph for the docs?

A paragraph for the docs would be appreciated. I am not sure where the best place to put it is, though. Maybe in the FAQ section?

Maybe this wouldn't help, but: Suppose that when closing a figure we walked through the hierarchy of children, and for each child artist, set its figure reference to None. Would that break the circles?
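
As a toy model of that idea (plain Python classes, not matplotlib's real ones), severing the back-references on close lets ordinary reference counting free everything without waiting for the cyclic collector:

class Figure:
    def __init__(self):
        self.axes = []  # parent -> child references

class Axes:
    def __init__(self, fig):
        self.figure = fig  # child -> parent back-reference: completes a cycle
        fig.axes.append(self)

def close(fig):
    for ax in fig.axes:
        ax.figure = None  # sever the back-reference
    fig.axes.clear()  # and drop the forward references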

This proposal (which I agree with) is related to #6793 and #6982.

What is the best practice if you are looping and generating a large number of images with matplotlib? Is there a way to just force matplotlib to unload any memory it is holding? So far, the only way I can do this is to put each call to matplotlib in a separate process, but I would like to think I could generate multiple plots in a single process (or notebook). I've tried various combinations of .clf(), .close(figure), and Python del, but it still leaks and eventually crashes.

cknd commented

Is there a way to just force matplotlib to unload any memory that it has?

From what I found back then, it helps to manually trigger garbage collection after closing each figure, with gc.collect() (see the code snippet above). That should find & remove the circular object graph associated with the closed figure.

I am forcing a garbage collect, but no luck. I still run out of memory and crash. So far the only workarounds I have are:

  • Each figure in its own process (this works best, but is the most complex; see the first sketch below)
  • Reuse figures as much as possible: pass them around and update subplots and other items inside already-allocated objects (this also works well; there are still leaks, but I can usually keep the ship afloat long enough to finish; see the second sketch below)
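
For concreteness, a minimal sketch of the first workaround (my code, with hypothetical output paths, not the commenter's); each figure is rendered in a short-lived child process, which returns all of its memory to the OS when it exits:

import multiprocessing as mp

def make_plot(path):
    import matplotlib
    matplotlib.use('agg')  # non-GUI backend inside the worker
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.plot(range(10))
    fig.savefig(path)
    plt.close(fig)

if __name__ == '__main__':
    for k in range(100):
        p = mp.Process(target=make_plot, args=(f'/tmp/plot_{k}.png',))
        p.start()
        p.join()  # run one worker at a time

And a sketch of the second, reusing a single figure rather than allocating a new one per plot:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for k in range(100):
    ax.cla()  # clear the axes but keep the figure allocated
    ax.plot(range(k + 1))
    fig.savefig(f'/tmp/plot_{k}.png')
plt.close(fig)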

So closing a figure doesn't remove references to it in the pyplot state manager?

Turn interactive mode off before running any of your loops; it did the trick for me:
plt.ioff()
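
For concreteness, a sketch of where that call would go in a save loop (my arrangement, not the commenter's exact code):

import matplotlib.pyplot as plt

plt.ioff()  # disable interactive mode so figures are not kept alive for display

for k in range(100):
    fig, ax = plt.subplots()
    ax.plot(range(10))
    fig.savefig(f'/tmp/plot_{k}.png')
    plt.close(fig)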

What happens if you do not include a legend in your plots? See issue #19345.

What worked for me was to use fig.savefig() instead of plt.savefig(), and then close the figure with plt.close(). No need for gc.collect() or anything else.
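
Spelled out, the suggested pattern looks like this (a sketch with a placeholder filename):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3])
fig.savefig('temp.png')  # save through the Figure object, not the pyplot state machine
plt.close(fig)           # then close that specific figure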

@analkumar2 OP says they closed it and it still didn't work.

@adeeb10abbas The issue is still open. With Python 3.6.9, matplotlib 3.3.4, Ubuntu 20.04, if you run the following code:

import matplotlib.pyplot as plt

for i in range(100000):
    fig, axs = plt.subplots(1, 1, figsize=(19.20, 10.80))
    axs.plot([1, 2, 3])
    plt.savefig('temp.png')
    # fig.savefig('temp.png')
    plt.close('all')

you will run out of RAM very soon. You have to use fig.savefig() instead of plt.savefig() to avoid the memory leak.
And what's 'OP'?

@analkumar2 I just tested with Python 3.8, matplotlib 3.3.4, Ubuntu 20.04 and was not able to reproduce your issue. There was no memory leak. Can you update your Python version and see if it works?
OP means original poster/author of the post.

I'm going to close this, because I don't think every memory leak issue should be put in the same issue. #8519 (comment) is quite different from the original post, and I think it is confusing to conflate them.

Feel free to re-open if the original post is still leaking. However, I couldn't make that code run on my machine, so I'm not sure how relevant it is 4 years later.

If you are doing batch work, setting

matplotlib.use('agg')

may also help. With some versions of Qt, if you never spin the event loop, we end up with many windows which are "closed" but still exist, waiting for the Qt main loop to spin so they can finish deleting themselves!
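
Putting the thread's suggestions together, a batch-oriented sketch (my combination, not an officially endorsed recipe): select the Agg backend before pyplot is imported so no GUI windows are created at all, close each figure, and collect explicitly:

import matplotlib
matplotlib.use('agg')  # must run before matplotlib.pyplot is imported
import matplotlib.pyplot as plt
import gc

for k in range(1000):
    fig, ax = plt.subplots()
    ax.plot(range(10))
    fig.savefig(f'/tmp/plot_{k}.png')
    plt.close(fig)
    gc.collect()  # reclaim the closed figure's reference cycles immediately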

(Quoting the earlier comment and its repro code: "You have to use fig.savefig() instead of plt.savefig() to avoid the memory leak.")

Sorry. Even I cannot reproduce this now. I have not updated my system or any of the packages. I should have saved the memory profiler output when I was having the issue.

I am facing a similar issue on Ubuntu 18.04, Python 3.8.12, matplotlib 3.4.2. The program consumes all RAM and swap and then crashes. Any ideas? Thanks in advance. Indicative code is below:

for j in df.id.unique():
    ....
    ....
    for plot_times in ['hour']:  # ['dow', 'hour', 'month', 'day']
        fig, ax = plt.subplots(3, 1, figsize=(30, 8))
        cmap = get_cmap(len(temp[plot_times].unique()))
        col_list = [cmap(i) for i in list(temp[plot_times])]
        for i, attr in enumerate(['a', 'b', 'c']):
            scatter_x = np.array(epoch_seconds)
            scatter_y = np.array(temp[attr])
            group = np.array(temp[plot_times])
            for g in np.sort(np.unique(group)):
                ix = np.where(group == g)
                if plot_times == 'hour':
                    ax[i].scatter(scatter_x[ix], scatter_y[ix], s=2, color=cmap(g), label=labels[plot_times][g])
                else:
                    ax[i].scatter(scatter_x[ix], scatter_y[ix], s=2, color=cmap(g), label=labels[plot_times][g-1])
        ax[1].legend(mode="expand", ncol=len(labels[plot_times]), prop={'size': 14})
        ax[0].set_ylabel("F")
        ax[1].set_ylabel("O")
        ax[2].set_ylabel("S")
        plt.xlabel("T")
        plt.tight_layout()
        # plt.show()
        fig.savefig("../images/" + str(j) + ".png", dpi=300)
        fig.clf()
        plt.close(fig)
        plt.close('all')

Please open a new issue with a self-contained minimal reproducible example.

@vishalmhjn did you try it with plt.ioff()?


I can reproduce the issue with Python 3.7.9 and matplotlib 3.5.1 on Windows 10 version 20H2. Watching the task manager, you can clearly see memory building up, and after roughly 1-1.5 GB of build-up the following error appears: "Fail to create pixmap with Tk_GetPixmap in TkImgPhotoInstanceSetSize".

It's interesting: I never faced this issue on my old laptop, but was surprised when I encountered it on my new one. I went back and checked the matplotlib version on the old laptop, which was 3.3.4. So I tried to reproduce the above error with Windows 10 20H2, Python 3.7.9, and matplotlib 3.3.4 on my new laptop, and surprisingly I wasn't able to. It looks like there is some issue with the latest release of matplotlib; I suggest using matplotlib 3.3.4 if anyone is facing this problem. Hoping this helps with other issues as well.

@Prajval-1608 can you please open a new issue with all the relevant details? Perhaps most pertinent would be to also include what backend you are using....

I'm actually going to lock this to stop the me-too comments on a five-year-old issue. If you think you have a memory leak with a recent matplotlib version that is reproducible, please fill out a new issue with all the relevant information requested. Thanks!