Memory leak related to accumulation of ParseExceptions in `all_exceptions`
Closed this issue · 3 comments
I have been tracking down memory issues in my application and stumbled upon a possible issue with the `DEBUG` option for the parser.
Brief background on my application: it is an evolutionary algorithm that is essentially one big loop. There should be almost no accumulation of data between iterations of the loop, and GC should be able to clean up everything that the previous iteration was manipulating. Below are some counts for the different objects I see in each iteration of the loop. Clearly some types of objects are sticking around between iterations.
(Horizontal axis is loop iteration)
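For anyone wanting to reproduce this kind of per-iteration count, here is a minimal sketch using only the standard library `gc` module. It is not the measurement code used for the plots; `FakeParseException`, `retained`, `count_instances`, and `run_iteration` are hypothetical names that only simulate objects surviving between iterations.

```python
import gc


class FakeParseException(Exception):
    """Hypothetical stand-in for an object type that leaks between iterations."""


retained = []  # simulates a long-lived container that is never cleared


def count_instances(cls):
    """Count live, GC-tracked objects whose exact type is `cls`."""
    gc.collect()  # drop unreachable objects first so only real survivors are counted
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)


def run_iteration():
    """One loop iteration that accidentally keeps 100 exceptions alive."""
    for i in range(100):
        retained.append(FakeParseException(f"failure {i}"))


history = []
for _ in range(3):
    run_iteration()
    history.append(count_instances(FakeParseException))

print(history)  # the count grows by 100 per iteration instead of staying flat
```

A flat line in such a count means the iteration cleans up after itself; a steadily rising one points to something long-lived holding references.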
Definitely a lot of pyparsing `ParseException`s, dicts, and tuples are being kept around. I used the `objgraph` package to check which modules were referencing these objects. I checked 10 random instances per loop iteration, and they all pointed to the `moz_sql_parser` module via lists within the `all_exceptions` dictionary.
Some example reference graphs:
ParseException
Dicts
Tuple
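The same kind of back-reference walk can be roughly approximated without `objgraph`, using `gc.get_referrers` from the standard library. This is only a sketch: `FakeParseException`, the local `all_exceptions` dict, and `containers_holding` are hypothetical stand-ins loosely modelled on the description above, not the library's actual code.

```python
import gc


class FakeParseException(Exception):
    """Hypothetical stand-in for pyparsing's ParseException."""


# Hypothetical module-level registry: parse location -> list of exceptions.
all_exceptions = {}
all_exceptions.setdefault(0, []).append(FakeParseException("expected SELECT"))


def containers_holding(obj, kind):
    """Return the GC-tracked containers of type `kind` that refer to `obj`."""
    return [r for r in gc.get_referrers(obj) if isinstance(r, kind)]


leaked = all_exceptions[0][0]

# Step 1: which lists hold the leaked exception?
holding_lists = containers_holding(leaked, list)

# Step 2: which dicts hold those lists?
holding_dicts = [d for lst in holding_lists for d in containers_holding(lst, dict)]

# True when the chain exception -> list -> dict ends at the registry.
found_registry = any(d is all_exceptions for d in holding_dicts)
print(found_registry)
```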
I then started reading the source code surrounding `all_exceptions`. I have never used pyparsing directly so I might be wrong, but it looks as though the logic for `DEBUG` is inverted. As far as I can tell, if `DEBUG=False` (the hardcoded default) then the `record_exception` function will trigger. If `DEBUG=True` then `record_exception` will not be used.
https://github.com/mozilla/moz-sql-parser/blob/dev/moz_sql_parser/sql_parser.py#L39-L42
DEBUG = False
# ...
if DEBUG:
    debug = (None, None, None)
else:
    debug = (nothing, nothing, record_exception)
Shouldn't this be inverted? I only expect to record exceptions if debug is true. Maybe I am missing some details of pyparsing. This logic is the same for `master` and `dev`.
I may have opened this issue prematurely... I have been doing some experimentation on a local clone and I have a better understanding of what is going on.
I tried enabling `DEBUG` on a local clone and noticed that pyparsing will print all exceptions to stdout. I assume that is why the pyparsing default is overwritten when `DEBUG` is false.
I tried the following with `DEBUG = False`:
if DEBUG:
    debug = (None, None, None)
else:
    debug = (nothing, nothing, nothing)
I got a test failure, and I know that it would break this; however, it fixed my memory issue:
Dicts and tuples stopped growing. ParseException didn't even register as a common type. The overall memory footprint of my application fell from roughly 1GB per loop iteration to holding steady at 60MB.
It's possible that my use case is unique. I am working on a programming-by-example AI project that synthesizes SQL queries. It requires parsing thousands (maybe millions) of generated SQL queries, and thus the cost of accumulating the `ParseException`s is high.
I can understand if this is not something you want to officially address in the project; however, it would be nice to somehow make `record_exception` optional.
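To illustrate what "optional" could look like, here is a rough sketch. It is not the fix that was eventually committed, and it does not import pyparsing: `make_debug_actions`, its `record` flag, and `simulate_failed_parses` are hypothetical, and the `record_exception` signature is assumed to mirror the four arguments pyparsing passes to a debug exception action.

```python
# Hypothetical module-level registry: parse location -> list of exceptions.
all_exceptions = {}


def nothing(*args):
    """No-op debug action."""


def record_exception(instring, loc, expr, exc):
    """Keep every exception, grouped by location (this is what accumulates)."""
    all_exceptions.setdefault(loc, []).append(exc)


def make_debug_actions(debug=False, record=True):
    """Hypothetical helper that builds the (start, success, exception) triple.

    debug=True  -> (None, None, None), i.e. fall back to the library defaults.
    record=False -> silence the defaults without retaining any exceptions.
    """
    if debug:
        return (None, None, None)
    return (nothing, nothing, record_exception if record else nothing)


def simulate_failed_parses(queries, record):
    """Stand-in for a parser calling the exception action once per failure."""
    _, _, on_exception = make_debug_actions(record=record)
    for query in queries:
        on_exception(query, 0, None, ValueError(query))


simulate_failed_parses(["SELECT", "SELEC"], record=False)
kept_when_disabled = sum(len(v) for v in all_exceptions.values())

simulate_failed_parses(["SELECT", "SELEC"], record=True)
kept_when_enabled = sum(len(v) for v in all_exceptions.values())

print(kept_when_disabled, kept_when_enabled)
```

With recording disabled nothing is retained, so a long-running loop that parses many generated queries would not grow the registry; callers that rely on the recorded exceptions would keep the default.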
Excellent find! Thank you!
Fixed here: 9e97eae?w=1
thank you very much!