mozilla/moz-sql-parser

Memory leak related to accumulation of ParseExceptions in `all_exceptions`

Closed this issue · 3 comments

erp12 commented

I have been tracking down memory issues in my application and stumbled upon a possible issue with the DEBUG option for the parser.

Brief background on my application: it is an evolutionary algorithm that is essentially one big loop. There should be almost no accumulation of data between iterations of the loop, and GC should be able to clean up everything that the previous iteration was manipulating. Below are counts for the different object types I see at each iteration of the loop. Clearly some types of objects are sticking around between iterations.

object_growth

(Horizontal axis is loop iteration)

A lot of pyparsing ParseExceptions, dicts, and tuples are definitely being kept around. I used the objgraph package to check which modules were referencing these objects. I checked 10 random instances per loop iteration, and they all pointed to the moz_sql_parser module via lists within the all_exceptions dictionary.
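For readers who want to reproduce this kind of measurement without objgraph, here is a minimal standard-library sketch of the same idea: snapshot live object counts by type, compare snapshots, and ask the garbage collector which container holds a suspect object. The `Leaked` class and the `registry` dict are hypothetical stand-ins, not names from moz_sql_parser or from my application.

```python
import gc
from collections import Counter


def count_objects_by_type():
    """Return a Counter mapping type name -> number of GC-tracked live objects."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())


class Leaked(Exception):
    """Hypothetical stand-in for an exception type that accumulates."""


# Hypothetical module-level dict that keeps objects alive between iterations.
registry = {}

before = count_objects_by_type()
for i in range(100):
    registry.setdefault(i % 10, []).append(Leaked(str(i)))
after = count_objects_by_type()

# Growth of the suspect type between the two snapshots.
leaked_growth = after["Leaked"] - before["Leaked"]

# Ask the collector which objects refer to one suspect instance.
suspect = registry[0][0]
held_by_registry_list = any(ref is registry[0] for ref in gc.get_referrers(suspect))
```

Taking one snapshot per loop iteration and plotting the per-type counts would give a chart like the one above; `gc.get_referrers` plays the role of the reference graphs, just without the picture.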

Some example reference graphs:

ParseException

ParseException_reference_graph

Dicts

dict_reference_graph

Tuple

tuple_reference_graph

I then started reading the source code surrounding all_exceptions. I have never used PyParsing directly so I might be wrong, but it looks as though the logic for DEBUG is inverted. As far as I can tell, if DEBUG=False (hardcoded default) then the record_exception function will trigger. If DEBUG=True then record_exception will not be used.

https://github.com/mozilla/moz-sql-parser/blob/dev/moz_sql_parser/sql_parser.py#L39-L42

DEBUG = False

# ...

if DEBUG:
    debug = (None, None, None)
else:
    debug = (nothing, nothing, record_exception)

Shouldn't this be inverted? I only expect to record exceptions if debug is true. Maybe I am missing some details of pyparsing. This logic is the same for master and dev.
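To make the concern concrete, here is a small stand-alone simulation of that selection logic. It does not import pyparsing. It assumes the three tuple slots are the start, success, and exception debug callbacks, and that a `None` slot falls back to a default action that prints. The body of `record_exception` and the `failed_attempt` helper are my guesses for illustration, not code copied from the library.

```python
DEBUG = False

# Assumed shape: parse location -> list of exceptions raised at that location.
all_exceptions = {}


def nothing(*args):
    """Debug action that deliberately does nothing."""


def record_exception(instring, loc, expr, exc):
    """Assumed behaviour: keep every exception, grouped by location."""
    all_exceptions.setdefault(loc, []).append(exc)


def default_print(*args):
    """Stand-in for a default debug action that prints."""
    print("debug:", args)


if DEBUG:
    debug = (None, None, None)
else:
    debug = (nothing, nothing, record_exception)


def failed_attempt(instring, loc, expr):
    """Hypothetical stand-in for one failed match in a backtracking parser."""
    on_exception = debug[2] or default_print
    on_exception(instring, loc, expr, ValueError(f"expected {expr} at {loc}"))


queries = ["SELECT a FROM t", "SELECT b FROM u"]
for query in queries:
    for loc in range(3):
        failed_attempt(query, loc, "keyword")

# Nothing is ever removed, so the total grows with every query parsed.
recorded = sum(len(excs) for excs in all_exceptions.values())
```

With `DEBUG = False`, every failed attempt from every query stays referenced by the module-level dict, which is the growth pattern I am seeing.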

erp12 commented

I may have opened this issue prematurely... I have been doing some experimentation on a local clone and I have a better understanding of what is going on.

I tried enabling DEBUG on a local clone and noticed that pyparsing will print all exceptions to stdout. I assume that is why the pyparsing default is overwritten when DEBUG is false.

I tried the following with DEBUG = False:

if DEBUG:
    debug = (None, None, None)
else:
    debug = (nothing, nothing, nothing)

I got a test failure, and I know that it would break this. However, it fixed my memory issue:

no_record_exception

Dicts and tuples stopped growing. ParseException didn't even register as a common type. The overall memory footprint of my application fell from roughly 1GB per loop iteration to holding steady at 60MB.

It's possible that my use case is unique. I am working on a programming-by-example AI project that synthesizes SQL queries. It requires parsing thousands (maybe millions) of generated SQL queries, and thus the cost of accumulating the ParseExceptions is high.

I can understand if this is not something you want to officially address in the project. However, it would be nice to somehow make record_exception optional.
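One possible shape for this, as a sketch only: keep recording exceptions while a single parse runs, but release them once that parse returns, so nothing survives between calls. The names `parse_and_release` and `fake_parse` are hypothetical, and I am not suggesting this is how the project should or does handle it.

```python
# Assumed shape: parse location -> list of exceptions raised at that location.
all_exceptions = {}


def record_exception(instring, loc, expr, exc):
    """Assumed behaviour: keep every exception, grouped by location."""
    all_exceptions.setdefault(loc, []).append(exc)


def fake_parse(sql):
    """Hypothetical parser stand-in: records two failed attempts, then succeeds."""
    record_exception(sql, 0, "keyword", ValueError("no match at 0"))
    record_exception(sql, 7, "identifier", ValueError("no match at 7"))
    return {"parsed": sql}


def parse_and_release(parse_fn, sql):
    """Run one parse, then drop the exceptions recorded while it ran."""
    try:
        return parse_fn(sql)
    finally:
        # Clear even if parse_fn raises, so the dict never outlives a call.
        all_exceptions.clear()


results = [parse_and_release(fake_parse, f"SELECT {i}") for i in range(1000)]
```

Anything that needs the recorded exceptions (for example, to build an error message) would have to read them before the `finally` block runs, which is the trade-off of this approach.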

Excellent find! Thank you!

Fixed here: 9e97eae

thank you very much!