bblfsh/python-client

[memory leakage?] bblfsh.filter consumes more and more memory

EgorBu opened this issue · 5 comments

Hi,
I noticed that bblfsh.filter causes high memory consumption.
Here is a script to reproduce it:

from collections import defaultdict, deque

import bblfsh


IDENTIFIER = bblfsh.role_id("IDENTIFIER")
QUALIFIED = bblfsh.role_id("QUALIFIED")
LITERAL = bblfsh.role_id("LITERAL")


# Iterative post-order traversal that flattens the UAST into a list of nodes.
def uast2sequence(root):
    sequence = []
    nodes = defaultdict(deque)
    stack = [root]
    nodes[id(root)].extend(root.children)
    while stack:
        if nodes[id(stack[-1])]:
            child = nodes[id(stack[-1])].popleft()
            nodes[id(child)].extend(child.children)
            stack.append(child)
        else:
            sequence.append(stack.pop())
    return sequence


def filter_bblfsh(n_times=1000,
                  py_path="/home/egor/workspace/spark-2.2.0-bin-hadoop2.7/python/pyspark/cloudpickle.py",
                  java_path="/home/egor/workspace/spark-2.2.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/JavaHdfsLR.java"):
    import bblfsh

    client = bblfsh.BblfshClient("0.0.0.0:9432")
    py_uast = client.parse(py_path).uast
    java_uast = client.parse(java_path).uast
    XPATH = "//*[@roleIdentifier and not(@roleQualified)]"  # Identifier nodes that are not Qualified

    for i in range(n_times):
        bblfsh.filter(py_uast, XPATH)
        bblfsh.filter(java_uast, XPATH)


def filter_alternative(n_times=1000,
                  py_path="/home/egor/workspace/spark-2.2.0-bin-hadoop2.7/python/pyspark/cloudpickle.py",
                  java_path="/home/egor/workspace/spark-2.2.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/JavaHdfsLR.java"):
    import bblfsh

    client = bblfsh.BblfshClient("0.0.0.0:9432")
    py_uast = client.parse(py_path).uast
    java_uast = client.parse(java_path).uast

    for i in range(n_times):
        list(filter(lambda node: IDENTIFIER in node.roles and QUALIFIED not in node.roles,
               uast2sequence(py_uast)))
        list(filter(lambda node: IDENTIFIER in node.roles and QUALIFIED not in node.roles,
               uast2sequence(java_uast)))


if __name__ == "__main__":
    import sys
    if int(sys.argv[1]) == 0:
        print("bblfsh")
        filter_bblfsh(n_times=int(sys.argv[2]))
    else:
        print("bblfsh-free")
        filter_alternative(n_times=int(sys.argv[2]))

and some measurements (surprisingly, the plain Python traversal is ~20 times faster than bblfsh.filter):

egor@egor-sourced:~/workspace$ /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 0 100; /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 0 200; /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 0 400
bblfsh
	Command being timed: "python3 ml/sourced/ml/utils/misc.py 0 100"
	User time (seconds): 13.19
	System time (seconds): 0.09
	Percent of CPU this job got: 97%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.61
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 119948
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 38584
	Voluntary context switches: 606
	Involuntary context switches: 49
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
bblfsh
	Command being timed: "python3 ml/sourced/ml/utils/misc.py 0 200"
	User time (seconds): 26.68
	System time (seconds): 0.15
	Percent of CPU this job got: 98%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:27.19
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 188672
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 63460
	Voluntary context switches: 1146
	Involuntary context switches: 115
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
bblfsh
	Command being timed: "python3 ml/sourced/ml/utils/misc.py 0 400"
	User time (seconds): 54.72
	System time (seconds): 0.22
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:55.22
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 326392
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 113651
	Voluntary context switches: 2382
	Involuntary context switches: 164
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
egor@egor-sourced:~/workspace$ /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 1 100; /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 1 200; /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 1 400
bblfsh-free
	Command being timed: "python3 ml/sourced/ml/utils/misc.py 1 100"
	User time (seconds): 0.86
	System time (seconds): 0.03
	Percent of CPU this job got: 70%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.27
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 37548
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 12861
	Voluntary context switches: 103
	Involuntary context switches: 7
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
bblfsh-free
	Command being timed: "python3 ml/sourced/ml/utils/misc.py 1 200"
	User time (seconds): 1.50
	System time (seconds): 0.01
	Percent of CPU this job got: 80%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.88
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 37172
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 15766
	Voluntary context switches: 123
	Involuntary context switches: 25
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
bblfsh-free
	Command being timed: "python3 ml/sourced/ml/utils/misc.py 1 400"
	User time (seconds): 2.69
	System time (seconds): 0.03
	Percent of CPU this job got: 87%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.11
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 37292
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 21603
	Voluntary context switches: 191
	Involuntary context switches: 38
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

Thanks for reporting. RSS is usually not a good measure of real memory usage because page faults inside the same process (as will happen in this case with more iterations) can increase it without meaning that the process's real memory usage is leaking. Fortunately, Python has a resource module that can be used to check the real usage (we actually have a unit test using it to check memory increases), so I'll modify your script to measure the memory increase with it and report back.

I can confirm that there does seem to be a leak: it roughly doubles the memory usage every 10,500 calls to filter. Please note that I'm using different Python and Java files as sources, so results could vary with other files.

I modified the __main__ block of the script this way:

if __name__ == "__main__":
    import sys
    import resource

    before = resource.getrusage(resource.RUSAGE_SELF)
    if int(sys.argv[1]) == 0:
        print("bblfsh")
        filter_bblfsh(n_times=int(sys.argv[2]))
    else:
        print("bblfsh-free")
        filter_alternative(n_times=int(sys.argv[2]))
    after = resource.getrusage(resource.RUSAGE_SELF)
    print('Memory increased by: %d%%' % int(100 * ((after[2] / before[2]) - 1)))
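(Side note, not part of the original comment: index 2 of the getrusage result is ru_maxrss, so a slightly more readable version of the same check, sketched here under the same setup as the script above, would be:)

import resource

# ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
filter_bblfsh(n_times=1000)  # same function as in the reproduction script
after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("Memory increased by: %d%%" % int(100 * (after / before - 1)))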

Then I ran several iterations and could check that there is indeed an increase in the percentage of used memory after the calls to filter, though funnily the increases are not linear (sometimes the usage doubles when doubling the iterations, sometimes it just increases a little):

  • 0 (base consumption just by running the script without any filtering): 4%
  • 10: 4%
  • 100: 5%
  • 200: 6%
  • 400: 12%
  • 800: 14%
  • 1600: 24%
  • 3200: 31%
  • 6400: 61%
  • 12800: 114%

The increase doesn't happen when using the bblfsh-free version, so the leak is definitely somewhere in the call stack initiated by filter().

More info: I tried with huge source code files and the leak increased a lot with fewer iterations, so it looks like some part of the UAST (or its nodes?) is not being freed after usage.
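One way to narrow this down further (a sketch, not something from the original thread): tracemalloc only tracks allocations made through Python's allocator, so if ru_maxrss keeps growing while the tracemalloc total stays flat, the leak is most likely on the native side of filter() (libxml2 or the extension code) rather than in Python objects. Assuming the py_uast/XPATH setup from the reproduction script:

import resource
import tracemalloc

def probe_native_leak(uast, xpath, n_times=1000):
    # Compare Python-level allocations with the growth of the resident set.
    tracemalloc.start()
    rss_before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    for _ in range(n_times):
        bblfsh.filter(uast, xpath)
    rss_after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print("Python-level bytes still allocated:", current)
    print("ru_maxrss growth (kbytes):", rss_after - rss_before)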

In the PR above I fixed all the runaway memory leaks, or so it seems. There is still an initial, fixed-size leak when calling filter for the first time, but after a few iterations it stabilizes and the memory usage doesn't increase (I tested with 20,000 iterations on some huge source code files).

I still haven't found the cause of the initial fixed-size leak; I guess I'll have to recompile a patched Python to be able to use Valgrind (the horror), so it will take some time.

After carefully monitoring the memory usage at each step, the initial "leak" turns out not to be a leak at all but the initialization of libxml2's parsing engine. After that it increases by a few bytes every 4-5 iterations, probably because of caching in libxml2 or Python, and it stabilized at around 100 iterations without increasing further (this time I tested with 100,000 iterations when I left the office).
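To tell this kind of one-off initialization apart from a real leak, a warm-up pass before taking the baseline works well. A minimal sketch, again assuming the reproduction script's uast/XPATH variables:

import resource

def steady_state_growth(uast, xpath, warmup=100, n_times=10000):
    # Pay libxml2's one-time initialization cost first...
    for _ in range(warmup):
        bblfsh.filter(uast, xpath)
    baseline = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ...then any further ru_maxrss growth points at an actual leak.
    for _ in range(n_times):
        bblfsh.filter(uast, xpath)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss - baseline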

So this can be considered fixed once the PR is merged.