getsentry/symbolic

memory usage when parsing sym files

willkg opened this issue · 4 comments

Environment

  • symbolic 10.1.1
  • Python 3.9.7

Steps to Reproduce

Mozilla has a symbolication service (https://symbolication.services.mozilla.com/). Since inline function data was introduced in sym files on September 8th, 2022, we've been getting out-of-memory errors when parsing them with symbolic.

I reduced it down to a small script (attached) that parses each sym file in a directory, generates a symcache, and then throws it away. Memory usage goes up monotonically, which suggests either a memory leak in symbolic or that I'm using symbolic horribly wrong.

parse_cli.py.txt
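
For reference, the core loop of the script looks roughly like this. This is a reconstruction rather than the attached file verbatim; it assumes the symbolic Python bindings' Archive/Object API (Archive.open, iter_objects, make_symcache) and reads RSS from /proc, so it is Linux-only:

import os
import sys

from symbolic import Archive


def rss_mb():
    # Resident set size in MB, read from /proc (Linux-only).
    with open("/proc/self/status") as fp:
        for line in fp:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) // 1024
    return 0


def main(sym_dir):
    print(f"Memory: {rss_mb()}mb")
    print(f"Looking at {sym_dir!r}")
    for root, _dirs, files in os.walk(sym_dir):
        for name in sorted(files):
            if not name.endswith(".sym"):
                continue
            path = os.path.join(root, name)
            print(f"working on {path} ({os.path.getsize(path):,}) ...")
            archive = Archive.open(path)
            for obj in archive.iter_objects():
                # Build the symcache, then immediately drop it.
                obj.make_symcache()
            del archive
            print(f"Memory: {rss_mb()}mb")


if __name__ == "__main__":
    main(sys.argv[1])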

Expected Result

Because the script uses symbolic to parse the sym files but throws the resulting symcaches away, I would expect memory usage not to balloon to 600 MB+ and keep increasing.

Actual Result

Example run of attached script:

$ python parse_cli.py syms
Memory: 17mb
Looking at 'syms'
working on syms/XUL/0AE6EE7E767E3A08B479A25625869AE40/XUL.sym (705,451,533) ...
Memory: 686mb
working on syms/xul.pdb/E02F88D165EF8D754C4C44205044422E1/xul.sym (207,544,887) ...
Memory: 686mb
working on syms/xul.pdb/11E80C753012E2444C4C44205044422E1/xul.sym (530,143,265) ...
Memory: 686mb
working on syms/xul.pdb/760A40087017DB0D4C4C44205044422E1/xul.sym (635,694,087) ...
Memory: 686mb
working on syms/libxul.so/69392F69A52F9AE8D79FCBF0C765389D0/libxul.so.sym (693,026,206) ...
Memory: 714mb
working on syms/libxul.so/F53783197E19F4D010E2FEE918021D060/libxul.so.sym (693,817,871) ...
Memory: 717mb

What are you running the script on in your example?

The Mozilla Symbolication server runs in the python:3.9.12-slim@sha256:0cdfeed99b35442a55c9fd3401267f395b8ed8319b605bb4b71ee8292aeceaea Docker image.

I was running my script on Ubuntu 22.04 on an Intel-based laptop: Linux 5.15.0-52-generic #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

We've been experimenting with this for a while and can't definitively figure out where the memory is going. Our best guess is that the allocator doesn't immediately return memory to the OS when Python's garbage collector frees objects. Note that if you run the script on a directory containing several copies of the same file, reported memory usage stays approximately constant, which points at allocator reuse rather than an unbounded leak.
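
One way to probe that hypothesis (a minimal sketch, Linux/glibc only): after a symcache has been built and dropped, ask glibc to hand unused arena memory back to the OS and check whether RSS falls. If it does, the memory was being retained by the allocator rather than leaked.

import ctypes

# Linux/glibc only: release freed arena memory back to the OS.
libc = ctypes.CDLL("libc.so.6")
released = libc.malloc_trim(0)  # returns nonzero if memory was released
print("malloc_trim released memory:", bool(released))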

We've found, though, that while jemalloc doesn't change the fundamental behavior, it does reduce total memory usage (example invocation below). We've seen the same effect in production.
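
For anyone who wants to reproduce that comparison, jemalloc can be swapped in at run time via LD_PRELOAD. The path below is where Ubuntu's libjemalloc2 package installs the library and may differ on other systems:

$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 python parse_cli.py syms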

Ok. Thank you for looking into it!