rollbar/pyrollbar

Limits on locals() logging do not prevent runaway traversal

nicku33 opened this issue · 5 comments

Currently the values defined in DEFAULT_LOCALS_SIZES
don't prevent traversal of large collections. The Shortener only applies after the whole object graph has been copied, as far as I can see. In our case, data transformation, it is not uncommon to have massive lists and dictionaries with 100k+ objects.

In traverse.py

return list_handler(list(traverse(elem, key=key + (i,), **kw) for i, elem in enumerate(obj)), key=key)

traverses and creates a copy of the collection. This can itself be very time consuming as well as memory intensive. Neither things one wants in a logging library. Memory use in particular should be practically bounded.

In addition, there is no limit to the amount of recursion except for circular reference checks. So for example a highly imbalanced tree implemented in dict() that has degraded to a linked list will be traversed.

I'd like to apply the iteration limits defined in configuration to the generators above. Maybe making a def limited_enumerate(iterable, stop_limit) to plug into enumerate.

As well I'd like to add a recursion count that respects an optional limit.

Thoughts ? Let me know if I've erred as this is all based on inspection. I haven't written a test.

If I understand correctly, the idea is to apply the limits from the locals config dictionary during capture of the locals objects rather than later.

I'd like to add a recursion count that respects an optional limit

I thought maxLevel might be intended for this, but I don't see yet where it is applied. Maybe a maxDepth option is what is needed here?

(Looking at the options defined here:

DEFAULT_LOCALS_SIZES = {
'maxlevel': 5,
'maxdict': 10,
'maxlist': 10,
'maxtuple': 10,
'maxset': 10,
'maxfrozenset': 10,
'maxdeque': 10,
'maxarray': 10,
'maxstring': 100,
'maxlong': 40,
'maxother': 100,
}
)

during capture of the locals objects rather than later

Yes. The limits seem currently used to censor objects going out to logs but the list(traverse... does an unbounded materialization of the generator output before that, if I read the code correctly. This resulted in hangs of 10 min for me with large data objects. I'd prefer a logging utility like pyrollbar have bounded resource usage, since it can be invoked in situations where a system is already stressed.

I also couldn't see where maxlevel was applied either, though I have to refamiliarize myself with the issue. I think that must have been the intention. maxdepth does seem like a more intuitive name for it.

So questions for you:
a) Do the other rollbar clients have any naming standards for these limit configs we should adhere to ?
b) If not, use maxlevel if not currently used or change to maxdepth ?
c) Do you generally agree with the suggestions ?

Any update on this?

Thanks for the nudge.

@nicku33 There are not other naming standards to follow. I prefer maxdepth, assuming maxlevel isn't already being used for this. (When I reviewed I didn't see it being used for anything.) Overall I agree with the motivation and the plan.