mgedmin/objgraph

Why are new dictionaries not detected by show_growth()?


Basically, I am trying this:

>>> import objgraph as o
>>> o.show_growth()
...
>>> d = {1: 2}
>>> o.show_growth()
>>>

I have learned from the documentation that references to primitive types are not tracked, but I wonder why a dictionary (which is not a primitive type, is it?) is not counted either. For example, if I define a list containing only a primitive element, it is tracked:

>>> l = [2]
>>> o.show_growth()
list      356        +1

but why doesn't the same work with a dictionary?

I'm not entirely sure why, but

$ python
Python 2.7.15+ (default, Oct  2 2018, 22:12:08) 
[GCC 8.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gc
>>> gc.is_tracked([2])
True
>>> gc.is_tracked({1: 2})
False

(For some data types Python has an optimization where it decides whether to track a specific object depending on its contents. How exactly it decides what to track is a great question, but not one I'm able to answer.)
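This is why show_growth() never sees the dict: as far as I know, objgraph builds its per-type counts from gc.get_objects(), which only returns objects the cyclic GC tracks. A minimal stdlib-only check, assuming CPython (the helper `is_visible_to_gc` is hypothetical, not part of objgraph or gc):

```python
import gc

def is_visible_to_gc(target):
    """Hypothetical helper: True if target appears in gc.get_objects().

    Compares by identity (is), so an equal dict elsewhere can't match.
    """
    return any(obj is target for obj in gc.get_objects())

d = {1: 2}    # holds only ints, so CPython leaves it untracked
lst = [2]     # lists are always tracked, whatever they contain

print(gc.is_tracked(d), is_visible_to_gc(d))      # False False
print(gc.is_tracked(lst), is_visible_to_gc(lst))  # True True
```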

Actually, I can guess: Python's GC needs to track objects if and only if they can participate in reference cycles.

Objects that have no references to other objects (such as ints or strings) are the most obvious example of things that can be untracked.

Objects such as dictionaries that refer only to such atomic, untracked objects can also be untracked.
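One way to poke at this from Python is gc.get_referents(), which lists the objects a container refers to directly. The sketch below (the helper name is made up for illustration) only shows that the observation holds for these two dicts; CPython makes the real decision in C when items are inserted, so treat this as an illustration rather than the exact rule.

```python
import gc

def refers_only_to_untracked(obj):
    """Hypothetical helper: True if nothing obj points to directly is GC-tracked."""
    return all(not gc.is_tracked(ref) for ref in gc.get_referents(obj))

plain = {1: 2}       # keys and values are ints (never tracked)
nested = {1: [2]}    # the value is a list, which is always tracked

print(refers_only_to_untracked(plain), gc.is_tracked(plain))    # True False
print(refers_only_to_untracked(nested), gc.is_tracked(nested))  # False True
```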

>>> d = {1: 2}
>>> d
{1: 2}
>>> gc.is_tracked(d)
False

If you modify the dictionary to add a reference to a tracked object, Python will flip the tracked bit at runtime:

>>> d[1] = 3
>>> gc.is_tracked(d)
False
>>> d[1] = [2]
>>> gc.is_tracked(d)
True

OK, thank you.
Suppose my program has a list or another data structure that (unintentionally) holds references to untracked dictionaries (or objects of other untracked types); this prevents these referenced objects from collection by GC. Is there a way to find such references and detect growth in the number of these references/objects?

this prevents these referenced objects from collection by GC.

That is not precisely accurate. Python has two ways to collect garbage:

  • reference counting (the primary way)
  • a cyclic garbage collector

Pure reference counting cannot free data structures that contain reference cycles, which is why the cyclic garbage collector was added back in Python 2.0 or so. The data structure you've described has no cycles, so it will be collected as soon as the last reference to it goes away and Python decrements the corresponding reference count to zero.
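The difference between the two mechanisms can be observed with weak references. Plain dicts cannot be weakly referenced, so a tiny hypothetical class `Node` stands in for them. This assumes CPython; other implementations such as PyPy do not free objects immediately through reference counting.

```python
import gc
import weakref

class Node:
    """Hypothetical stand-in object; plain dicts don't support weak references."""

was_enabled = gc.isenabled()
gc.disable()  # make sure an automatic collection can't run mid-demo
try:
    # Acyclic: reference counting frees it as soon as the last reference goes.
    a = Node()
    ref_a = weakref.ref(a)
    del a
    freed_acyclic = ref_a() is None        # True

    # Cyclic: b refers to itself, so its reference count never drops to zero.
    b = Node()
    b.me = b
    ref_b = weakref.ref(b)
    del b
    freed_cycle_before = ref_b() is None   # False: kept alive by the cycle
    gc.collect()                           # the cyclic collector reclaims it
    freed_cycle_after = ref_b() is None    # True
finally:
    if was_enabled:
        gc.enable()

print(freed_acyclic, freed_cycle_before, freed_cycle_after)  # True False True
```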

I understand it will be collected immediately. But is there a way to do something like "collect all new objects since the previous snapshot" (both tracked and untracked by the GC)? I see it is now out of the scope of your library, but maybe you have some suggestions?

Are you looking for https://docs.python.org/3/library/gc.html#gc.collect? Because if not, then I've no idea what to tell you ;)
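If per-type counts (as show_growth() reports them) are enough, one rough way to include untracked objects is to start from gc.get_objects() and also walk everything reachable through gc.get_referents(). This is only a sketch under stated assumptions: both function names are hypothetical (not objgraph or gc API), objects referenced only from C code can be missed, it is slow on large heaps, and the counts include a little noise from the walk itself (e.g. the `before` snapshot).

```python
import gc
from collections import Counter

def reachable_type_counts():
    """Hypothetical helper: count objects per type name, including untracked
    ones (such as {1: 2}) reachable from some GC-tracked object."""
    stack = gc.get_objects()
    counts = Counter()
    # Pre-mark the walk's own bookkeeping so it doesn't count itself.
    seen = {id(stack), id(counts)}
    seen.add(id(seen))
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        counts[type(obj).__name__] += 1
        stack.extend(gc.get_referents(obj))  # follow direct references
    return counts

def show_reachable_growth(before, after, limit=10):
    """Hypothetical helper: print the types whose counts grew, largest first."""
    growth = after - before  # Counter subtraction keeps only positive deltas
    for name, delta in growth.most_common(limit):
        print(f"{name:20} {after[name]:>8} {delta:>+8}")

before = reachable_type_counts()
held = [{i: i} for i in range(1000)]  # 1000 dicts the GC does not track
after = reachable_type_counts()
show_reachable_growth(before, after)  # "dict" should grow by at least 1000
```

Counting by type mirrors what show_growth() does; diffing individual objects by id() would be unreliable, because CPython reuses the ids (memory addresses) of freed objects.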