TheRealMichaelWang/NoHoPython

Enabling Bounds Checking causes Memory Leaks

TheRealMichaelWang opened this issue · 4 comments

After bounds checking was enabled by default, running mem-test.nhp would produce the following:

NHP Memory Analysis Report
Active Memory Allocations: 1
Peak Memory Allocations: 1
Index out of bounds, File "C:\Users\Micha\source\repos\NoHoPython\NoHoPython\bin\Debug\net6.0\tests\mem-test.nhp", row 11, col 8. Index was 3, alloced length was -2144127272.
        self.cells[3]
      0 [main] a 111 cygwin_exception::open_stackdumpfile: Dumping stack trace to a.exe.stackdump

The length of -2144127272 means self.cells was corrupted.

Compiling mem-test.nhp while omitting bounds checking,

NoHoPython tests/mem-test.nhp out.c -leaksan -nobounds

Doesn't crash. No corruption occurs In addition, I've verified by stepping through out.c in gdb and carefully inspecting the value of self.cells.

After further analysis, it has been determined that bounds checking is not the cause of the bug. Consider the following:

class cell:
	fn<None> myFn
	array<cell> cells
	def __init__():
		self.cells = [<cell>]
		self.myFn = self.stuff
	def stuff():
		self.cells = new cell[10](new cell())
		self.cells[3].myFn = self.mychild
	def mychild():
		pass

When stuff is called, and the line self.cells[3].myFn = self.mychild is executed:

  • the value of self.cells[3].myFn is buffered in a temporary location (shallow copy)
    • self.cells[3].myFn was set to self.cells[3].stuff before this line was executed
  • self.mychild is copied
    • self.mychild captures self, but because it has the same responsible destroyer as self, self's reference count doesn't increase.
  • self.cells[3].myFn is set to the copied result
  • The previously buffered shallow copy of self.cells[3].myFn is released and destroyed.
    • the old value of self.cells[3].myFn (self.cells[3].stuff) begins to destroy all values that it has captured
      • self.cells[3] is destroyed because it's captured by self.cells[3].myFn, and it's reference count is 0
        • self.cells[3] destroys all it's properties, including self.cells[3].myFn, which is set to self.mychild.
          • self.mychild destroys self, because it captures self, and self's reference count is 0.
            • Ultimatley all properties of self, are destroyed, hence the memory issue with self.cells having an outlandishly long length.

This issue can be potentially avoided if self.cells.myFn, or any other closure really, couldn't destroy a captured value if it is within the same object graph - if their respective responsible destroyers are equal.

However, responsible destroyers are always records/classes. The problem is, when a responsible destroyer - the record - destroys it's own children it shouldn't be stopped just because it and it's child share the same responsible destroyer.

It's only when the child destroys the parent - which is only possible when records destroy captured records/classes - that validating the responsible destroyer is actually useful for preventing this kind of situation.

In essence, the proposed fix is:

  • Each anonymous procedure maintains it's own independent responsible destroyer tag
    • The anonymous destroyer would pass it's own responsible destroyer as a child agent
    • Other types, such as records/classes and arrays would simply pass NULL as a child agent
  • Every time a record/class (or any other type) is freed, a child agent, is required.
    • If the record/class's responsible destroyer is equal to the child agent, it returns and doesn't free
    • Otherwise proceed with freeing.

In addition, these proposed changes would eliminate the need to maintain locks for procedures, and records/classes.

The issue, at least in mem-test.nhp, seems to be resolved. However, this is not the first time a memory-safety fix was ultimately determined to be unsatisfactory; after almost every fix, other memory safety bugs were found.

Because memory-safety is such a core tenet of North-Hollywood Python, further testing should be conducted to ensure the fix actually works.

Just tested the fix, with

  • linked-list.nhp
  • csv-test.nhp
  • list-test.nhp

And everything seems to work.