Enabling Bounds Checking causes Memory Leaks
TheRealMichaelWang opened this issue · 4 comments
After bounds checking was enabled by default, running mem-test.nhp
would produce the following:
NHP Memory Analysis Report
Active Memory Allocations: 1
Peak Memory Allocations: 1
Index out of bounds, File "C:\Users\Micha\source\repos\NoHoPython\NoHoPython\bin\Debug\net6.0\tests\mem-test.nhp", row 11, col 8. Index was 3, alloced length was -2144127272.
self.cells[3]
0 [main] a 111 cygwin_exception::open_stackdumpfile: Dumping stack trace to a.exe.stackdump
The length of -2144127272
means self.cells
was corrupted.
Compiling mem-test.nhp
while omitting bounds checking,
NoHoPython tests/mem-test.nhp out.c -leaksan -nobounds
Doesn't crash. No corruption occurs In addition, I've verified by stepping through out.c
in gdb and carefully inspecting the value of self.cells
.
After further analysis, it has been determined that bounds checking is not the cause of the bug. Consider the following:
class cell:
fn<None> myFn
array<cell> cells
def __init__():
self.cells = [<cell>]
self.myFn = self.stuff
def stuff():
self.cells = new cell[10](new cell())
self.cells[3].myFn = self.mychild
def mychild():
pass
When stuff
is called, and the line self.cells[3].myFn = self.mychild
is executed:
- the value of
self.cells[3].myFn
is buffered in a temporary location (shallow copy)self.cells[3].myFn
was set toself.cells[3].stuff
before this line was executed
self.mychild
is copiedself.mychild
capturesself
, but because it has the same responsible destroyer asself
,self
's reference count doesn't increase.
self.cells[3].myFn
is set to the copied result- The previously buffered shallow copy of
self.cells[3].myFn
is released and destroyed.- the old value of
self.cells[3].myFn
(self.cells[3].stuff
) begins to destroy all values that it has capturedself.cells[3]
is destroyed because it's captured byself.cells[3].myFn
, and it's reference count is0
self.cells[3]
destroys all it's properties, includingself.cells[3].myFn
, which is set toself.mychild
.self.mychild
destroysself
, because it capturesself
, andself
's reference count is0
.- Ultimatley all properties of
self
, are destroyed, hence the memory issue withself.cells
having an outlandishly long length.
- Ultimatley all properties of
- the old value of
This issue can be potentially avoided if self.cells.myFn
, or any other closure really, couldn't destroy a captured value if it is within the same object graph - if their respective responsible destroyers are equal.
However, responsible destroyers are always records/classes. The problem is, when a responsible destroyer - the record - destroys it's own children it shouldn't be stopped just because it and it's child share the same responsible destroyer.
It's only when the child destroys the parent - which is only possible when records destroy captured records/classes - that validating the responsible destroyer is actually useful for preventing this kind of situation.
In essence, the proposed fix is:
- Each anonymous procedure maintains it's own independent responsible destroyer tag
- The anonymous destroyer would pass it's own responsible destroyer as a child agent
- Other types, such as records/classes and arrays would simply pass
NULL
as a child agent
- Every time a record/class (or any other type) is freed, a child agent, is required.
- If the record/class's responsible destroyer is equal to the child agent, it returns and doesn't free
- Otherwise proceed with freeing.
In addition, these proposed changes would eliminate the need to maintain locks for procedures, and records/classes.
The issue, at least in mem-test.nhp
, seems to be resolved. However, this is not the first time a memory-safety fix was ultimately determined to be unsatisfactory; after almost every fix, other memory safety bugs were found.
Because memory-safety is such a core tenet of North-Hollywood Python, further testing should be conducted to ensure the fix actually works.
Just tested the fix, with
linked-list.nhp
csv-test.nhp
list-test.nhp
And everything seems to work.