Memory leak when passing strings within table value parameter to stored proc
mavxg opened this issue · 9 comments
Environment
- Python: 3.11.6 (tags/v3.11.6:8b6ee5b, Oct 2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)]
- pyodbc: 5.1.0
- OS: Windows 10
- DB: Windows Server 2019 Standard (10.0)
- driver: ODBC Driver 17 for SQL Server
Issue
Memory leak when string values supplied to table valued parameter of stored proc. Same data supplied as float does not leak memory.
Example code
Following code should reproduce the error.
import psutil
import gc
import pyodbc
process = psutil.Process()
# Prepare database with do nothing stored proc
"""
CREATE TYPE Test_Table_Type AS TABLE
(
RowId bigint,
MeasureId int,
NodeId int,
Value float -- doesn't make any difference if this is varchar(256)
)
GO
CREATE PROCEDURE Test_Proc
@Facts Test_Table_Type READONLY
AS
BEGIN
SELECT COUNT(*) FROM @Facts
END
GO
"""
def print_memory_usage():
memory_usage = process.memory_info().rss # should this be rss or vms
print(f"{memory_usage / (1024 * 1024.0):0.4f} MiB")
def run_test(cnxn, call_proc, fn):
stmt = "{CALL Test_Proc (?)}"
cursor = cnxn.cursor()
x = 3.00000001
y = 2
z = 4
for i in range(5):
x += 1.0
y += 1
z += 1
# Make data with different strings for each row
data = [(i, z * i, y * i, fn(x * i)) for i in range(1_000_000)]
if call_proc:
cursor.execute(stmt, (data,))
del data
gc.collect()
print_memory_usage()
if __name__ == "__main__":
url = 'DRIVER={ODBC Driver 17 for SQL Server};SERVER=MY_SERVER;DATABASE=MY_DATABASE;Trusted_Connection=yes;'
connection = pyodbc.connect(url)
print_memory_usage()
print("\n# Make data but don't execute stored proc (with floats)")
run_test(connection, call_proc=False, fn=float)
print("\n# Make data but don't execute stored proc (with strings)")
run_test(connection, call_proc=False, fn=str)
print("\n# Make data and execute stored proc (with floats)")
run_test(connection, call_proc=True, fn=float)
print("\n# Make data and execute stored proc (with strings)")
run_test(connection, call_proc=True, fn=str)
Output of sample code
44.1406 MiB
# Make data but don't execute stored proc (with floats)
45.9414 MiB
47.8555 MiB
47.8633 MiB
47.8672 MiB
47.8672 MiB
# Make data but don't execute stored proc (with strings)
47.8750 MiB
47.8789 MiB
47.8789 MiB
46.1562 MiB
47.8242 MiB
# Make data and execute stored proc (with floats)
46.4648 MiB
48.1289 MiB
46.4727 MiB
48.1406 MiB
46.4766 MiB
# Make data and execute stored proc (with strings)
121.0195 MiB
194.1836 MiB
267.8984 MiB
344.1250 MiB
419.3477 MiB
Process finished with exit code 0
Could you post an ODBC trace?
Reproduced on Ubuntu 20.04. Both .rss and .vms increase. Same behaviour if cursor is created, consumed (.fetchall()), and closed for each iteration.
ODBC trace logs for "execute with float" and "execute with str", using 5_000 rows. (Trace logs for 1_000_000 rows were almost 500 MiB each unzipped.)
Note also that this is not a version 5.x regression; version 4.0.39 produces the same results.
No need to run it with that many rows since the pattern shows up with far less.
I think the code is missing a Py_XDECREF(param) after this line: https://github.com/mkleehammer/pyodbc/blob/master/src/params.cpp#L1268
I added this to my local build but the leak persists.
@@ -1264,10 +1264,11 @@ bool BindParameter(Cursor* cur, Py_ssize_t index, ParamInfo& info)
for(i=0;i<ncols;i++)
{
// Bind the TVP's columns --- all need to use DAE
PyObject *param = PySequence_GetItem(row, i);
+ Py_XDECREF(param);
GetParameterInfo(cur, i, param, info.nested[i], true);
info.nested[i].BufferLength = info.nested[i].StrLen_or_Ind;
info.nested[i].StrLen_or_Ind = SQL_DATA_AT_EXEC;
Py_BEGIN_ALLOW_THREADS
Some more testing that might help.
The leak still occurs if you take the creation of the data
object out of the loop. This seems to imply it is leaking memory allocated within pyODBC rather than a reference count issue with the passed in data. My guess was from the following code in cursor.cpp where it seems like the GetParameterInfo
there is being called for each data item and doing an encode on the string generating an encoded byte array and the cleanup for this is somehow missing (the leak doesn't occur if you provide a bytes
type parameters by doing the encode on the python side but does for str
.)
I have found a related memory leak (that I think may have been spotted before) when you insert a long string with cursor.fast_executemany=True
you get a leak if the column you are inserting into is varchar(MAX)
.
Both of these memory leaks are because of the same problem but in two difference places where some coded is duplicated.
The TVP issue is caused because in GetUnicodeInfo
you get a new PyObject
from PyCodec_Encode
with a ref count of 1. While this object is wrapped by encoded
the Detach
call means it will not decrement the ref count in the destructor. For normal calls this is correct behaviour as the decrement happens in FreeInfos
with the line Py_XDECREF(a[i].pObject);
. For TVP parameters FreeInfos
will only clean up the top level parameter and not all the items in the sequence and so this object is getting leaked.
This is the same issue with cursor.fast_executemany=True
when you have a column with MAX size because the detach on this line causes the reference count to not be reduced by the destructor
Line 314 in ff1dd23