postgrespro/rum

entry->buffer modified after retrieval leads to invalid access and segmentation fault in rumget.c:scanPage.

Closed this issue · 3 comments

conclusion:
The entry->buffer pointer is retrieved, but the underlying buffer can be modified by another concurrent backend before it is used, leading to invalid memory access and a segmentation fault.

env:
PostgreSQL REL_12_STABLE branch and rum master branch.

reproduce:
test.sql:

set enable_seqscan to off;
set max_parallel_workers_per_gather = 0;
set force_parallel_mode = off;


insert into test_float4 values (1),(-1),(2);

explain analyze select * from test_float4 where i = 1::float4;
explain analyze select * from test_float4 where i = -1::float4;
explain analyze select * from test_float4 where i = 2::float4;
explain analyze select * from test_float4 where i = 1::float4;
explain analyze select * from test_float4 where i = -1::float4;
explain analyze select * from test_float4 where i = 2::float4;
explain analyze select * from test_float4 where i = 1::float4;
explain analyze select * from test_float4 where i = -1::float4;

test.py:

import threading
import psycopg2

def execute_sql():
    while True:
        try:
            conn = psycopg2.connect(
                dbname="postgres",
                user="username",
                password="password",
                host="localhost",
                port="5432"
            )
            cur = conn.cursor()

            with open('test.sql', 'r') as file:
                sql = file.read()
                cur.execute(sql)
                conn.commit()

            cur.close()
            conn.close()
        except Exception as e:
            print(f"Error: {e}")
            break

threads = []
for i in range(16):
    thread = threading.Thread(target=execute_sql)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

Before running the scripts, create a simple table and a rum index on it:

CREATE TABLE test_float4(i float4);
CREATE INDEX idx_t ON test_float4 USING rum(i);

Running test.py for a few minutes produces core dumps like this:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f04f33d5f1a in rumDataPageLeafRead (ptr=0x7f04ea534eaa "\b", attnum=1, item=0x7ffde964e578, copyAddInfo=true, rumstate=0x55a871e8bfb8)
    at src/rum.h:987
987			if (attr->attbyval)
(gdb) bt
#0  0x00007f04f33d5f1a in rumDataPageLeafRead (ptr=0x7f04ea534eaa "\b", attnum=1, item=0x7ffde964e578, copyAddInfo=true, rumstate=0x55a871e8bfb8)
    at src/rum.h:987
#1  0x00007f04f33d7b3e in scanPage (rumstate=0x55a871e8bfb8, entry=0x55a871e9d058, item=0x55a871e9d080, equalOk=false) at src/rumget.c:1673
#2  0x00007f04f33d73a2 in entryGetNextItem (rumstate=0x55a871e8bfb8, entry=0x55a871e9d058, snapshot=0x55a871e3ca30) at src/rumget.c:896
#3  0x00007f04f33d553a in entryGetItem (rumstate=0x55a871e8bfb8, entry=0x55a871e9d058, nextEntryList=0x0, snapshot=0x55a871e3ca30) at src/rumget.c:1310
#4  0x00007f04f33d86f4 in scanGetItemRegular (scan=0x55a871e82380, advancePast=0x7ffde964e7d0, item=0x7ffde964e7d0, recheck=0x7ffde964e7e7)
    at src/rumget.c:1480
#5  0x00007f04f33d3c29 in scanGetItem (scan=0x55a871e82380, advancePast=0x7ffde964e7d0, item=0x7ffde964e7d0, recheck=0x7ffde964e7e7) at src/rumget.c:2129
#6  0x00007f04f33d36f9 in rumgetbitmap (scan=0x55a871e82380, tbm=0x55a871e83590) at src/rumget.c:2167
#7  0x000055a870b88811 in index_getbitmap (scan=0x55a871e82380, bitmap=0x55a871e83590) at indexam.c:670
#8  0x000055a870d9296c in MultiExecBitmapIndexScan (node=0x55a871e82090) at nodeBitmapIndexscan.c:105
#9  0x000055a870d7baea in MultiExecProcNode (node=0x55a871e82090) at execProcnode.c:506
#10 0x000055a870d91860 in BitmapHeapNext (node=0x55a871e81da0) at nodeBitmapHeapscan.c:114
#11 0x000055a870d7dbe3 in ExecScanFetch (node=0x55a871e81da0, accessMtd=0x55a870d91780 <BitmapHeapNext>, recheckMtd=0x55a870d91e30 <BitmapHeapRecheck>)
    at execScan.c:133
#12 0x000055a870d7d832 in ExecScan (node=0x55a871e81da0, accessMtd=0x55a870d91780 <BitmapHeapNext>, recheckMtd=0x55a870d91e30 <BitmapHeapRecheck>)
    at execScan.c:183

I added some logging to the root data page split and found that, after the split, stack->buffer should no longer be touched, but entryGetNextItem still locks stack->buffer and runs scanPage on it.
In more detail, the page flags of stack->buffer became 1, which is RUM_DATA without RUM_LEAF, yet scanPage still uses rumDataPageLeafRead to read this page.
I made a fix in PR #145.
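
To illustrate the failure mode described above, here is a minimal Python model of it. This is only a sketch and not the change made in PR #145. The flag values are assumed to mirror rum.h (consistent with the observation that a flags value of 1 means RUM_DATA without RUM_LEAF), and `split_root`, `scan_page_checked`, and the dict-based page are hypothetical names invented for the illustration.

```python
# Flag bits assumed to mirror the definitions in rum.h (illustrative values).
RUM_DATA = 1 << 0
RUM_LEAF = 1 << 1


def is_leaf(flags: int) -> bool:
    """Return True when the page flags mark a leaf page."""
    return bool(flags & RUM_LEAF)


def split_root(page: dict) -> None:
    """Hypothetical model of a root split: the root stops being a leaf.

    Its items move to new child pages and the root keeps only downlinks.
    """
    page["flags"] &= ~RUM_LEAF
    page["items"] = ["downlink-left", "downlink-right"]


def scan_page_checked(page: dict) -> list:
    """Read leaf items, refusing to parse a page that is no longer a leaf."""
    if not is_leaf(page["flags"]):
        # The caller should descend from the root again instead of reading.
        raise LookupError("page is no longer a leaf; re-descend the tree")
    return list(page["items"])


# A scan caches a reference to the root while it is still a leaf page.
root = {"flags": RUM_DATA | RUM_LEAF, "items": [1.0, -1.0, 2.0]}
cached = root

# A concurrent insert splits the root before the scan locks the page.
split_root(root)

flags_after_split = cached["flags"]  # only RUM_DATA remains set
try:
    scan_page_checked(cached)
    stale_detected = False
except LookupError:
    stale_detected = True
```

In this model, the cached reference still points at the same page, but its layout has changed underneath the scan. Re-checking the leaf flag after taking the lock is one way to avoid parsing downlinks as leaf items; whether the actual fix takes this approach is not stated here.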

Thank you for reporting this issue, and especially for the reproduction scripts!

I saw your commit and will try to get back to you with feedback soon. I suggest discussing further edits on the pull request page.

Pull request #145 was merged successfully.
Thank you very much for your contribution.