rhashimoto/wa-sqlite

column_blob returns incorrect values

Closed this issue · 8 comments

Any time a blob column is returned, the wrong values are returned for that column.

The pointer to the sqlite3_colum_blob is the same for every row in the result set even when the blobs are different.

I've created a small repository that reproduces the exact issue here:

https://github.com/tantaman/sandbox/tree/main


The interesting bits are in src/main.ts:

await sql`DROP TABLE IF EXISTS foo`;
await sql`CREATE TABLE IF NOT EXISTS foo (a)`

console.log('INSERTING 010203');
await sql`INSERT INTO foo VALUES (X'010203')`

console.log('SELECTING');
let result = await sql`SELECT * FROM foo`;
printBytesResult(result);

console.log('INSERTING 01020304');
await sql`INSERT INTO foo VALUES (X'01020304')`

console.log('SELECTING');
result = await sql`SELECT * FROM foo`;
printBytesResult(result);

console.log('INSERTING 01020304');
await sql`INSERT INTO foo VALUES (X'01020304')`

console.log('SELECTING');
result = await sql`SELECT * FROM foo`;
printBytesResult(result);

cosonle.log('QUOTE SELECTING');
result = await sql`SELECT quote(a) FROM foo`;
printHexResult(result);

Instead of getting a byte array of values 010203 or 01020304 we get:

INSERTING 010203
main.ts:73 SELECTING
main.ts:97 raw bytes Int8Array(3) [-16, -81, 81, buffer: ArrayBuffer(16777216), byteLength: 3, byteOffset: 5353200, length: 3, Symbol(Symbol.toStringTag): 'Int8Array']
main.ts:98 hex f0af51
main.ts:77 INSERTING 01020304
main.ts:80 SELECTING
main.ts:97 raw bytes Int8Array(3) [-16, -81, 81, buffer: ArrayBuffer(16777216), byteLength: 3, byteOffset: 5350640, length: 3, Symbol(Symbol.toStringTag): 'Int8Array']
main.ts:98 hex f0af51
main.ts:97 raw bytes Int8Array(4) [-16, -81, 81, 0, buffer: ArrayBuffer(16777216), byteLength: 4, byteOffset: 5350640, length: 4, Symbol(Symbol.toStringTag): 'Int8Array']
main.ts:98 hex f0af5100
main.ts:84 INSERTING 01020304
main.ts:87 SELECTING
main.ts:97 raw bytes Int8Array(3) [-16, -81, 81, buffer: ArrayBuffer(16777216), byteLength: 3, byteOffset: 5352176, length: 3, Symbol(Symbol.toStringTag): 'Int8Array']
main.ts:98 hex f0af51
main.ts:97 raw bytes Int8Array(4) [-16, -81, 81, 0, buffer: ArrayBuffer(16777216), byteLength: 4, byteOffset: 5352176, length: 4, Symbol(Symbol.toStringTag): 'Int8Array']
main.ts:98 hex f0af5100
main.ts:97 raw bytes Int8Array(4) [-16, -81, 81, 0, buffer: ArrayBuffer(16777216), byteLength: 4, byteOffset: 5352176, length: 4, Symbol(Symbol.toStringTag): 'Int8Array']
main.ts:98 hex f0af5100
main.ts:91 QUOTE SELECTING
main.ts:104 raw bytes (3) [1, 2, 3]
main.ts:105 hex X'010203'
main.ts:104 raw bytes (4) [1, 2, 3, 4]
main.ts:105 hex X'01020304'
main.ts:104 raw bytes (4) [1, 2, 3, 4]
main.ts:105 hex X'01020304'

Notice the byteOffest parameter for each row

E.g., byteOffset: 5352176

Each blob has the same byte offset which is certainly incorrect.

Doing a sqlSELECT quote(a) FROM foo; then casting a from hex to bytes returns the correct result (see console.log('QUOTE SELECTING') section). So for some strange reason sqlite3_column_text is not impacted by this bug.

I tested deleting and updating blobs via where conditions. The value in the underlying storage is correct based on those tests (and the quote(a) selection test above).

one other interesting datapoint -- function extensions that return blob return correctly. Tables that store blobs do not.

Looks like the problem is that column_blob is returning a subarray which is just a view onto WASM memory, which could change on the next SQLite call. I think byteOffset is likely correct; it's just reusing that piece of memory.

A serious bug, for sure. Thanks for your work detailing it!

which could change on the next SQLite call

That makes sense. It also aligns with what I was seeing with the function extension working correctly.

The particular function extension I was calling returns a pointer to an unchanging slot in memory (it is allocated at extension load and not touched until extension unload). Hence why it would return the correct result.

Yep -- copying the returned blob in column_blob fixes it

const dst = new ArrayBuffer(result.byteLength);
const ret = new Int8Array(dst);
ret.set(result);

Technically the bug is not in column_blob(). The documentation says:

The contents of the returned buffer may be invalid after the next SQLite call. Make a copy of the data (e.g. with .slice()) if longer retention is required.

There is a bug in the row() convenience function (also used internally to implement exec()), which calls column_blob() via column() and doesn't copy blobs.

Documentation notwithstanding, maybe this unexpected behavior needs to be changed. Yes, it does follow how the C call works but the fact that I, the author of both code and documentation, couldn't use it correctly is telling. I have fixed row() but I'm still considering whether to have column_blob() and value_blob() return copies.

Given blobs weren't working correctly, #65 seems like much less of a breaking change.

I have fixed row() but I'm still considering whether to have column_blob() and value_blob() return copies.

I'm not going to do this. If you have large blobs then you might not want this copy to be made. It's easy to monkey patch column_blob() to make the copy but you can't monkey patch to disable a copy. I've modified row() to copy only if the blob references WASM memory so that if column_blob() is monkey patched to copy then row() won't copy again.