deparse_protobuf() broken on big-endian
df7cb opened this issue · 9 comments
Hi,
as can be seen on https://buildd.debian.org/status/logs.php?pkg=pglast&ver=5.0%7Edev0-1, pglast is broken on big-endian architectures. (Sorry for the version number messup, I had uploaded a git snapshot of the v5 branch as 5.0~dev0 a week or two before you released this version. Thanks for the release and the fixes in there!)
The specific failure is this:
=================================== FAILURES ===================================
____________________________ test_deparse_protobuf _____________________________
def test_deparse_protobuf():
> assert deparse_protobuf(parse_sql_protobuf('select 1')) == 'SELECT 1'
E AssertionError: assert '' == 'SELECT 1'
E - SELECT 1
tests/test_parser.py:184: AssertionError
The above log is from the "Debian" dev0 package, but I can reproduce the problem on the "real" dev0 version as well:
>>> from pglast.parser import deparse_protobuf, parse_sql_protobuf
>>> deparse_protobuf(parse_sql_protobuf('select 1'))
''
>>> parse_sql_protobuf('select 1')
b'\x08\xf1\x93\t\x12\x1c\n\x1a\xd2\x03\x17\x1a\x10\x9a\x0b\r\x1a\t\xf2\x0e\x06\n\x02\x08\x01X\x07 \x07p\x01\x88\x01\x01'
>>> deparse_protobuf(parse_sql_protobuf('select'))
''
The problem is in deparse_protobuf() since I can successfully deparse that string on x86:
>>> from pglast.parser import deparse_protobuf, parse_sql_protobuf
>>> deparse_protobuf(b'\x08\xf1\x93\t\x12\x1c\n\x1a\xd2\x03\x17\x1a\x10\x9a\x0b\r\x1a\t\xf2\x0e\x06\n\x02\x08\x01X\x07 \x07p\x01\x88\x01\x01')
'SELECT 1'
Do you have a chance to try the same in a C
program linked against libpg_query
?
... next year:
This program prints "SELECT 1" both on x86_64 and s390x:
#include <pg_query.h>
#include <stdio.h>
int
main ()
{
char *query = "SELECT 1";
PgQueryProtobufParseResult parse_result = pg_query_parse_protobuf(query);
PgQueryDeparseResult deparse_result = pg_query_deparse_protobuf(parse_result.parse_tree);
printf("deparsed query: %s\n", deparse_result.query);
}
$ gcc -Wall test.c -lpg_query
$ ./a.out
deparsed query: SELECT 1
Would it help to extract the protobuf string? (I tried printing it, but it's binary, will dig deeper if it helps.)
Thanks, so it seems a problem on the Python side.
Yes, the protobuf string would probably help in spotting the problem.
Is there any way to "simulate" a big endian machine, to try replicating the problem here?
Rethinking about it, the protobuf string should be the same as the one python produced, quoted above.
For big-endian you can try qemu which offers a few architectures, on Debian: qemu-system-ppc, qemu-system-sparc, qemu-system-misc. I'd suggest to try s390x (in -misc) since it's among the official Debian ports (ppc/be and sparc are on limited support only.)
Alternatively, if you have a patch to try, I can run it on the Debian porter boxes.
After poking around in the code I found the problem.
Since protobuf handles endianess itself, the issue was likely in the glue code. I suspected this line to be the issue:
https://github.com/lelit/pglast/blob/v4/pglast/parser.pyx#L437
It turned out this line is correct, but the problem is very close, and it's in libpg_query:
https://github.com/pganalyze/libpg_query/blob/15-latest/pg_query.h#L16
The len field there is an int, which is 32-bit even on 64-bit architectures, while Py_ssize_t is of course 64-bit. On little endian, that's (almost) not a problem, but on big endian, the wrong half of the size written is read, which is all zeros.
I'll reassign this issue to libpg_query once the official Debian builds have confirmed the tests pass.
Thanks a lot for your exploration and findings! ❤️
This would have been very difficult to impossible for me to discover, going thru qemu!
FYI, current v5 branch uses libpg_query 15-4.1.0. Hopefully I will release dev1 (or 5.0 final, even) soon.