lelit/pglast

deparse_protobuf() broken on big-endian

df7cb opened this issue · 9 comments

df7cb commented

Hi,
as can be seen on https://buildd.debian.org/status/logs.php?pkg=pglast&ver=5.0%7Edev0-1, pglast is broken on big-endian architectures. (Sorry for the version number messup, I had uploaded a git snapshot of the v5 branch as 5.0~dev0 a week or two before you released this version. Thanks for the release and the fixes in there!)

The specific failure is this:

=================================== FAILURES ===================================
____________________________ test_deparse_protobuf _____________________________

    def test_deparse_protobuf():
>       assert deparse_protobuf(parse_sql_protobuf('select 1')) == 'SELECT 1'
E       AssertionError: assert '' == 'SELECT 1'
E         - SELECT 1

tests/test_parser.py:184: AssertionError

The above log is from the "Debian" dev0 package, but I can reproduce the problem on the "real" dev0 version as well:

>>> from pglast.parser import deparse_protobuf, parse_sql_protobuf
>>> deparse_protobuf(parse_sql_protobuf('select 1'))
''
>>> parse_sql_protobuf('select 1')
b'\x08\xf1\x93\t\x12\x1c\n\x1a\xd2\x03\x17\x1a\x10\x9a\x0b\r\x1a\t\xf2\x0e\x06\n\x02\x08\x01X\x07 \x07p\x01\x88\x01\x01'
>>> deparse_protobuf(parse_sql_protobuf('select'))
''
df7cb commented

The problem is in deparse_protobuf() since I can successfully deparse that string on x86:

>>> from pglast.parser import deparse_protobuf, parse_sql_protobuf
>>> deparse_protobuf(b'\x08\xf1\x93\t\x12\x1c\n\x1a\xd2\x03\x17\x1a\x10\x9a\x0b\r\x1a\t\xf2\x0e\x06\n\x02\x08\x01X\x07 \x07p\x01\x88\x01\x01')
'SELECT 1'
lelit commented

Do you have a chance to try the same in a C program linked against libpg_query?

df7cb commented
df7cb commented

... next year:

This program prints "SELECT 1" both on x86_64 and s390x:

#include <pg_query.h>
#include <stdio.h>

int
main ()
{
	char *query = "SELECT 1";
	PgQueryProtobufParseResult parse_result = pg_query_parse_protobuf(query);
	PgQueryDeparseResult deparse_result = pg_query_deparse_protobuf(parse_result.parse_tree);
	printf("deparsed query: %s\n", deparse_result.query);
}
$ gcc -Wall test.c -lpg_query
$ ./a.out 
deparsed query: SELECT 1

Would it help to extract the protobuf string? (I tried printing it, but it's binary, will dig deeper if it helps.)

lelit commented

Thanks, so it seems a problem on the Python side.
Yes, the protobuf string would probably help in spotting the problem.
Is there any way to "simulate" a big endian machine, to try replicating the problem here?

df7cb commented

Rethinking about it, the protobuf string should be the same as the one python produced, quoted above.

For big-endian you can try qemu which offers a few architectures, on Debian: qemu-system-ppc, qemu-system-sparc, qemu-system-misc. I'd suggest to try s390x (in -misc) since it's among the official Debian ports (ppc/be and sparc are on limited support only.)

Alternatively, if you have a patch to try, I can run it on the Debian porter boxes.

df7cb commented

After poking around in the code I found the problem.

Since protobuf handles endianess itself, the issue was likely in the glue code. I suspected this line to be the issue:
https://github.com/lelit/pglast/blob/v4/pglast/parser.pyx#L437

It turned out this line is correct, but the problem is very close, and it's in libpg_query:
https://github.com/pganalyze/libpg_query/blob/15-latest/pg_query.h#L16

The len field there is an int, which is 32-bit even on 64-bit architectures, while Py_ssize_t is of course 64-bit. On little endian, that's (almost) not a problem, but on big endian, the wrong half of the size written is read, which is all zeros.

I'll reassign this issue to libpg_query once the official Debian builds have confirmed the tests pass.

lelit commented

Thanks a lot for your exploration and findings! ❤️

This would have been very difficult to impossible for me to discover, going thru qemu!

lelit commented

FYI, current v5 branch uses libpg_query 15-4.1.0. Hopefully I will release dev1 (or 5.0 final, even) soon.