zerebubuth/openstreetmap-cgimap

test_parse_id_list failure on ARM64

Closed this issue · 10 comments

For some reason, I'm seeing unit test failures for test_parse_id_list on my Raspberry running bookworm in 64bit mode. I'm not exactly sure what's going on, just posting it here so we don't forget about it.

test_parse_id_list is a Catch v2.13.10 host application.
Run with -? for options

-------------------------------------------------------------------------------
Invalid string
-------------------------------------------------------------------------------
../test/test_parse_id_list.cpp:35
...............................................................................

../test/test_parse_id_list.cpp:36: FAILED:
  CHECK( api06::valid_string("foobar\x80") == false )
with expansion:
  true == false

../test/test_parse_id_list.cpp:37: FAILED:
  CHECK( api06::valid_string("foobar\xff") == false )
with expansion:
  true == false

-------------------------------------------------------------------------------
Id list with garbage
-------------------------------------------------------------------------------
../test/test_parse_id_list.cpp:66
...............................................................................

../test/test_parse_id_list.cpp:70: FAILED:
  CHECK( result.empty() == true )
with expansion:
  false == true

It's the only test failure by the way, everything else works ok:

============================================================================
Testsuite summary for CGImap 0.8.10
============================================================================
# TOTAL: 33
# PASS:  32
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

Looks like the compiler optimizes the check c >= 0 && c <= UCHAR_MAX to constant true because char == unsigned char on arm. If you replace the char parameter in the lambda with signed char it works.

https://cpp.godbolt.org/z/fb9nfd59P

Don't forget char may default to signed on some platforms and unsigned on others - it is implementation defined.

Yup, usually signed for x86, unsigned for arm.

Maybe it would be easier to whitelist all permitted characters, i.e. as space, digits 0-9, comma, and v for version.

constexpr bool valid_string(std::string_view str)
{
  return str.find_first_not_of(" 0123456789,v") == std::string::npos;
}

That is going to be far slower than the simple range check and is duplicated effort with the parsing that happens later. As far as I've seen the check is only in there because the parser can't handle non-ascii chars and otherwise wouldn't be required at all?

Yes, somehow the boost based parser is doing some BOOST_ASSERT(strict_ischar(ch)), which might terminate the program with an assertion failure depending on NDEBUG and user input. In the end I decided to add the additional check, and leave the rest as is.

For the sake of parsing a list of objects with an optional version number, using boost might be a bit overkill. It was already in place when I started here.

We could simply check if only the 7 least significant bits are used (i.e. isascii). That would be just as fast and should work everywhere.

bool isascii(char ch)
{
    return 0 == (ch & ~0x7f);
}

Right, both versions generate the same assembly code (not eax; shr al,7) -> https://cpp.godbolt.org/z/MrjK773fW
For some reason, I find the version without two's complement better.

Tests on ARM64 are passing again:


============================================================================
Testsuite summary for CGImap 0.8.10
============================================================================
# TOTAL: 33
# PASS:  33
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================