test_parse_id_list failure on ARM64

Question

test_parse_id_list failure on ARM64

mmd-osm opened this issue a year ago · 10 comments

For some reason, I'm seeing unit test failures for test_parse_id_list on my Raspberry running bookworm in 64bit mode. I'm not exactly sure what's going on, just posting it here so we don't forget about it.

test_parse_id_list is a Catch v2.13.10 host application.
Run with -? for options

-------------------------------------------------------------------------------
Invalid string
-------------------------------------------------------------------------------
../test/test_parse_id_list.cpp:35
...............................................................................

../test/test_parse_id_list.cpp:36: FAILED:
  CHECK( api06::valid_string("foobar\x80") == false )
with expansion:
  true == false

../test/test_parse_id_list.cpp:37: FAILED:
  CHECK( api06::valid_string("foobar\xff") == false )
with expansion:
  true == false

-------------------------------------------------------------------------------
Id list with garbage
-------------------------------------------------------------------------------
../test/test_parse_id_list.cpp:66
...............................................................................

../test/test_parse_id_list.cpp:70: FAILED:
  CHECK( result.empty() == true )
with expansion:
  false == true

It's the only test failure by the way, everything else works ok:

============================================================================
Testsuite summary for CGImap 0.8.10
============================================================================
# TOTAL: 33
# PASS:  32
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

Answer 1 · 2023-12-12T18:41:39.000Z

x86_64: https://cpp.godbolt.org/z/chx53h6zd -> ok
ARM64: https://cpp.godbolt.org/z/zEWKxnGPE -> fails

Answer 2 · 2023-12-12T19:39:51.000Z

Looks like the compiler optimizes the check c >= 0 && c <= UCHAR_MAX to constant true because char == unsigned char on arm. If you replace the char parameter in the lambda with signed char it works.

https://cpp.godbolt.org/z/fb9nfd59P

Answer 3 · 2023-12-12T19:41:28.000Z

Don't forget char may default to signed on some platforms and unsigned on others - it is implementation defined.

Answer 4 · 2023-12-12T19:42:17.000Z

Yup, usually signed for x86, unsigned for arm.

Answer 5 · 2023-12-13T23:06:53.000Z

Maybe it would be easier to whitelist all permitted characters, i.e. as space, digits 0-9, comma, and v for version.

constexpr bool valid_string(std::string_view str)
{
  return str.find_first_not_of(" 0123456789,v") == std::string::npos;
}

Answer 6 · 2023-12-14T21:00:16.000Z

That is going to be far slower than the simple range check and is duplicated effort with the parsing that happens later. As far as I've seen the check is only in there because the parser can't handle non-ascii chars and otherwise wouldn't be required at all?

Answer 7 · 2023-12-14T21:10:46.000Z

Yes, somehow the boost based parser is doing some BOOST_ASSERT(strict_ischar(ch)), which might terminate the program with an assertion failure depending on NDEBUG and user input. In the end I decided to add the additional check, and leave the rest as is.

For the sake of parsing a list of objects with an optional version number, using boost might be a bit overkill. It was already in place when I started here.

Answer 8 · 2023-12-14T22:21:21.000Z

We could simply check if only the 7 least significant bits are used (i.e. isascii). That would be just as fast and should work everywhere.

bool isascii(char ch)
{
    return 0 == (ch & ~0x7f);
}

Answer 9 · 2023-12-15T11:01:25.000Z

Right, both versions generate the same assembly code (not eax; shr al,7) -> https://cpp.godbolt.org/z/MrjK773fW
For some reason, I find the version without two's complement better.

Answer 10 · 2023-12-16T11:20:01.000Z

Tests on ARM64 are passing again:


============================================================================
Testsuite summary for CGImap 0.8.10
============================================================================
# TOTAL: 33
# PASS:  33
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================