standardese/cppast

Invalid handling of files containing non-ASCII characters in their path

Marandil opened this issue · 0 comments

  • cppast version: latest (b155d6a)
  • parser: libclang_parser
  • clang version: 11.0.0-++20200311091410+4016c6b07f2-1~exp1~20200311082026.1554 x86_64-pc-linux-gnu

It seems that simple_file_parser/libclang_parser incorrectly handles files whose path contains non-ASCII characters. In my initial case this was a Polish letter (ł) in my home directory path, but I was able to reproduce it with other non-ASCII letters in both the file name and a folder name.

Note that clang itself has no problem loading these files, and neither does C++ fstream.
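
The fstream part of that claim can be checked with a small round-trip sketch like the one below. It is not taken from cppast; the helper name `round_trips_through_fstream` is hypothetical, and it assumes a platform where paths are plain byte strings (as on Linux with a UTF-8 file system), so the non-ASCII letter is passed as its UTF-8 bytes.

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not part of cppast): writes `content` to `path`,
// reads it back through std::ifstream, deletes the file, and reports
// whether the bytes read back match what was written.
bool round_trips_through_fstream(const std::string& path, const std::string& content)
{
    {
        std::ofstream out(path, std::ios::binary);
        if (!out)
            return false; // could not create the file at this path
        out << content;
    } // the output stream is flushed and closed here

    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false; // could not reopen the file for reading

    std::string read_back((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
    in.close();
    std::remove(path.c_str()); // clean up the temporary file

    return read_back == content;
}
```

Calling it with `"test-\xC4\x85.c"` (the UTF-8 encoding of `test-ą.c`) and the contents `const int bar = 8;` should return true on such a system, which would suggest the problem lies in how the path is handed to the parser rather than in the file system or the standard library.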

Input:

test-a.c

const int foo = 7;

test-ą.c

const int bar = 8;

α/test.c

const int baz = 9;

Input flags:
Default CMake build with -DCPPAST_BUILD_TOOL=On (reproduced with GCC 7.5, GCC 8.3 and Clang 11.0)

Output:
Using the cppast tool:

$ cppast/tool/cppast ../test-a.c 
AST for '../test-a.c':
+-foo (variable) [definition]: `int const foo=7;`

$ cppast/tool/cppast ../test-ą.c 
AST for '../test-ą.c':

$ cppast/tool/cppast ../α/test.c
AST for '../α/test.c':

The first invocation lists foo as expected, but the ASTs for the two non-ASCII paths are empty: bar and baz are missing.