ShivamSarodia/ShivyC

Simplify the tokenize_line function


The tokenize_line lexer function is long and is becoming difficult to read and maintain. It should be broken up into multiple parts, perhaps as part of a refactor of the lexer as a whole.
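One possible shape for the split, sketched in Python (the language ShivyC is written in): each token kind gets its own small reader function, and tokenize_line only skips whitespace and dispatches to the readers. Everything here (Token, the read_* helpers, READERS, the PUNCTUATORS table) is hypothetical and is not ShivyC's actual code. Identifier and number handling are deliberately simplified.

```python
"""Hypothetical sketch of tokenize_line split into small reader helpers.

None of these names come from ShivyC; they only illustrate the refactor.
"""
import string
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    kind: str  # "identifier", "number", or "punctuator"
    text: str


IDENT_START = set(string.ascii_letters + "_")
IDENT_CHAR = IDENT_START | set(string.digits)
DIGITS = set(string.digits)

# Longest punctuators first, so "<<=" is tried before "<<" and "<".
PUNCTUATORS = sorted(
    ["...", "<<=", ">>=", "->", "++", "--", "<<", ">>", "<=", ">=",
     "==", "!=", "&&", "||", "+=", "-=", "*=", "/=", "%=", "&=", "^=",
     "|=", "##", "[", "]", "(", ")", "{", "}", ".", "&", "*", "+", "-",
     "~", "!", "/", "%", "<", ">", "^", "|", "?", ":", ";", "=", ",", "#"],
    key=len, reverse=True)

# A reader returns (token, index after the token), or None if it doesn't match.
Match = Optional[Tuple[Token, int]]


def read_identifier(line: str, i: int) -> Match:
    if line[i] not in IDENT_START:
        return None
    end = i + 1
    while end < len(line) and line[end] in IDENT_CHAR:
        end += 1
    return Token("identifier", line[i:end]), end


def read_number(line: str, i: int) -> Match:
    # Simplified: decimal integer literals only.
    if line[i] not in DIGITS:
        return None
    end = i + 1
    while end < len(line) and line[end] in DIGITS:
        end += 1
    return Token("number", line[i:end]), end


def read_punctuator(line: str, i: int) -> Match:
    # Longest match wins because PUNCTUATORS is sorted by length.
    for p in PUNCTUATORS:
        if line.startswith(p, i):
            return Token("punctuator", p), i + len(p)
    return None


READERS = (read_identifier, read_number, read_punctuator)


def tokenize_line(line: str) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(line):
        if line[i].isspace():
            i += 1
            continue
        for reader in READERS:
            match = reader(line, i)
            if match is not None:
                token, i = match
                tokens.append(token)
                break
        else:
            # No reader accepted the character at position i.
            raise ValueError(f"unexpected character {line[i]!r} at column {i}")
    return tokens
```

For example, tokenize_line("x <<= 42;") yields an identifier, the <<= punctuator, a number, and a ; punctuator. With this layout, supporting a new token kind means adding one reader to READERS instead of growing a single long function.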

I was looking a little at this one. Maybe it's a separate issue, but I thought about isolating the preprocessing a bit more so that it runs the steps in order: (1) converting each trigraph to a single character, (2) line splicing/joining, etc., before moving on to (3) tokenization and then (4) macro expansion.
I.e. trigraph conversion does not really fit in this function, since it requires looking at three "symbol_kinds" instead of two. So refactoring tokenize_line without thinking about trigraphs is perhaps a bit of a "short-term" win?
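A minimal sketch of steps (1) and (2) as a separate pre-pass that runs before any tokenizing, so the tokenizer never has to recognize trigraphs itself. The function names (replace_trigraphs, splice_lines, run_phases_1_and_2) are hypothetical and not part of ShivyC. The sketch assumes "\n" line endings and does not track original line numbers for error messages.

```python
"""Hypothetical sketch of C translation phases 1 and 2 as a pre-pass.

Function names are illustrative only; they do not exist in ShivyC.
"""
import re

# The nine trigraph sequences defined by the C standard (phase 1).
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\", "??)": "]", "??'": "^",
    "??<": "{", "??!": "|", "??>": "}", "??-": "~",
}
TRIGRAPH_RE = re.compile(r"\?\?[=(/)'<!>-]")


def replace_trigraphs(source: str) -> str:
    """Phase 1: replace each trigraph with its single character.

    re.sub scans left to right without overlapping matches, so "???="
    becomes "?#": the first "?" is left alone and "??=" is replaced.
    """
    return TRIGRAPH_RE.sub(lambda m: TRIGRAPHS[m.group()], source)


def splice_lines(source: str) -> str:
    """Phase 2: delete every backslash that is immediately followed by a newline."""
    return source.replace("\\\n", "")


def run_phases_1_and_2(source: str) -> str:
    # Order matters: "??/" turns into a backslash in phase 1, and that
    # backslash can then splice two lines together in phase 2.
    return splice_lines(replace_trigraphs(source))
```

With this split, run_phases_1_and_2("int b ??/\n= 3;") returns "int b = 3;", and the tokenizer only ever sees the already-translated text. Keeping track of where each logical line started would still be needed for diagnostics.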

@eriols This makes sense! I like your idea.

There was some discussion about the preprocessor here. Seems a bit tangential but maybe worth looking at if you tackle this issue. Feel free to let me know if you have thoughts/questions.