Zac-HD/shed

Complicated regex somehow breaks shed --refactor

DRMacIver opened this issue · 1 comments

Haven't looked into what's going on beyond to minimize the bug, but the following code:

import re

def tokenize_csv(csv_string: str, delimiter: str = ",", quote_char: str = '"') -> list:
    token_strings = re.split(fr"\{delimiter}(?=(?:(?:\{quote_char})*[^\\{quote_char}]))?", csv_string)
    tokens = [re.sub(fr"{quote_char}(?=(?:(?:\{quote_char})*[^\\{quote_char}]))|\\", "", token) for token in token_strings]
    return tokens

Results in the following error:

Internal error formatting 'shedbug.py': ParserSyntaxError: Syntax Error @ 1:1.
tokenizer error: unterminated string literal

import re
^
    Please report this to https://github.com/Zac-HD/shed/issues

Reproduced with libcst andrf"\\", so moved upstream to Instagram/LibCST#917.

For local mitigation let's handle this as suggested in #93; we can have shed explicitly check if it's a libcst bug and report that if so.