make_tokenizer doesn't deal with binary tokenizers
gsnedders opened this issue · 0 comments
gsnedders commented
from funcparserlib.lexer import make_tokenizer

tokenize = make_tokenizer([
    (u'x', (br'\xff\n',)),
])
tokens = list(tokenize(b"\xff\n"))
throws
File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/tests/test_parsing.py", line 76, in test_tokenize_bytes
tokens = list(tokenize(b"\xff\n"))
File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/lexer.py", line 107, in f
t = match_specs(compiled, str, i, (line, pos))
File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/lexer.py", line 91, in match_specs
nls = value.count(u'\n')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
match_specs needs to handle both unicode and bytes line feed characters.
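
A minimal sketch of one possible fix at that spot in match_specs, assuming value is either a unicode string or a bytes object depending on which kind of specs the tokenizer was built from (the value name comes from the traceback above; everything else is illustrative, not funcparserlib's actual internals):

    # Pick a line-feed literal matching the type of `value`, so that
    # counting newlines in bytes input no longer triggers an implicit
    # ASCII decode (the source of the UnicodeDecodeError above).
    nl = b'\n' if isinstance(value, bytes) else u'\n'
    nls = value.count(nl)

The same type check would be needed anywhere else the lexer searches for literal characters in the matched text.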