Provide a way to pass a character stream to the tokenizer or a way to preprocess the supplied text
fdutton opened this issue · 1 comment
fdutton commented
PHP does not handle Unicode very well, so I found it necessary to preprocess the supplied text by folding the Unicode characters into the ASCII character set using `iconv('UTF-8', 'ASCII//TRANSLIT', $string)`.
Instead of passing a string to the parser, I would prefer to pass a character stream or to specify a translation function, so that I do not have to modify the generated code.
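The preprocessing workaround described above could look roughly like the following. This is a minimal sketch: `foldToAscii` is a hypothetical helper name, and the commented-out `Parser::parse` call is only an assumption about the generated parser's entry point. The exact transliteration that `iconv` produces depends on the platform's iconv implementation and locale, so no particular output is shown.

```php
<?php
// Hypothetical helper (not part of the generated parser): folds UTF-8 text
// into ASCII before it is handed to the parser.
function foldToAscii(string $text): string
{
    // iconv() returns false on failure. The result of //TRANSLIT depends on
    // the platform's iconv implementation and the current locale.
    $folded = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    // Fall back to the unmodified input if the conversion failed.
    return $folded === false ? $text : $folded;
}

$ascii = foldToAscii('naïve café');
// $result = Parser::parse($ascii); // assumed entry point of the generated parser
```

Keeping the folding in a separate function like this leaves the generated code untouched, but every caller has to remember to apply it, which is the inconvenience this issue is about.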
DmitrySoshnikov commented
Yeah, instead of a single generic `parse` function on the parser in the template, feel free to add `parseFromString` (the default `parse` will just call it) and `parseFromCharStream` to the PHP plugin.
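One possible shape for that split is sketched below. Only the method names `parse`, `parseFromString`, and `parseFromCharStream` come from this thread; the class name `SketchParser`, the `setTranslator` hook, and the whitespace-splitting `run` method are hypothetical stand-ins for the generated parser's real internals.

```php
<?php
// Sketch only: SketchParser and its method bodies are hypothetical stand-ins
// for the generated parser. Only the three parse* method names come from
// this thread.
class SketchParser
{
    /** @var callable|null Optional translation function applied to the input. */
    private static $translate = null;

    // Hypothetical hook for a user-supplied translation function.
    public static function setTranslator(?callable $fn): void
    {
        self::$translate = $fn;
    }

    // Default entry point: keeps existing callers working.
    public static function parse(string $text)
    {
        return self::parseFromString($text);
    }

    public static function parseFromString(string $text)
    {
        if (self::$translate !== null) {
            $text = (self::$translate)($text);
        }
        return self::run($text);
    }

    // Accepts any iterable of characters or chunks (array, generator, ...).
    public static function parseFromCharStream(iterable $chars)
    {
        $buffer = '';
        foreach ($chars as $chunk) {
            $buffer .= $chunk;
        }
        return self::parseFromString($buffer);
    }

    // Placeholder for the real tokenizer/parser loop: splits on whitespace.
    private static function run(string $text): array
    {
        return preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    }
}
```

For simplicity this sketch buffers the whole stream before parsing; a real implementation would more likely have the tokenizer pull characters from the stream on demand. A translation function such as the `iconv` call from the original comment could be registered through the hypothetical `setTranslator` hook, so the generated code would not need to be edited by hand.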