nlp-uoregon/trankit

em dash character crashes French pipeline

pa-nlp opened this issue · 0 comments

I tested trankit with the base and large models using the French pipeline, and the em dash (Unicode code point 8212, U+2014) causes the model to crash. The online demo seems to have the same problem. A quick replace on the input string that changes the em dash to a hyphen avoids the issue. I did not test the three other types of dashes, nor other languages.
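
For reference, the workaround looks roughly like this. It is a minimal sketch of the string replacement only: the helper name `replace_em_dash` and the sample sentence are made up for illustration and are not part of trankit. The cleaned string is what I then pass to the pipeline.

```python
# Em dash: Unicode code point 8212 (U+2014), the character that triggers the crash.
EM_DASH = "\u2014"


def replace_em_dash(text: str, replacement: str = "-") -> str:
    """Return text with every em dash swapped for a plain hyphen.

    Hypothetical helper for illustration; it only touches U+2014 and
    leaves other dash characters (not tested in this report) unchanged.
    """
    return text.replace(EM_DASH, replacement)


# Example input containing an em dash (sample sentence is illustrative).
sample = "Il est parti \u2014 sans un mot."
cleaned = replace_em_dash(sample)
print(cleaned)
```

Note that this changes the input text, so character offsets are preserved (one character for one character) but the original dash is lost in the output tokens.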