utapyngo/WritingSmellDetector

Non-ASCII characters


First of all, great job, thanks for sharing!

You might be interested to know that the script crashes if the .tex file contains non-ASCII characters (e.g. é in the author name):

INFO: Loaded 37040 bytes of text from sample.tex
Traceback (most recent call last):
  File "wsd.py", line 438, in <module>
    sys.exit(analyze(args))
  File "wsd.py", line 401, in analyze
    open(args.outfile, 'wb').write(p.to_html(not args.no_embed_css))
  File "wsd.py", line 86, in to_html
    css=loader.get_source(env, 'style.css')[0] if embed_css else None)
  File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 891, in render
    return self.environment.handle_exception(exc_info, True)
  File "/home/qbonnard/Téléchargements/WritingSmellDetector/html/template.html", line 180, in top-level template code
    {{ chunk.data|e }}
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 30: ordinal not in range(128)
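The traceback most likely reflects Python 2's implicit ASCII decoding: `{{ chunk.data|e }}` receives a byte string holding the UTF-8 encoding of é (bytes 0xC3 0xA9), and Python 2 tries to turn it into unicode with the `ascii` codec. The change in 4291259 is not shown here. What follows is only a minimal sketch of the usual remedy, assuming Python 3 and UTF-8 input: decode once on load, work with text, and encode once on output. The function names `decode_tex` and `encode_chunk` are hypothetical, and `html.escape` stands in for Jinja2's `|e` filter.

```python
import html


def decode_tex(raw: bytes, encoding: str = "utf-8") -> str:
    # Decode explicitly instead of relying on an implicit ASCII conversion.
    # Assumes the .tex source is UTF-8 (hypothetical default).
    return raw.decode(encoding)


def encode_chunk(text: str) -> bytes:
    # Escape HTML special characters (comparable to Jinja2's |e filter),
    # then encode to UTF-8 bytes for a file opened in 'wb' mode.
    return html.escape(text).encode("utf-8")


raw = "auteur: \u00e9".encode("utf-8")  # b'auteur: \xc3\xa9'
text = decode_tex(raw)
print(encode_chunk(text + " <br>"))
```

Decoding the same bytes with the `ascii` codec instead, e.g. `raw.decode("ascii")`, raises a `UnicodeDecodeError` on byte 0xc3, which is the error reported above.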

Thanks! We're on it.

Fixed in 4291259.
Some rules still need to be rewritten to use Unicode character classes (such as \p{Lu}). Unfortunately, Python's re module does not support them.
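Until such rules can use `\p{...}` escapes (the third-party `regex` package on PyPI supports `\p{Lu}`; the stdlib `re` does not), a stdlib-only workaround is possible. The following is a minimal sketch, assuming Python 3, where `str` patterns are Unicode-aware by default. `[^\W\d_]` matches any Unicode letter, and `unicodedata.category(ch) == "Lu"` tests the same property as `\p{Lu}`. The helper `capitalized_words` is a hypothetical example rule, not one of the project's actual rules.

```python
import re
import unicodedata
from typing import List

# Any Unicode letter: word characters minus digits and underscore.
# (In Python 2 this would also need re.UNICODE and unicode strings.)
LETTER = r"[^\W\d_]"
WORD_RE = re.compile(LETTER + r"+")


def is_upper_letter(ch: str) -> bool:
    # Same test as \p{Lu}: Unicode general category "Lu" (uppercase letter).
    return unicodedata.category(ch) == "Lu"


def capitalized_words(text: str) -> List[str]:
    # Hypothetical rule helper: letter runs whose first letter is uppercase,
    # including accented capitals such as "É".
    return [m.group() for m in WORD_RE.finditer(text)
            if is_upper_letter(m.group()[0])]


print(capitalized_words("\u00c9mile est l\u00e0"))
```

An ASCII-only class such as `[A-Z]` would miss "Émile" here, which is exactly why these rules need Unicode-aware classes.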