aliariff/vscode-erb-beautify

Failed when formatting UTF-8 characters

brunoprietog opened this issue · 5 comments

Hi,

When I try to format a file that contains UTF-8 characters such as á, I get this error:

failed with exit code: 1.
'/home/bruno/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/htmlbeautifier-1.4.2/bin/htmlbeautifier:12:in `rescue in beautify': Error parsing standard input: invalid byte sequence in US-ASCII on line 1 (RuntimeError)
  from /home/bruno/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/htmlbeautifier-1.4.2/bin/htmlbeautifier:9:in `beautify'
  from /home/bruno/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/htmlbeautifier-1.4.2/bin/htmlbeautifier:111:in `<top (required)>'
  from /home/bruno/.rbenv/versions/3.1.2/bin/htmlbeautifier:25:in `load'
  from /home/bruno/.rbenv/versions/3.1.2/bin/htmlbeautifier:25:in `<main>'
/home/bruno/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/htmlbeautifier-1.4.2/lib/htmlbeautifier/parser.rb:37:in `rescue in dispatch': invalid byte sequence in US-ASCII on line 1 (RuntimeError)
  from /home/bruno/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/htmlbeautifier-1.4.2/lib/htmlbeautifier/parser.rb:31:in `dispatch'
  from /home/bruno/.rbenv/versio...

This happens because the extension sets LC_ALL=en_US.UTF-8 when it invokes the htmlbeautifier command.

If I run htmlbeautifier file.html.erb file.html.erb, it works fine, but if I run LC_ALL=en_US.UTF-8 htmlbeautifier file.html.erb file.html.erb, it fails with the same error and an additional warning:

bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)

Thanks

As a temporary workaround, running sudo locale-gen en_US.UTF-8 fixes this. Still, that should not be necessary, and changing the locale manually through environment variables is not recommended.

Hi @brunoprietog

Thanks for creating this issue, but it seems this setting is necessary for other users. See #8 and #9.

If we want to solve this properly, I think it would be better to introduce a configuration variable that people can set, and we just append that configuration when we perform the formatting.

@brunoprietog, could you please also provide the file that has this issue so that I can reproduce it locally?

Any file that has a character such as á, for example.

<p>
Cómo estás?
</p>

You will only be able to reproduce the problem if you don't have the en_US.utf8 locale, which was my case.

bruno@DellBruno:~$ locale -a
C
C.utf8
POSIX

So when the extension tried to switch to en_US.utf8 through the environment variable, the locale could not be set.

After running sudo locale-gen en_US.UTF-8, the output of locale -a is:

bruno@DellBruno:~$ locale -a
C
C.utf8
POSIX
en_US.utf8

And everything works fine.

Maybe one could check if the en_US.utf8 locale is available before using it?
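
One way to act on this suggestion is to consult the output of locale -a before exporting LC_ALL. The sketch below is a minimal illustration in POSIX shell, not the extension's actual code. The function names normalize_locale, locale_in_list, and first_available_locale are hypothetical, and so is the choice of C.UTF-8 as a fallback candidate.

```shell
# Normalize a locale name so that "en_US.UTF-8" and "en_US.utf8" compare equal.
normalize_locale() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Succeed if locale $1 appears in the list read from stdin
# (one name per line, the format printed by `locale -a`).
locale_in_list() {
  _wanted=$(normalize_locale "$1")
  while IFS= read -r _line; do
    if [ "$(normalize_locale "$_line")" = "$_wanted" ]; then
      return 0
    fi
  done
  return 1
}

# Print the first candidate ($2, $3, ...) found in the list $1; fail if none is.
first_available_locale() {
  _list=$1
  shift
  for _candidate in "$@"; do
    if printf '%s\n' "$_list" | locale_in_list "$_candidate"; then
      printf '%s\n' "$_candidate"
      return 0
    fi
  done
  return 1
}

# Example use (not run here):
#   if chosen=$(first_available_locale "$(locale -a)" en_US.UTF-8 C.UTF-8); then
#     LC_ALL=$chosen htmlbeautifier file.html.erb file.html.erb
#   else
#     htmlbeautifier file.html.erb file.html.erb   # leave the locale untouched
#   fi
```

With the locale list reported above (C, C.utf8, POSIX), this sketch would skip en_US.UTF-8 and print C.UTF-8. Once en_US.utf8 has been generated, it would print en_US.UTF-8.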

Hi @brunoprietog, we released a new version that removes this locale environment variable. Instead, I added a new config property for setting custom environment variables (if necessary).

See: https://github.com/aliariff/vscode-erb-beautify/releases/tag/v0.4.0
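
The opt-in approach can be sketched in shell. This only illustrates the idea and is not the extension's implementation: the variable ERB_BEAUTIFY_ENV and the function run_formatter are hypothetical names, and this simplified form does not support values containing spaces.

```shell
# Hypothetical setting: space-separated KEY=VALUE pairs chosen by the user.
# Empty by default, so the formatter inherits the system locale untouched.
ERB_BEAUTIFY_ENV=${ERB_BEAUTIFY_ENV:-}

# Run the given command with the user's extra variables, if any.
run_formatter() {
  # Word splitting of $ERB_BEAUTIFY_ENV is intentional: each pair becomes
  # a separate argument to env. Values with spaces are not supported.
  env $ERB_BEAUTIFY_ENV "$@"
}

# Example use (not run here):
#   ERB_BEAUTIFY_ENV="LC_ALL=en_US.UTF-8"
#   run_formatter htmlbeautifier file.html.erb file.html.erb
```

Because nothing is forced by default, a system without en_US.UTF-8 is never asked to switch to it, while users who need the override (as in #8 and #9) can still supply it.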