CTU-IIG/thermobench

Non ASCII symbols in sensors files.

Closed this issue · 9 comments

Symbols like the degree sign (°) might cause harm if the thermobench output files are analyzed by tools that do not support them by default. I suggest removing them and leaving only ASCII in the column descriptions.

Which tools are these? "°" is a normal UTF-8 character and UTF-8 is largely ASCII compatible. At worst, the length of "°" will be reported as 2 instead of 1, but everything else should work. Can you elaborate on what kind of harm it causes?
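To make the length point concrete, here is a minimal Python sketch. It is not part of thermobench, and the column header in it is made up for illustration.

```python
# Hypothetical column header, made up for illustration.
header = "CPU temperature [°C]"

encoded = header.encode("utf-8")

# One character, two bytes: "°" is U+00B0, which UTF-8 encodes as 0xC2 0xB0.
degree_bytes = "°".encode("utf-8")
print(len("°"), len(degree_bytes), degree_bytes)

# ASCII characters keep their single-byte values in UTF-8, so the rest of
# the header is byte-for-byte identical to its plain ASCII encoding.
ascii_part = header.replace("°", "")
print(encoded.replace(degree_bytes, b"") == ascii_part.encode("ascii"))
```

A tool that counts bytes instead of characters sees the header as one unit longer than expected, which is the "2 instead of 1" effect mentioned above.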

I think I encountered some problems when importing the CSV into Pandas (Python). Also, with the default settings, my text editor (VSCode) shows these ugly "?" instead of degree signs.

If Pandas runs on Python 3, it should work; otherwise it's a bug in Pandas. Or maybe your code does not read the CSV as UTF-8, in which case it's a bug in your code.
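As a sketch of what "reading the CSV as UTF-8" means, the snippet below uses only the standard-library csv module and names the encoding explicitly instead of relying on the platform default. The file contents and the helper name `read_header` are invented for the example; pandas' `read_csv` accepts an `encoding` argument for the same purpose.

```python
import csv
import os
import tempfile


def read_header(path, encoding="utf-8"):
    """Return the first CSV row, decoding the file with an explicit encoding."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, "r", encoding=encoding, newline="") as f:
        return next(csv.reader(f))


# Write a tiny made-up CSV so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", newline="", suffix=".csv", delete=False
) as tmp:
    csv.writer(tmp).writerows([["time/ms", "CPU/°C"], ["0", "41.5"]])

header = read_header(tmp.name)
os.unlink(tmp.name)
print(header)
```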

Regarding VSCode, do you have files.autoGuessEncoding set to true? See this.

I do not know. Indeed, Pandas and VSCode should handle UTF-8, but I am still getting errors like

'utf-8' codec can't decode byte 0xb0 in position 337: invalid start byte
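One plausible way to end up with this error is a file written in a single-byte encoding such as Latin-1, where the degree sign is the lone byte 0xB0. That the file in question was Latin-1 is an assumption; the thread only establishes that it was not UTF-8. A minimal reproduction:

```python
# 0xB0 is the degree sign in Latin-1 (ISO-8859-1); UTF-8 uses 0xC2 0xB0.
latin1_bytes = "°C".encode("latin-1")
utf8_bytes = "°C".encode("utf-8")
print(latin1_bytes, utf8_bytes)

message = None
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    # In UTF-8, 0xB0 is a continuation byte and cannot start a character.
    message = str(err)
    print(message)

# The same bytes decode fine once the actual encoding is named.
print(latin1_bytes.decode("latin-1"))
```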

In VSCode, you see the question marks [image] even though the encoding seems to be set to UTF-8 [image].

Nevertheless, when I work with the data, I do not even know how to type the degree symbol using only my keyboard, so I am forced to copy and paste it every time.

Is there some real benefit in using the non-ascii symbols?

Strange. Both commands below work for me:

python3 -c 'print(repr(open("sensors.imx8", "r").read()))'
python3 -c 'print(repr(open("sensors.imx8", "rb").read().decode("utf-8", "strict")))'

Can you try them on your system?
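If the strict decode fails, it can help to know where. The following sketch wraps the same check in a small helper (the function names are hypothetical) that reports the offset and value of the first offending byte:

```python
from pathlib import Path


def find_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8
    decoding, or None if the data is valid UTF-8."""
    try:
        data.decode("utf-8", "strict")
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


def check_file(path):
    """Run the check on the raw bytes of a file."""
    return find_invalid_utf8(Path(path).read_bytes())


# In-memory examples: valid UTF-8, then a lone 0xB0 byte.
print(find_invalid_utf8("temp °C".encode("utf-8")))
print(find_invalid_utf8(b"temp \xb0C"))
```

Calling `check_file("sensors.imx8")` would then apply the same test to the file discussed here.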

You can enter ° in many ways. If you have the Compose key enabled
(typically mapped to AltGr or Shift+AltGr) then ° is
Compose+o,Compose+o. Otherwise, you can press Control+Shift+U,Control+Shift+b,Control+Shift+0.

The benefit of UTF-8 is that we no longer live in the 1970s and all
software should support all possible characters :-). If you run
thermobench --stdout with a benchmark that prints non-ASCII UTF-8,
you would probably have the same problem.

Looking at the .csv file you sent yesterday, I saw that the file is not encoded in UTF-8, and that's why you have the problem with ° characters. So this is not a problem with the sensors file, but with how thermobench encodes UTF-8. As I don't have this problem on my computer, I suspect that it has something to do with the locale setting on the target board. I'll investigate this more and try to fix it properly. I guess that the output of the locale command on the board will report some problems.

Yes, that might be the issue.

The output of the locale command on imx8 is the following (at least for me):

locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=cs_CZ.UTF-8
LC_TIME=cs_CZ.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=cs_CZ.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=cs_CZ.UTF-8
LC_NAME=cs_CZ.UTF-8
LC_ADDRESS=cs_CZ.UTF-8
LC_TELEPHONE=cs_CZ.UTF-8
LC_MEASUREMENT=cs_CZ.UTF-8
LC_IDENTIFICATION=cs_CZ.UTF-8
LC_ALL=
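Because the first python3 command earlier in the thread opens the file without an explicit encoding, it may also be worth checking which encoding Python derives from this locale setup. This is only a diagnostic sketch with a hypothetical helper name; what it prints depends on the Python version and environment, and recent Python versions may fall back to UTF-8 even when the locale is broken.

```python
import locale
import sys


def text_encodings():
    """Collect the encodings Python picked up from the environment."""
    return {
        # Used by open() in text mode when no encoding= argument is given.
        "preferred": locale.getpreferredencoding(False),
        "stdout": sys.stdout.encoding,
        "filesystem": sys.getfilesystemencoding(),
    }


for name, value in text_encodings().items():
    print(f"{name}: {value}")
```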

As we discussed elsewhere, the problem is already in your sensors.imx8. I've checked your installation and this file is not UTF-8 encoded. I guess that git status in your repository will tell you that this file is modified. If this is the case, just run git checkout sensors.imx8 to get the correct UTF-8 version. Otherwise, this might be some problem with git.

It seems that git checkout worked. The sensors file seems to have been corrupted.