Bug in the Serial Monitor when displaying invalid UTF-8 sequences?
Here is some code to run on an Uno:
https://wokwi.com/projects/423687284381725697
const char * utf8Message = "éè§à€£";  // 6 characters, 13 bytes in UTF-8
String result = "";

void setup() {
  Serial.begin(115200);
  size_t n = strlen(utf8Message);     // byte count, not character count
  Serial.print("\nCopying "); Serial.print(n);
  Serial.println(" bytes into the String.");
  for (size_t i = 0; i < n; i++) {
    result.concat(utf8Message[i]);    // append one byte at a time
    //Serial.print("Copy at this stage ["); Serial.print(result); Serial.println("]"); // <==== uncomment
  }
  Serial.print("Original: ["); Serial.print(utf8Message); Serial.println("]");
  Serial.print("Copy: ["); Serial.print(result); Serial.println("]");
}

void loop() {}
The output of this code is as expected:
Copying 13 bytes into the String.
Original: [éè§à€£]
Copy: [éè§à€£]
Now uncomment the debug line inside the for loop where the concat() happens, and the output becomes a mess, including the original const C string:
Copying 13 bytes into the String.
Copy at this stage [Ã]
Copy at this stage [é]
Copy at this stage [éÃ]
Copy at this stage [éè]
Copy at this stage [éèÂ]
Copy at this stage [éè§]
Copy at this stage [éè§Ã]
Copy at this stage [éè§à ]
Copy at this stage [éè§à â]
Copy at this stage [éè§à â�]
Copy at this stage [éè§à �]
Copy at this stage [éè§à â�¬Â]
Copy at this stage [éè§à �£]
Original: [éè§à �£]
Copy: [éè§à �£]
Testing this on a real Uno does the right thing:
Copying 13 bytes into the String.
Copy at this stage [⸮]
Copy at this stage [é]
Copy at this stage [é⸮]
Copy at this stage [éè]
Copy at this stage [éè⸮]
Copy at this stage [éè§]
Copy at this stage [éè§⸮]
Copy at this stage [éè§à]
Copy at this stage [éè§à⸮]
Copy at this stage [éè§à⸮]
Copy at this stage [éè§à€]
Copy at this stage [éè§à€⸮]
Copy at this stage [éè§à€£]
Original: [éè§à€£]
Copy: [éè§à€£]
Some of the intermediate output is bogus because the UTF-8 characters I used are encoded on multiple bytes, so a partially copied character cannot be interpreted. In the end, though, everything works out and the String matches the original C string.
This is the first time I have caught the simulator behaving differently from the real hardware on such basic code.
Does anyone have an explanation? Maybe it is a bug in the console's UTF-8 display that does not recover from invalid sequences?
EDIT: there is some discussion in the Arduino Forum:
https://forum.arduino.cc/t/wokwi-string-bug-with-utf8-data-interesting/1357096/6
We seem to agree that it is a limitation of the Wokwi terminal emulator, which cannot recover from an invalid UTF-8 byte stream.
Issue reproduced. It seems to be a limitation of the Serial Monitor. Note that if you use the serial terminal instead, it works correctly. To do so, paste the following into your diagram.json:
{
  "version": 1,
  "author": "Anonymous maker",
  "editor": "wokwi",
  "parts": [ { "id": "uno", "type": "wokwi-arduino-uno" } ],
  "connections": [],
  "serialMonitor": { "display": "terminal" }
}
and then you should get the correct output.
Pushed a fix: the Serial Monitor mode should now handle this case correctly.
