rust-embedded/rust-raspberrypi-OS-tutorials

04_safe_globals: Unicode support

mumblingdrunkard opened this issue · 2 comments

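I went a bit off track while working through this tutorial after I noticed a c as u8 cast where c: char. That cast keeps only the low 8 bits of the code point, so any character above U+00FF is silently truncated. This didn't sit right with me, so I decided to give it proper Unicode support: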

I don't know if you want to use this exact code, but it worked great for me. Calling .encode_utf8() on the char fills a 4-byte buffer and returns a string slice that can be iterated over with .bytes(). Since a single char can now produce several bytes, I renamed the counter to bytes_written so that it reflects what is actually sent.
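
For reference, here is a minimal host-side sketch of the difference between the two approaches. The helper names truncated and utf8_bytes are made up for illustration and are not part of the tutorial; the Vec is only there so the result can be printed on a host, whereas the kernel code writes each byte straight to the device register.

```rust
/// What the original `c as u8` cast does: keep only the low 8 bits
/// of the code point and discard the rest.
fn truncated(c: char) -> u8 {
    c as u8
}

/// What the patch does: encode the char into a 4-byte stack buffer
/// and use only the bytes that `encode_utf8` actually filled.
/// (Hypothetical helper; collects into a Vec purely for display.)
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buffer = [0u8; 4]; // a char needs at most 4 bytes in UTF-8
    c.encode_utf8(&mut buffer).bytes().collect()
}

fn main() {
    for c in ['A', 'é', '🦀'] {
        println!(
            "{:?}: as u8 = {:#04x}, UTF-8 = {:02x?}",
            c,
            truncated(c),
            utf8_bytes(c)
        );
    }
}
```

For ASCII input both paths produce the same single byte, so existing output is unaffected; they only diverge for non-ASCII characters.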

diff --git a/04_safe_globals/src/bsp/raspberrypi/console.rs b/04_safe_globals/src/bsp/raspberrypi/console.rs
index f340d94..38561ba 100644
--- a/04_safe_globals/src/bsp/raspberrypi/console.rs
+++ b/04_safe_globals/src/bsp/raspberrypi/console.rs
@@ -15,7 +15,7 @@ use core::fmt;
 ///
 /// The mutex protected part.
 struct QEMUOutputInner {
-    chars_written: usize,
+    bytes_written: usize,
 }
 
 //--------------------------------------------------------------------------------------------------
@@ -39,16 +39,20 @@ static QEMU_OUTPUT: QEMUOutput = QEMUOutput::new();
 
 impl QEMUOutputInner {
     const fn new() -> QEMUOutputInner {
-        QEMUOutputInner { chars_written: 0 }
+        QEMUOutputInner { bytes_written: 0 }
     }
 
     /// Send a character.
     fn write_char(&mut self, c: char) {
-        unsafe {
-            core::ptr::write_volatile(0x3F20_1000 as *mut u8, c as u8);
+        let mut buffer = [0u8; 4]; // char can be up to 4 bytes
+        let sequence = c.encode_utf8(&mut buffer);
+        for b in sequence.bytes() {
+            unsafe {
+                core::ptr::write_volatile(0x3F20_1000 as *mut u8, b);
+            }
         }
 
-        self.chars_written += 1;
+        self.bytes_written += sequence.len();
     }
 }
 
@@ -110,7 +114,7 @@ impl console::interface::Write for QEMUOutput {
 }
 
 impl console::interface::Statistics for QEMUOutput {
-    fn chars_written(&self) -> usize {
-        self.inner.lock(|inner| inner.chars_written)
+    fn bytes_written(&self) -> usize {
+        self.inner.lock(|inner| inner.bytes_written)
     }
 }
diff --git a/04_safe_globals/src/console.rs b/04_safe_globals/src/console.rs
index 658cf66..32d7b1b 100644
--- a/04_safe_globals/src/console.rs
+++ b/04_safe_globals/src/console.rs
@@ -21,7 +21,7 @@ pub mod interface {
     /// Console statistics.
     pub trait Statistics {
-        /// Return the number of characters written.
+        /// Return the number of bytes written.
-        fn chars_written(&self) -> usize {
+        fn bytes_written(&self) -> usize {
             0
         }
     }
diff --git a/04_safe_globals/src/main.rs b/04_safe_globals/src/main.rs
index 82262ea..9056770 100644
--- a/04_safe_globals/src/main.rs
+++ b/04_safe_globals/src/main.rs
@@ -126,11 +126,11 @@ mod synchronization;
 unsafe fn kernel_init() -> ! {
     use console::interface::Statistics;
 
-    println!("[0] Hello from Rust!");
+    println!("[0] Hello from Rust! 🦀");
 
     println!(
-        "[1] Chars written: {}",
-        bsp::console::console().chars_written()
+        "[1] Bytes written: {}",
+        bsp::console::console().bytes_written()
     );
 
     println!("[2] Stopping here.");

Hi, thanks for sharing this.

To be honest, I am not planning to switch the serial output to UTF-8 for now. A debug serial is supposed to be a very low-overhead way to transport debug information, so potentially having to transmit multiple bytes for a single character is a bit problematic.
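
As a rough illustration of that overhead, the sketch below compares the number of characters in a message with the number of bytes that would go over the wire if it were sent as UTF-8. The helpers char_count and wire_bytes are hypothetical names used only for this example, and the trailing newline added by println! is not counted.

```rust
/// Number of Unicode scalar values (chars) in the message.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

/// Number of bytes transmitted if every char is sent as UTF-8.
/// This is the same value as `s.len()`; it is spelled out per char
/// to make the per-character cost visible.
fn wire_bytes(s: &str) -> usize {
    s.chars().map(char::len_utf8).sum()
}

fn main() {
    for msg in ["[0] Hello from Rust!", "[0] Hello from Rust! 🦀"] {
        println!(
            "{:?}: {} chars, {} bytes",
            msg,
            char_count(msg),
            wire_bytes(msg)
        );
    }
}
```

Pure ASCII messages cost exactly one byte per character either way; the extra bytes only appear once non-ASCII characters are printed.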

Also, I don't think that the receiver side expects UTF-8 from these ancient interfaces.

I hope you understand.