Unicode handling issues

Multi-byte Unicode characters inside code blocks seem to be improperly handled.

MRE

```sh
# 🙂
```

Running mdt file.md results in a panic:

The application panicked (crashed).
  byte index 4 is not a char boundary; it is inside '🙂' (bytes 3..7) of `
  # 🙂
  `
in src/nodes/textcomponent.rs, line 379
thread: main

The problem

md-tui/src/nodes/textcomponent.rs

Lines 375 to 389 in 046d3be

    
    for (i, c) in word.content().chars().enumerate() {
        if c == '\n' {
            end = i;
            let new_word =
                Word::new(word.content()[start..end].to_string(), word.kind());
            inner_content.push(new_word);
            start = i + 1;
            final_content.push(inner_content);
            inner_content = Vec::new();
        } else if i == word.content().len() - 1 {
            let new_word =
                Word::new(word.content()[start..].to_string(), word.kind());
            inner_content.push(new_word);
        }
    }

Here, str::chars iterates over Unicode characters, so the i values stored in your start and end are character indices. String slicing, however, indexes by bytes and panics when an index does not fall on a char boundary. A UTF-8-encoded character often takes more than 1 byte, so word.content()[start..end] and word.content()[start..] are semantically incorrect. The i == word.content().len() - 1 check has the same mismatch, since str::len returns the length in bytes.
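To make the mismatch concrete, here is a small standalone sketch (not code from md-tui; the helper name newline_positions is made up for illustration) that prints both kinds of index for the string from the panic message:

```rust
/// For every '\n' in `s`, returns (character index, byte offset).
/// Hypothetical helper, only here to compare the two kinds of index.
fn newline_positions(s: &str) -> Vec<(usize, usize)> {
    s.char_indices() // yields (byte offset, char)
        .enumerate() // adds the character index in front
        .filter(|(_, (_, c))| *c == '\n')
        .map(|(char_idx, (byte_idx, _))| (char_idx, byte_idx))
        .collect()
}

fn main() {
    // The content shown in the panic message: newline, "# 🙂", newline.
    let s = "\n# 🙂\n";

    for (char_idx, byte_idx) in newline_positions(s) {
        println!(
            "newline: char index {char_idx}, byte offset {byte_idx}, \
             char index is a valid slice boundary: {}",
            s.is_char_boundary(char_idx)
        );
    }

    // `str::get` is the non-panicking form of slicing: it returns None
    // when a bound is not on a char boundary, where `s[1..4]` would panic.
    assert_eq!(s.get(1..4), None);
    assert_eq!(s.get(1..7), Some("# 🙂"));
}
```

The second newline has character index 4 but byte offset 7, and byte 4 lies inside '🙂', which matches the "byte index 4 is not a char boundary" message above.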

I think the simplest way to fix this is to accumulate the number of bytes using char::len_utf8 and use that as start and end, although there's probably a cleaner way to write the whole block.
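A minimal sketch of what I mean, assuming plain String pieces in place of Word and a made-up function name (split_on_newlines), so it is not a drop-in patch:

```rust
/// Splits `content` at each '\n', tracking byte offsets instead of
/// character indices. Hypothetical stand-in for the loop quoted above:
/// it returns plain Strings rather than building `Word` values.
fn split_on_newlines(content: &str) -> Vec<String> {
    let mut pieces = Vec::new();
    let mut start = 0; // byte offset where the current piece begins
    let mut offset = 0; // byte offset of the character being visited

    for c in content.chars() {
        if c == '\n' {
            // Both bounds are byte offsets on char boundaries.
            pieces.push(content[start..offset].to_string());
            start = offset + c.len_utf8();
        }
        offset += c.len_utf8();
    }

    // Trailing piece, only when the content does not end with a newline
    // (this mirrors the `else if` branch of the original loop).
    if start < content.len() {
        pieces.push(content[start..].to_string());
    }
    pieces
}

fn main() {
    // The string from the panic message no longer panics.
    println!("{:?}", split_on_newlines("\n# 🙂\n"));
    println!("{:?}", split_on_newlines("a🙂\nb"));
}
```

str::char_indices yields the byte offset of each character directly, so it could replace the manual accumulation; I kept len_utf8 here only to match the suggestion above.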

Also I'm not sure if there are other instances of similar mistakes within the codebase. Maybe it's worth double checking.

Versions

0.7.3 (from ArchLinux repository) and 0.7.4 (from crates.io)

Hi. Thanks for the issue. Came across this yesterday myself. ~~I naively thought tree-sitter would give back indexes on the char boundaries, but as you found out as well. It doesn't.~~ Should not be an issue elsewhere, as I don't do many (any?) operations on single chars.

It's fixed. I want to fix the list alignment issue before I push out a new version.
