r3bl-org/r3bl-open-core

[tui][examples] Example #5, fix editor component, typing more than 6 "#" produces janky output

Closed this issue · 8 comments

In the top level folder of the r3bl-open-core repo, run ./run.nu run. Then select 5 and press enter.

This is the r3bl-cmdr demo. When you type ###### you will see some strange artifacts displayed on the screen. This might have something to do w/ a commit from the last month that fixed the markdown parser to handle headings with more than 6 "#" marks 😄

tui-rc-demo-bug-2023-11-18_19.59.13.mp4

Even more simply, when you type _this is not italic as the only text in the editor component, it breaks: the content is displayed as italic.

@e0lithic @nazmulidris

Test cases to make the parser break 🤸:

  • _ or * at the start or end or in the middle of a word.
  • To break the heading add an extra "#" (7 hashes break the heading).

#215

The parser for headings might be conflicting w/ the parsers for bold and italic.

image

headings.mp4
star.and.underscore.mp4

@e0lithic It doesn't look like there are issues w/ the headings. However it does look like italic parsing (and bold parsing) have been broken since the beginning.

Just typing _this should not be italic will break it. The screenshot below shows this. It should not be rendered as italic, since there is no closing _.

image
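A minimal, std-only sketch of the expected behaviour: an italic span only counts when a closing _ exists. The function name parse_italic is hypothetical and is not the repo's parse_element_italic; it only illustrates the rule described above.

```rust
/// Tries to read an italic span at the start of `input`.
/// Returns `Some((inner, rest))` only when BOTH the opening and the
/// closing `_` are present. Hypothetical helper, not the repo's parser.
fn parse_italic(input: &str) -> Option<(&str, &str)> {
    // The input must start with the opening delimiter.
    let after_open = input.strip_prefix('_')?;
    // Without a closing delimiter this is plain text, not italic.
    let close = after_open.find('_')?;
    // Reject an empty span such as "__".
    if close == 0 {
        return None;
    }
    // `_` is one byte in UTF-8, so `close + 1` skips exactly the delimiter.
    Some((&after_open[..close], &after_open[close + 1..]))
}
```

With this rule, _this should not be italic yields None and would fall through to whatever handles plain text, while _italic_ rest yields the inner text and the remainder.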

@e0lithic This test highlights some of the issues:

#[cfg(test)]
mod tests {
    use crossterm::style::Stylize;
    use r3bl_rs_utils_core::*;

    use super::*;

    #[test]
    fn fix_italic() {
        let input = ["_this should not be italic"].join("\n");
        let (remainder, blocks) = parse_markdown(&input).unwrap();
        println!("{:?}", remainder);
        println!("{:?}", blocks);
    }
}

Output from test:

"_this should not be italic"
List { items: [] }

Observation regarding the usage of alt in the parsers.

alt returns the last error, so if none of the alternatives match, the error from the last parser in the list is what gets returned. As a result, parse_element_italic and other parsers built on alt report different errors depending on the order of the alternatives.

This test case should clarify it.

    #[test]
    fn test_delimiter() {
        let parser = parse_element_italic;
        let mut parser_star = delimited(tag(ITALIC_1), is_not(ITALIC_1), tag(ITALIC_1));
        let mut parser_underscore = delimited(tag(ITALIC_2), is_not(ITALIC_2), tag(ITALIC_2));
        // The combined parser reports the same error as parser_underscore below.
        assert_eq2!(
            parser("*here is italic"),
            Err(NomErr::Error(Error {
                input: "*here is italic",
                code: ErrorKind::Tag
            }))
        );
        // The star parser consumes everything, then fails on the missing closing tag.
        assert_eq2!(
            parser_star("*here is italic"),
            Err(NomErr::Error(Error {
                input: "",
                code: ErrorKind::Tag
            }))
        );
        // The underscore parser fails right away on the opening tag.
        assert_eq2!(
            parser_underscore("*here is italic"),
            Err(NomErr::Error(Error {
                input: "*here is italic",
                code: ErrorKind::Tag
            }))
        );
    }
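The order dependence can also be reproduced without nom. The sketch below is a simplified stand-in, not nom's implementation: ParseError, delimited_by and alt_last_error are hypothetical names, and the error only records where the parser gave up.

```rust
/// Simplified stand-in for nom's `Error { input, code }`: it only records
/// the remaining input at the point where the parser gave up.
#[derive(Debug, PartialEq)]
struct ParseError<'a> {
    input: &'a str,
}

/// Hypothetical helper: parse `delim`, then text without `delim`, then `delim`.
/// On success returns `(rest, inner)`.
fn delimited_by<'a>(delim: char, input: &'a str) -> Result<(&'a str, &'a str), ParseError<'a>> {
    // Opening delimiter missing: fail at the very start of the input.
    let after_open = input.strip_prefix(delim).ok_or(ParseError { input })?;
    match after_open.find(delim) {
        // Closing delimiter found.
        Some(end) => Ok((&after_open[end + delim.len_utf8()..], &after_open[..end])),
        // Closing delimiter missing: the body ran to the end, so fail at "".
        None => Err(ParseError { input: &after_open[after_open.len()..] }),
    }
}

/// Tries each delimiter in order and, if all fail, reports the LAST failure,
/// which is the behaviour described for `alt` above.
fn alt_last_error<'a>(delims: &[char], input: &'a str) -> Result<(&'a str, &'a str), ParseError<'a>> {
    let mut last = ParseError { input };
    for &delim in delims {
        match delimited_by(delim, input) {
            Ok(ok) => return Ok(ok),
            Err(e) => last = e,
        }
    }
    Err(last)
}
```

Calling alt_last_error(&['*', '_'], "*here is italic") fails with the full input as the error position, while swapping the order to &['_', '*'] fails with an empty input. Same text, different error, purely because of the order of the alternatives.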

Additional fixes are required to ensure that the special characters are only recognised at word boundaries. The following examples should be treated as plain text.

test_ing_
_test_ing
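One way to express the word-boundary rule, as a std-only sketch. The helpers can_open, can_close and italic_at_start are hypothetical and are not part of the repo's parser; they assume a one-byte delimiter such as _ or *.

```rust
/// True when the one-byte delimiter at byte offset `idx` may OPEN a span:
/// it must not follow a word character and must be followed by
/// a non-whitespace character.
fn can_open(text: &str, idx: usize) -> bool {
    let before = text[..idx].chars().next_back();
    let after = text[idx + 1..].chars().next();
    let prev_is_word = before.map_or(false, |c| c.is_alphanumeric());
    let next_is_text = after.map_or(false, |c| !c.is_whitespace());
    !prev_is_word && next_is_text
}

/// True when the one-byte delimiter at byte offset `idx` may CLOSE a span:
/// it must follow a non-whitespace character and must not be followed by
/// a word character.
fn can_close(text: &str, idx: usize) -> bool {
    let before = text[..idx].chars().next_back();
    let after = text[idx + 1..].chars().next();
    let prev_is_text = before.map_or(false, |c| !c.is_whitespace());
    let next_is_word = after.map_or(false, |c| c.is_alphanumeric());
    prev_is_text && !next_is_word
}

/// Returns the text inside a leading `_..._` span, but only if both
/// delimiters sit on word boundaries.
fn italic_at_start(text: &str) -> Option<&str> {
    if !text.starts_with('_') || !can_open(text, 0) {
        return None;
    }
    // Find the first later `_` that is allowed to close the span.
    text.char_indices()
        .skip(1)
        .find(|&(i, c)| c == '_' && can_close(text, i))
        .map(|(i, _)| &text[1..i])
}
```

Under these assumptions both examples above stay plain text: in test_ing_ the first _ follows a word character and cannot open a span, and in _test_ing the second _ is followed by a word character and cannot close one.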

Code smell emanating from parse_element_plaintext().

I got that code from another MD parser, and I thought it was fishy at the time as well.

https://github.com/r3bl-org/r3bl-open-core/blob/main/tui/src/tui/md_parser/parse_element.rs#L107

My best guess is that it is getting anychar that is not one of the *, _, -, etc special chars.

  • We already have the tag parsers for the special characters, so there is no need to check for them here, at least not for the ones whose dedicated parsers run before this one.
  • This is code that was inherited a long time ago; nobody is quite sure what it does, and everything else got rewritten around it.
  • It kind of makes sense now why _this is not italic was crapping out, since that strange function was checking for _ as an invalid character!
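A std-only sketch of the failure mode described in the bullets above, under the assumption that the plaintext parser stops at any special character. SPECIAL, plaintext_strict and plaintext_lenient are hypothetical names; the actual parse_element_plaintext() may differ.

```rust
/// Characters that dedicated parsers (bold, italic, inline code) are
/// assumed to handle. The real list in the repo may be different.
const SPECIAL: [char; 3] = ['*', '_', '`'];

/// Strict variant: consumes text up to the next special character and
/// fails (returns `None`) when the input starts with one.
fn plaintext_strict(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| SPECIAL.contains(&c))
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[..end], &input[end..]))
    }
}

/// Lenient variant: if the input starts with a special character that no
/// dedicated parser claimed, that character is consumed as plain text too.
fn plaintext_lenient(input: &str) -> Option<(&str, &str)> {
    let first = input.chars().next()?;
    // Skip over a leading special character so it becomes part of the text.
    let skip = if SPECIAL.contains(&first) { first.len_utf8() } else { 0 };
    let end = input[skip..]
        .find(|c: char| SPECIAL.contains(&c))
        .map_or(input.len(), |i| i + skip);
    Some((&input[..end], &input[end..]))
}
```

With the strict variant, _this is not italic is rejected outright, so if the italic parser also fails there is nothing left to claim the line. The lenient variant is meant to run only after the dedicated parsers have failed, and then returns the whole line as plain text.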

There are changes we have made to the markdown spec. Here's a full list of the extras that we need: https://github.com/r3bl-org/r3bl-open-core/blob/main/tui/src/tui/md_parser/parser.rs#L39

We support most of the standard constructs. The one thing we diverge from radically is SMART LISTS.

This was a huge effort ... to make it so that we can track indentation levels across line breaks ... this diverges from markdown spec, since it is a block level construct. We do something similar for code blocks, so we can parse the code block contents separately and then syntax highlight them too!

Extras:

  1. tags list,
  2. authors list,
  3. title value,
  4. date value,
  5. smart list

Pictures of smart lists which look at multi-line elements as block elements. This is a divergence from markdown spec, which is mostly single line scoped elements.

Image

Image
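A rough sketch of the idea of treating a list item as a multi-line block by tracking indentation. This is not the repo's smart list implementation: ListItem and group_list_items are hypothetical, only "- " bullets and space indentation are handled, and lines that do not fit are simply skipped.

```rust
/// One list item: its indentation (in spaces) and its lines of content.
#[derive(Debug, PartialEq)]
struct ListItem {
    indent: usize,
    lines: Vec<String>,
}

/// Groups lines into list items. A line that is not a bullet but is
/// indented to the content column of the previous bullet is treated as a
/// continuation of that item, so one item can span several lines.
fn group_list_items(input: &str) -> Vec<ListItem> {
    let mut items: Vec<ListItem> = Vec::new();
    for line in input.lines() {
        // Count leading spaces to get the indentation level.
        let indent = line.len() - line.trim_start_matches(' ').len();
        let body = &line[indent..];
        if let Some(text) = body.strip_prefix("- ") {
            // A bullet starts a new item at this indentation level.
            items.push(ListItem { indent, lines: vec![text.to_string()] });
        } else if let Some(last) = items.last_mut() {
            // Content column = bullet indent + 2 (the width of "- ").
            if !body.is_empty() && indent == last.indent + 2 {
                last.lines.push(body.to_string());
            }
            // Anything else is ignored in this sketch.
        }
    }
    items
}
```

For the input "- first\n  continued\n  - nested\n    more" this produces two items: one at indent 0 with the lines first and continued, and one at indent 2 with the lines nested and more.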

@nazmulidris A similar issue can be reproduced as follows:

issue

@e0lithic This just got fixed in this commit 16ae6e4 which is in main.