jugglerchris/rust-html2text

unable to use ANSI sequences with from_read_with_decorator

doums opened this issue · 4 comments

doums commented

Hi,

I'm trying to use from_read_with_decorator with my own TextDecorator to print to the terminal with some color and text style. Unfortunately, from_read_with_decorator seems to strip part of the escape sequences (e.g. \e), which prevents producing nice terminal output.

I use termion for colors and style

impl TextDecorator for ContentDecorator {
    type Annotation = RichAnnotation;

    fn decorate_link_start(&mut self, url: &str) -> (String, Self::Annotation) {
        self.0.push(url.to_string());
        (
            format!(
                "{}{}{}* {}{}",
                Italic,
                Fg(Black),
                self.0.len() + 1,
                StyleReset,
                Fg(Blue)
            ),
            RichAnnotation::Link(url.to_string()),
        )
    }
// ...

Then

let output = from_read_with_decorator(html.as_bytes(), term_width, ContentDecorator(vec![]));

output:

HTML: Google has <a href="https://blog.chromium.org/2021/01/limiting-private-api-availability-in.html">announced</a> that they are going to block

 from_read_with_decorator output:
"Google has [3m[38;5;0m2* [m[38;5;4mannounced[39m that they are going to block

Hi,

The short answer is that the decorator isn't designed to pass through things like ANSI escapes; it's intended for plain text "decoration" (e.g. displaying <em>foo</em> as *foo*). For example, the decorations go through the line wrapping process, which wouldn't work very well with ANSI escapes.
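To illustrate why wrapping and escapes don't mix, here is a minimal sketch (not code from html2text) of the mismatch between what a wrapper counts and what the terminal shows. The helper name visible_width is hypothetical, and it only handles CSI sequences (ESC [ ... final byte) and counts one column per character, which is an assumption that ignores wide characters.

```rust
/// Hypothetical helper: count the characters a terminal would display,
/// skipping CSI escape sequences of the form ESC [ ... <final byte>.
/// Assumes one column per char (no wide-character handling).
fn visible_width(s: &str) -> usize {
    let mut width = 0;
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\x1b' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Parameter bytes run until a final byte in '@'..='~'.
            for e in chars.by_ref() {
                if ('@'..='~').contains(&e) {
                    break;
                }
            }
        } else {
            width += 1;
        }
    }
    width
}
```

A wrapper that measures the raw string sees every escape byte as a printable column, so a decorated link like the one in the report would be treated as far wider than the text that actually appears on screen.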

The way to get colour is to instead use from_read_rich, which returns annotated text spans. The html2term example program (cargo run --example html2term -- foo.html) uses this, also with termion. (It's interactive - q to exit)

I can see that it would be convenient to have an API more like the decorator API which can work with terminal escapes. Perhaps a from_read_coloured(...) which takes a mapping similar to html2term::top::to_style.
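As a rough sketch of the annotated-span approach: the text is wrapped first, and escapes are only added when the spans are written out. The Annotation and Span types below are hypothetical stand-ins, not the crate's own types (the real ones are in the html2text docs), and the colour choices are arbitrary.

```rust
/// Hypothetical stand-in for an annotation attached to a span of text.
#[derive(Debug, Clone, PartialEq)]
enum Annotation {
    Plain,
    Emphasis,
    Link(String),
}

/// Hypothetical stand-in for one annotated span within a wrapped line.
struct Span {
    text: String,
    tag: Annotation,
}

/// Render already-wrapped lines, adding ANSI escapes only at output time
/// so they never take part in width calculations.
fn render(lines: &[Vec<Span>]) -> String {
    let mut out = String::new();
    for line in lines {
        for span in line {
            // Pick an SGR start/end pair for each kind of annotation.
            let (start, end) = match &span.tag {
                Annotation::Plain => ("", ""),
                Annotation::Emphasis => ("\x1b[3m", "\x1b[23m"), // italic on/off
                Annotation::Link(_url) => ("\x1b[34m", "\x1b[39m"), // blue / default fg
            };
            out.push_str(start);
            out.push_str(&span.text);
            out.push_str(end);
        }
        out.push('\n');
    }
    out
}
```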

doums commented

Hi,

OK, if I may, the doc should state that from_read_with_decorator does not support ANSI escape sequences (among other things). Currently it's ambiguous, because the TextDecorator doc says:

Allow decorating/styling text.

When I read this, one of the first things I thought was "ok cool let's add some color and style with ANSI codes".

The problem with from_read_rich is that you have to deal with a vector of TaggedLines and process it yourself, which is less convenient than from_read_with_decorator.

I can see that it would be convenient to have an API more like the decorator API which can work with terminal escapes. Perhaps a from_read_coloured(...) which takes a mapping similar to html2term::top::to_style.

I agree. Something exposed like from_read_with_decorator and working the same way, by providing some kind of TextDecorator, would be great!

And thank you for providing this little crate! Thanks for your work. It's a nice lib 👍🏻

Hi,
I've tried to clarify the documentation and added an experimental from_read_coloured on branch issue_43_ansi_seq. You'd need to enable the ansi_colours feature too.

The html2text example program has a --colour option which exercises the new function - so cargo run --example html2text --features ansi_colours -- --colour foo.html.

I'm not totally happy with the API - constructing a pair of Strings for every span of text isn't ideal - I'm open to better ideas! But is this along the right lines?
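For readers following along, the "pair of Strings per span" shape might look roughly like the sketch below. This is not the signature on the branch; Annotation, colour_spans and example_map are hypothetical names used only to show a mapping that returns a (start, end) pair which is wrapped around each span after layout.

```rust
/// Hypothetical stand-in for the annotation passed to the mapping.
#[derive(Debug, Clone, PartialEq)]
enum Annotation {
    Default,
    Emphasis,
    Link(String),
}

/// Wrap each span's text in the (start, end) strings chosen by `colour_map`.
/// Assumes the spans have already been laid out and wrapped.
fn colour_spans<F>(spans: &[(Annotation, &str)], colour_map: F) -> String
where
    F: Fn(&Annotation) -> (String, String),
{
    let mut out = String::new();
    for (tag, text) in spans {
        // One pair of freshly built Strings per span.
        let (start, end) = colour_map(tag);
        out.push_str(&start);
        out.push_str(text);
        out.push_str(&end);
    }
    out
}

/// Example mapping: italic for emphasis, blue for links, nothing otherwise.
fn example_map(tag: &Annotation) -> (String, String) {
    match tag {
        Annotation::Default => (String::new(), String::new()),
        Annotation::Emphasis => ("\x1b[3m".to_string(), "\x1b[23m".to_string()),
        Annotation::Link(_) => ("\x1b[34m".to_string(), "\x1b[39m".to_string()),
    }
}
```

The per-span allocation mentioned above is visible here: every call to the mapping builds two Strings, even when both are empty.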

Regards,

jugglerchris

I've merged that branch with the experimental from_read_coloured() function.