drudru/ansi_up

Parse backspace character?

Closed this issue · 8 comments

We're using ansi_up to parse console output from some of our build tools (thanks for the great lib!), and some of them (e.g. Composer) use backspace when streaming output.

This takes place on a single line, so we end up with a line that looks like:

  - Installing �[32mcomposer/installers�[39m (�[33mv1.6.0�[39m): Downloading (�[33mconnecting...�[39m)���������������������������Downloading (�[33m0%�[39m)           ���������������������������Downloading (�[33m5%�[39m)����������������Downloading (�[33m40%�[39m)�����������������Downloading (�[33m45%�[39m)�����������������Downloading (�[33m50%�[39m)�����������������Downloading (�[33m85%�[39m)�����������������Downloading (�[33m90%�[39m)�����������������Downloading (�[33m95%�[39m)�����������������Downloading (�[33m100%�[39m)

(That's the ASCII backspace character after each update.)

Is it possible for ansi_up to resolve these backspace characters, or should we implement this ourselves atop ansi_up?

It looks like the parsing needs to take place after ANSI parsing, as doing it at the string level does not account for the ANSI control characters being unprintable.

Will think about this issue. Brb with thoughts

The log output looks like it needs some 'next level' functionality.

Essentially, we would need to emulate a teletype/terminal. Usually commands that deliver this type of output run the isatty() libc call to see if they are on a terminal. If your build logs are being run under a pseudo-tty, then the logger will think that it can output this kind of output.

Ansi-up cannot process this. However, it is interesting to think about the problem.

If we wanted to support this, we would need to 'kind of' emulate a terminal. I say kind of because these loggers typically do not move the cursor to arbitrary positions. They treat the terminal as a teletype. For example, once a newline is sent, that 'line' is complete. We might be able to use this to our advantage in order to simplify the problem.

If I were to implement this functionality, I would create another library that would use ansi-up.
The library would process the input and return completed lines and an in-process line.
The caller of the library would take completed lines and just append them to the HTML inside the PRE tag. It would take the in-process line and replace the contents with the new contents received.

This other library would take the output from ansi-up and watch for carriage returns, backspaces, and newlines.
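The "completed lines plus one in-progress line" idea above can be sketched as follows. This is only an illustration under assumptions: `LineAssembler` and its members are hypothetical names, not part of ansi_up, and for simplicity the sketch works on plain text rather than on ansi-up's HTML output, so its backspace handling is not aware of escape sequences or tags.

```javascript
// Hypothetical helper (not part of ansi_up): treats the stream like a teletype.
// Lines ended by '\n' are final; everything after the last '\n' is in progress.
class LineAssembler {
  constructor() {
    this.completed = [];      // finished lines, safe to append to the page
    this.currentChars = [];   // code points of the in-progress line
    this.pendingCR = false;   // saw '\r', waiting to see whether '\n' follows
  }

  // The in-progress line as a string; the caller re-renders it on each update.
  get current() {
    return this.currentChars.join('');
  }

  // Feed one chunk of raw text; returns the lines this chunk completed.
  push(chunk) {
    const finished = [];
    for (const ch of chunk) {
      if (ch === '\n') {
        // Covers both "\n" and "\r\n" line endings.
        finished.push(this.current);
        this.currentChars = [];
        this.pendingCR = false;
        continue;
      }
      if (this.pendingCR) {
        // A bare CR: simplification, treat it as "redraw the line from scratch".
        this.currentChars = [];
        this.pendingCR = false;
      }
      if (ch === '\r') {
        this.pendingCR = true;
      } else if (ch === '\b') {
        // Simplification: backspace deletes the last character.
        this.currentChars.pop();
      } else {
        this.currentChars.push(ch);
      }
    }
    this.completed.push(...finished);
    return finished;
  }
}
```

A caller would append each returned line to the PRE element and replace the contents of a trailing "live" element with `current` after every `push`.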

Out of curiosity, how are you processing your log data with ansi-up right now?

Dead issue. Closing.

Sorry for taking a while to get back to you here; happy new year :)

The problem with processing it at a different level is that it only affects the content, not the state. So, something like Foo{yellow}{backspace}bar should produce Fo{yellow}bar. It's not possible to do this without understanding the ANSI codes, so you can't really do it at a higher level without reparsing all the ANSI codes yourself.

The core problem is that the byte stream contains two sets of data really: content, and operations on the "state" (i.e. the colour information). The backspace character only operates on the content.

We're already taking the output of ansi_up and performing operations on it, but backspaces might operate over HTML element boundaries, so it's not a good place to perform this operation. (For example Foo{yellow}{backspace}bar produces Foo<span>{backspace}bar</span>.)
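To make the Foo{yellow}{backspace}bar case concrete, here is a minimal sketch of backspace handling that is aware of ANSI colour codes, run on the raw text before it reaches ansi_up. It is an assumption-laden illustration, not ansi_up functionality: `resolveBackspaces` is a hypothetical name, only SGR sequences (ESC `[` ... `m`) are recognised, and a backspace is treated as "delete the previous printable character", whereas a real terminal only moves the cursor.

```javascript
// Matches SGR colour/style sequences such as ESC[33m or ESC[1;32m.
// The sticky flag makes exec() match only at lastIndex.
const SGR = /\x1b\[[0-9;]*m/y;

// Hypothetical pre-processor: removes '\b' together with the printable
// character before it, leaving escape sequences (the "state") in place.
function resolveBackspaces(input) {
  const tokens = [];  // each token: { text, printable }
  let i = 0;
  while (i < input.length) {
    SGR.lastIndex = i;
    const match = SGR.exec(input);
    if (match) {
      // Escape sequences occupy no column, so backspace skips over them.
      tokens.push({ text: match[0], printable: false });
      i += match[0].length;
    } else if (input[i] === '\b') {
      // Walk back to the nearest printable character on the same line.
      for (let j = tokens.length - 1; j >= 0; j--) {
        if (tokens[j].text === '\n') break;  // never cross a line boundary
        if (tokens[j].printable) {
          tokens.splice(j, 1);
          break;
        }
      }
      i += 1;
    } else {
      tokens.push({ text: input[i], printable: true });
      i += 1;
    }
  }
  return tokens.map((t) => t.text).join('');
}
```

The result can then be passed to `ansi_to_html` as usual. One trade-off of this approach: escape sequences belonging to deleted text are kept, so a long run of progress updates leaves a trail of redundant colour codes in the string.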

FWIW, I don't think it's necessary to support all the cursor ANSI commands, but \b is something we see in practice with actual build tools we're using.

> Out of curiosity, how are you processing your log data with ansi-up right now?

ansi_up is used in a dashboard which displays build logs from our various systems, built in React.

We have an existing output stream which emits data, which may be full lines or partial lines. We parse this data out into an array of lines, and then use ansi_up's streaming support to parse each line (reinitialising the parser each time we receive a backend update).

Essentially, in React, we have:

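import React from 'react';
// Default export in the ansi_up releases current at the time of writing;
// newer releases may expose a named export instead.
import AnsiUp from 'ansi_up';

function Log( props ) {
    // Split the raw log text into lines so each one can be rendered separately.
    const lines = props.data.split( '\n' );
    // One stateful parser for the whole log, so colours carry across lines.
    const parser = new AnsiUp();

    return (
        <div>
            { lines.map( ( line, index ) => (
                <div
                    key={ index }
                    dangerouslySetInnerHTML={ { __html: parser.ansi_to_html( line ) } }
                />
            ) ) }
        </div>
    );
}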

(We have a bunch of other output we need to add, including line numbers, hence the manual splitting and parsing.)

Ok, thanks.

The reason that I ask is that I haven't seen backspace used much in practice.

What I typically see is CR (carriage return).

For example, let's say you are displaying a progress indicator. You display the line without a newline on the end.
When you need to update that line, you send a CR and then the contents of the new line.
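For comparison with the backspace case, the CR convention described above is simple to resolve on plain text. The sketch below is an illustration only: `applyCarriageReturns` is a hypothetical helper, and it assumes the line contains no ANSI escape sequences (they would be counted as visible columns).

```javascript
// Hypothetical helper: emulates how a terminal renders one line containing
// carriage returns. CR moves the cursor to column 0; later characters
// overwrite whatever is already in those columns.
function applyCarriageReturns(line) {
  const cells = [];  // one entry per screen column
  let col = 0;
  for (const ch of line) {
    if (ch === '\r') {
      col = 0;
    } else {
      cells[col] = ch;
      col += 1;
    }
  }
  return cells.join('');
}

// A progress indicator written as "text, CR, text" collapses to the last update.
console.log(applyCarriageReturns('Progress: 10%\rProgress: 50%'));
```

Note that a shorter update leaves the tail of the previous one visible, which is why tools using this convention usually rewrite the whole line at a fixed width.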

I'm wondering if this is a particular library used by your build tools or if this is some artifact of a full-blown curses implementation.

Another thing to consider... you might run the build tools in such a way so that they do not detect a tty. A good example of this is the 'git' command. It runs the isatty() libc call to see if it is on a real tty or if it is sending its output to a pipe.
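As an illustration of the tty check, this is roughly what such detection looks like from inside a tool written for Node.js, where `process.stdout.isTTY` plays the role of isatty(). The `progressUpdate` function and its message format are made up for this sketch.

```javascript
// Hypothetical progress reporter: redraws in place only on a real terminal.
function progressUpdate(percent, isTty) {
  return isTty
    ? `\rDownloading (${percent}%)`    // terminal: return to column 0 and redraw
    : `Downloading (${percent}%)\n`;   // pipe or file: one plain line per update
}

// process.stdout.isTTY is true on a terminal and undefined when piped.
const isTty = Boolean(process.stdout.isTTY);
process.stdout.write(progressUpdate(40, isTty));
```

A tool that behaves this way emits plain lines when its output goes to a pipe or a file instead of a pseudo-tty, which avoids the control characters entirely.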

Just brainstorming a little bit.

Yeah, it's a fair point, I haven't seen backspace used much either.

In this case, it's Composer, the package manager for PHP, so we don't have control over it.

Each line looks something like:

  - Installing composer/installers (v1.6.0): Downloading (40%)

I believe the idea is that instead of using a carriage return and having to rewrite the package name on each update, it deletes only the characters necessary to overwrite that final percentage part. Cool idea, but also a pain for parsing.
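The "delete only what changed" idea can be sketched from the producer's side. This is not Composer's actual implementation, just a hypothetical `backspaceUpdate` function that computes the shortest backspace sequence turning one rendered line into the next.

```javascript
// Hypothetical sketch: returns the bytes to write so that a terminal showing
// `prev` ends up showing `next`, touching only the part that differs.
function backspaceUpdate(prev, next) {
  // Length of the shared prefix, e.g. "  - Installing ... Downloading (".
  let common = 0;
  while (
    common < prev.length &&
    common < next.length &&
    prev[common] === next[common]
  ) {
    common += 1;
  }
  const stale = prev.length - common;  // characters to back over
  const fresh = next.slice(common);    // replacement text
  // If the new tail is shorter, blank the leftovers and step back again.
  const pad = Math.max(0, stale - fresh.length);
  return '\b'.repeat(stale) + fresh + ' '.repeat(pad) + '\b'.repeat(pad);
}

// Going from 40% to 45% only rewrites the last three characters.
console.log(JSON.stringify(backspaceUpdate('Downloading (40%)', 'Downloading (45%)')));
```

The padding step is what makes the output hard to parse with a naive approach: the stream contains spaces and backspaces that never appear in the final rendered line.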

There's some discussion over on composer/composer#3612 about this.

Of note, the ANSI parser referenced in that thread supports backspace characters.