tats/w3m

w3mman does not render ansi escape sequences on redhat based distributions

albfan opened this issue · 10 comments

OS: Fedora 34
package: w3m-0.5.3-50.git20210102.fc34.x86_64

Man pages add ANSI escape sequences for bold:

$ PAGER='cat -A' /usr/bin/man bash
BASH(1)                                                                                                  General Commands Manual                                                                                                  BASH(1)$
$
^[[1mNAME^[[0m$
       bash - GNU Bourne-Again SHell$
$
^[[1mSYNOPSIS^[[0m$
       ^[[1mbash ^[[22m[options] [command_string | file]$
$
^[[1mCOPYRIGHT^[[0m$
       Bash is Copyright (C) 1989-2020 by the Free Software Foundation, Inc.$
$
^[[1mDESCRIPTION^[[0m$
...

In a bash terminal it displays correctly:

$ /usr/bin/man bash
BASH(1)                                                                                                  General Commands Manual                                                                                                  BASH(1)

NAME
       bash - GNU Bourne-Again SHell

SYNOPSIS
       bash [options] [command_string | file]

COPYRIGHT
       Bash is Copyright (C) 1989-2020 by the Free Software Foundation, Inc.

DESCRIPTION
...

But w3mman does not render them correctly:

$ /usr/bin/w3mman bash
BASH(1)                                                                                                  General Commands Manual                                                                                                  BASH(1)

 [1mNAME [0m
       bash - GNU Bourne-Again SHell

 [1mSYNOPSIS [0m
        [1mbash  [22m[options] [command_string | file]

 [1mCOPYRIGHT [0m
       Bash is Copyright (C) 1989-2020 by the Free Software Foundation, Inc.

 [1mDESCRIPTION [0m
...

Is there a setting I'm missing? I see this working correctly on Arch Linux.

Please change the title to w3mman does not render ansi escape sequences on redhat based distributions

That's a more accurate description of what's going on.

/usr/local/libexec/w3m/cgi-bin/w3mman2html.cgi man > man.html

generates proper HTML on other platforms, but on Red Hat and friends the resulting output contains escape codes.

I compiled man-db like it's compiled on arch, and I get exactly the same problem...

Arch Linux:

/usr/lib/w3m/cgi-bin/w3mman2html.cgi man >man.html
$ cat -A man.html | head
Content-Type: text/html$
$
<html>$
<head><title>man man</title></head>$
<body>$
<pre>$
MAN(1)                                                            Utilidades de paginador del manual                                                           MAN(1)$
$
<b>NOMBRE</b>$
       man - interfaz de los manuales de referencia del sistema$

Fedora:

/usr/libexec/w3m/cgi-bin/w3mman2html.cgi man > man.html
$ cat -A man.html | head
Content-Type: text/html$
$
<html>$
<head><title>man man</title></head>$
<body>$
<pre>$
MAN(1)                                                                                             Utilidades del paginador del manual                                                                                             MAN(1)$
$
^[[1mNOMBRE^[[0m$
       man - interfaz de los manuales de referencia del sistema$
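
A quick way to check whether a platform is affected, as a minimal sketch: count the lines of the generated file that still contain an SGR escape sequence (ESC [ ... m). The helper name count_sgr_lines is made up for illustration and is not part of w3m.

```shell
# Hypothetical helper (not part of w3m): print the number of lines in the
# file "$1" that contain an SGR escape sequence such as ESC[1m or ESC[0m.
count_sgr_lines() {
  esc=$(printf '\033')                       # literal ESC character
  # grep -c exits non-zero when the count is 0, so ignore its exit status
  grep -c "${esc}\[[0-9;]*m" "$1" || true
}
```

For example, `count_sgr_lines man.html` should print 0 for output like the Arch Linux one and a non-zero count for output like the Fedora one.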

I started adding substitute commands:

diff --git i/w3mman2html.cgi w/w3mman2html.cgi
index b121470..0fa90f5 100755
--- i/w3mman2html.cgi
+++ w/w3mman2html.cgi
@@ -162,7 +162,15 @@ EOF
     next;
   }
 
-  s@[1m(\w+)[0m$@<b>$1</b>@g;
+  my $printchar='[\wÁÉÍÓÚáéíóú /\'.:;,&()\\"~=%*\$\?|!#\`\@\{\}\<\>_-]';
+  s@[1m($printchar+)[0m@<b>$1</b>@g;
+  s@[4m($printchar+)[24m@<u>$1</u>@g;
+  s@[1m($printchar+)[0m@<b>$1</b>@g;
+  s@[1m($printchar+)[22m@<b>$1</b>@g;
+  s@[1m($printchar+)[4m@<b>$1</b>@g;
+  s@[22m($printchar+)[0m@<u>$1</u>@g;
+  s@[22m($printchar+)[24m@<u>$1</u>@g;
+  s@[4m([\wÁÉÍÓÚáéíóú /'.:;,&()\\"~=%*\$\?|!#\`\@\{\}\<\>_-]+)[0m@<u>$1</u>@g;
   s@(http|ftp)://[\w.\-/~]+[\w/]@<a href="$&">$&</a>@g;
   s@\b(mailto:|)(\w[\w.\-]*\@\w[\w.\-]*\.[\w.\-]*\w)@<a href="mailto:$2">$1$2</a>@g;
   s@(\W)(\~?/[\w.][\w.\-/~]*)@$1 . &file_ref($2)@ge;

This almost does it. I tested with man bash and there are still some errors. Basically we need to match anything that is a printable character, instead of listing them all like this:

[\wÁÉÍÓÚáéíóú /'.:;,&()\"~=%*$?|!#`@{}<>_-]
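
One way to avoid enumerating characters, sketched here with sed rather than in the CGI script itself: treat the styled text as "anything up to the next ESC", turn bold and underline into tags, then drop whatever SGR codes remain (colours such as [34m included). The function name sgr_to_html is hypothetical, `sed -E` is assumed to be available, and nested styles and HTML escaping are not handled.

```shell
# Hypothetical filter (a sketch, not the code in w3mman2html.cgi):
# read text on stdin, convert SGR bold/underline to <b>/<u>, strip the rest.
sgr_to_html() {
  esc=$(printf '\033')                       # literal ESC character
  sed -E \
    -e "s@${esc}\[1m([^${esc}]*)${esc}\[(0|22)m@<b>\1</b>@g" \
    -e "s@${esc}\[4m([^${esc}]*)${esc}\[(0|24)m@<u>\1</u>@g" \
    -e "s@${esc}\[[0-9;]*m@@g"
}
```

Usage would be something like `man bash | sgr_to_html`, assuming man emits SGR sequences on that system.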

Please commit #238
Thanks!

I have found one that is still not handled: [34m in dbus-run-session(1)

rkta commented

So in the end, the missing setting is

GROFF_NO_SGR=1
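
For anyone who wants to try this by hand, a minimal sketch: grotty(1) documents GROFF_NO_SGR as switching back to the old backspace-overstrike scheme for bold and underline, so it is enough to set it in the environment of the man command. The wrapper name with_no_sgr is made up for illustration.

```shell
# Hypothetical wrapper: run the given command with GROFF_NO_SGR=1 set in
# its environment, leaving the calling shell's environment untouched.
with_no_sgr() {
  env GROFF_NO_SGR=1 "$@"
}
```

Running `with_no_sgr man bash | cat -A | head` on an affected system should then no longer show ^[[1m sequences, assuming man uses groff there.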

8891eab...760d7ad

I wonder whether we should reopen this and consider a configuration parameter that depends on the distro. Or does this simply force the same behaviour on all distros?

tats commented

I assume adding GROFF_NO_SGR=1 causes no problems with

  • groff >=1.18 with Debian default
  • groff >=1.18 default
  • groff <1.18, or
  • non-groff.

I don't expect any setup to force SGR on even when GROFF_NO_SGR=1 is set.

Anyway, if you really found a problem, please reopen.