html2xhtml

Command-line HTML to XHTML converter

Html2xhtml is a command-line tool that converts HTML files to XHTML
files. The path of the HTML input file can be provided as a
command-line argument. Otherwise, the input is read from stdin.

Html2xhtml always tries to generate valid XHTML files.  It is able to
correct many common errors in input HTML files without loss of
information.  However, for some errors, html2xhtml may decide to lose
some information in order to generate a valid XHTML output.  This can
be avoided with the -e option, which allows html2xhtml to generate
invalid output in these cases.
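
As a rough illustration of this trade-off, the sketch below converts
the same file twice, once in the default mode and once with -e, and
reports whether the two results differ. The function name
compare_modes and the output file names are hypothetical; only the -e
and -o options come from this README.

```shell
# Hypothetical helper: convert one file with and without -e and compare.
compare_modes() {
    input=$1
    base=${input%.html}
    # Default mode: the output is kept valid, possibly dropping content.
    html2xhtml "$input" -o "$base.valid.xhtml" || return 1
    # With -e: content is kept even if the output ends up not valid.
    html2xhtml "$input" -e -o "$base.lenient.xhtml" || return 1
    if cmp -s "$base.valid.xhtml" "$base.lenient.xhtml"; then
        echo "same output with and without -e"
    else
        echo "outputs differ: inspect with diff"
    fi
}
```

Running compare_modes input.html leaves both results next to the
input, so they can be inspected with diff when they differ.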

Html2xhtml can generate XHTML output compliant with one of the
following document types: XHTML 1.0 (Transitional, Strict and
Frameset), XHTML 1.1, XHTML Basic, XHTML Mobile Profile and XHTML
Print.


HOW TO RUN THE PROGRAM
-----------------------------------------------------------------------

For full information about how to run the program
see doc/html2xhtml.txt in the source code distribution,
the html2xhtml.txt file in the Windows binaries ZIP file
or the html2xhtml manpage. Some examples are shown below.

- By default, the program reads the input file from its standard input
and dumps the output file to its standard output:

cat input.html | html2xhtml

- The input can also be specified as a command line argument:

html2xhtml input.html

- In order to save the output to a file, redirect the standard output:

html2xhtml input.html >output.html

- Alternatively, you can specify the output file name with the -o option:

html2xhtml input.html -o output.html

- Select the document type of the output with -t:

html2xhtml input.html -t 1.1 -o output.html

The available values are:

transitional: XHTML 1.0 Transitional
frameset: XHTML 1.0 Frameset
strict: XHTML 1.0 Strict
1.1: XHTML 1.1
basic-1.0: XHTML Basic 1.0
basic-1.1: XHTML Basic 1.1
mp: XHTML Mobile Profile
print-1.0: XHTML Print 1.0

Use "transitional" if you just want to tidy up the markup.
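
In scripts it can help to reject a mistyped document type before the
converter runs. The following is a minimal sketch of such a check; the
function name convert_as and its argument order are hypothetical, and
the accepted values are simply the ones listed above.

```shell
# Hypothetical wrapper: reject unknown -t values before calling html2xhtml.
convert_as() {
    doctype=$1 input=$2 output=$3
    case $doctype in
        transitional|frameset|strict|1.1|basic-1.0|basic-1.1|mp|print-1.0)
            ;;  # one of the values listed in this README
        *)
            echo "unknown document type: $doctype" >&2
            return 2
            ;;
    esac
    html2xhtml "$input" -t "$doctype" -o "$output"
}
```

For example, convert_as strict input.html output.html runs the
converter, while convert_as 2.0 input.html output.html stops with an
error message and status 2.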

- Choose an output character encoding (by default, the program uses
the character encoding detected in the input):

html2xhtml input.html --ocs utf-8 -o output.html

- Get the list of available character sets:

html2xhtml --lcs
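
The options above can be combined to convert several files in one go.
The sketch below is one possible way to do it, not part of the
program: the function name convert_dir and the .xhtml output naming
are hypothetical, and it assumes html2xhtml exits with a non-zero
status when a conversion fails.

```shell
# Hypothetical helper: convert every .html file in a directory to UTF-8
# XHTML 1.0 Transitional, writing NAME.xhtml next to NAME.html.
convert_dir() {
    dir=${1:-.}
    failed=0
    for src in "$dir"/*.html; do
        [ -e "$src" ] || continue   # no match: the pattern stays literal
        dst=${src%.html}.xhtml
        # Assumption: a non-zero exit status signals a failed conversion.
        if html2xhtml "$src" -t transitional --ocs utf-8 -o "$dst"; then
            echo "converted: $src -> $dst"
        else
            echo "failed: $src" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Calling convert_dir site/ would process site/*.html and return a
non-zero status if any file could not be converted.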


HOW TO COMPILE AND INSTALL THE PROGRAM FROM THE SOURCE TARBALL
-----------------------------------------------------------------------

Enter the main directory of the source distribution and type:

$ ./configure
$ make

You can run the test battery in order to check that the program is
working as expected:

$ cd tests
$ ./test.sh
$ cd ..

If you want to install the program in your system, then type (it may
require root privileges):

$ make install

See ./INSTALL for more information.
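
For an installation without root access, a common approach with
configure scripts generated by autoconf is to pass an installation
prefix. The sketch below assumes the standard --prefix option is
available here and that it is run from the main directory of the
source distribution. The function name build_and_install and the
default prefix $HOME/.local are illustrative choices, not taken from
the project documentation.

```shell
# Hypothetical helper: configure, build, test and install under a prefix.
build_and_install() {
    prefix=${1:-"$HOME/.local"}
    # Assumption: the configure script accepts the usual --prefix option.
    ./configure --prefix="$prefix" || return 1
    make || return 1
    # Run the test battery in a subshell so the working directory is kept.
    (cd tests && ./test.sh) || return 1
    make install
}
```

With this sketch, the installation step only runs if the build and the
test battery both succeed.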

The program has been tested to compile on GNU/Linux and with MinGW on
Windows. With MinGW, the EXE file to use is the one the compiler
creates inside src\.libs, not the one in src\. It depends on the
libiconv-2.dll file, which is distributed with MinGW
(inside the bin\ subdirectory of the main MinGW installation directory).


HOW TO COMPILE AND INSTALL THE PROGRAM FROM THE GIT SOURCES
-----------------------------------------------------------------------

The source code in the Git repository does not include the files
generated by the autotools. In order to build the ./configure script,
run the following commands from the main directory of the sources:

$ aclocal
$ libtoolize
$ touch config.rpath
$ autoheader
$ automake --add-missing
$ autoconf

On OS X, use the glibtoolize command instead of libtoolize.

After that, the ./configure script should exist and you can proceed
as explained above:

$ ./configure
$ make
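
The bootstrap commands for a Git checkout can be wrapped in a small
script that picks whichever of libtoolize or glibtoolize is installed.
This is only a sketch: the function names pick_libtoolize and
bootstrap are hypothetical, and the commands are the same ones listed
in this section, run in the same order.

```shell
# Hypothetical helper: print the name of the available libtoolize command.
pick_libtoolize() {
    # Prefer libtoolize; fall back to glibtoolize (the name used on OS X).
    for tool in libtoolize glibtoolize; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    echo "neither libtoolize nor glibtoolize found" >&2
    return 1
}

# Hypothetical helper: generate ./configure, stopping at the first failure.
bootstrap() {
    libtoolize_cmd=$(pick_libtoolize) || return 1
    aclocal &&
    "$libtoolize_cmd" &&
    touch config.rpath &&
    autoheader &&
    automake --add-missing &&
    autoconf
}
```

Run bootstrap from the main directory of the sources; once it
succeeds, ./configure and make can be used as usual.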