/utf8conditioner

Historical - Early 2000's hack to "fix" bad codes in UTF-8 encoded XML so that XML parsers will be able to parse it

Primary LanguageCGNU General Public License v2.0GPL-2.0

utf8conditioner
---------------

Takes UTF-8 input from stdin, writes processed UTF-8 to stdout
and errors/warnings to stderr. Developed as a tool for checking and
pre-processing UTF-8 encoded XML responses to OAI-PMH requests
(see http://www.openarchives.org/).

This program aims to `fix' bad codes in UTF-8 encoded XML so that 
XML parsers will be able to parse it (albeit with some corruption 
introduced by substitution of dummy characters in place of illegal 
codes). Even though data may need to be discarded as unreliable it
may still be possible to extract any <resumptionToken> element to
see if a harvest was complete or not.


COMPILING

This program has been developed on Linux using gcc but should compile 
with on other systems with an ANSI C compiler.

On Linux with gcc:

> make	                        [should build utf8conditioner]

> make test                     [will run several tests, should all say PASS]

> ./utf8conditioner -h    	[will display help]

To use another compiler, change the CC = gcc line in the Makefile.


Simeon Warner, simeon@cs.cornell.edu
$Id: README,v 1.2 2005/10/25 23:18:23 simeon Exp $