TextCleanup strips the HTML tags from a .html file and writes the remaining raw text to a .txt file.

To run the script, enter the following command in Terminal:

perl HTMLToCleanTxt.pl

This creates the raw text files to work with.

You can name the files you want to process as arguments:

perl HTMLToCleanTxt.pl MyFile01.html MyFile02.html

…or a glob of all HTML files in the directory:

perl HTMLToCleanTxt.pl *.html

If only a single argument is given, and it's a text file, it's assumed to contain a list of files to be processed:

perl HTMLToCleanTxt.pl FilesToProcess.txt
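
A list file like this can be generated rather than typed by hand. The shell sketch below assumes the list holds one filename per line, which this README does not state. The temporary directory and the two demo files are placeholders for illustration only.

```shell
# Work in a throwaway directory so nothing real is touched (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical demo inputs; substitute your own HTML files.
printf '<p>Hello</p>\n' > MyFile01.html
printf '<p>World</p>\n' > MyFile02.html

# Write every *.html name to the list, one per line (assumed list format).
list="$workdir/FilesToProcess.txt"
printf '%s\n' *.html > "$list"

# Show the resulting list.
cat "$list"

# With the script available, the list would then be passed as the only argument:
# perl HTMLToCleanTxt.pl FilesToProcess.txt
```

Using printf with a glob avoids parsing the output of ls, so filenames containing spaces stay on a single line each.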

Full instructions for working with this script in Perl: http://ow.ly/Vf09E

Contributors: ltagliaferri, Marcus Smith (https://github.com/carwash)