This directory contains a PADS Inference System for inferring a description
of newline-separated ASCII data.  

1. Requirements:
   You must have sml/nj 110.64 or later, available from www.smlnj.org.
   You must have PADS 2.00 or later, available from www.padsproj.org.

2. You must set the LEARN_HOME environment variable to the root of the
   PADS Inference System distribution.
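
   As a minimal sketch for a Bourne-style shell, the variable can be set
   and checked as below. The path $HOME/pads-infer is a hypothetical
   install location, not one defined by the distribution; substitute the
   directory where you unpacked it.

```shell
# Hypothetical install location (an assumption); replace with your own path.
LEARN_HOME="$HOME/pads-infer"
export LEARN_HOME

# Sanity check: the distribution root should contain src/ and scripts/.
if [ -d "$LEARN_HOME/src" ] && [ -d "$LEARN_HOME/scripts" ]; then
    echo "LEARN_HOME looks right: $LEARN_HOME"
else
    echo "warning: $LEARN_HOME does not contain src/ and scripts/" >&2
fi
```

   Putting the two assignment lines in your shell startup file makes the
   setting persist across sessions.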

3. To compile the learning program, just type "make" in the $LEARN_HOME
   directory. This will compile the source files in the src directory
   and put the sml/nj heap image in the directory lib.  The script
   that invokes the system is in the scripts directory. The first time
   you compile the system, it may take anywhere from 1 to 10 minutes,
   depending on the speed of your processor. This is due to the compilation
   of fairly large regular expressions specified in the token definition.
   You might want to add $LEARN_HOME/scripts to your search path.
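
   One way to extend the search path in a Bourne-style shell is sketched
   below. The fallback value $HOME/pads-infer is a hypothetical location
   used only when LEARN_HOME is not already set; the case statement avoids
   adding the same directory twice.

```shell
# Use the existing LEARN_HOME, or fall back to a hypothetical location.
LEARN_HOME="${LEARN_HOME:-$HOME/pads-infer}"
export LEARN_HOME

# Prepend $LEARN_HOME/scripts to PATH unless it is already there.
case ":$PATH:" in
    *":$LEARN_HOME/scripts:"*) ;;
    *) PATH="$LEARN_HOME/scripts:$PATH" ;;
esac
export PATH
```

   With this in place, the script in the scripts directory can be invoked
   as "learn" from any directory.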

4. The "learn" program takes a data file as input, produces an
   intermediate representation (IR), and prints out the PADS
   description. For details of the usage of "learn", type:
   
   $LEARN_HOME> scripts/learn --help 

   PADS Learning System 1.0
   learn  [-d <string>] [-n <string>] [-maxdepth <int>] [-lineNos] 
   [-ids] [-h <float>] [-s <float>] [-noise <float>] [-a <int>] 
   [-ma <int>] [-j <float>] [-e] [-lex <string>] [-au <string> ...] 
   [--help] files...

   -d      output directory (default gen/)
   -n      name of output file (default generatedDescription)
   -maxdepth       maximum depth for exploration (default 50)
   -lineNos        print line numbers in output contexts (default false)
   -ids    print ids in type and tokens matching base types (default false)
   -h      histogram comparison tolerance (percentage, default 0.01)
   -s      struct determination tolerance (percentage, default 0.1)
   -noise  noise level (percentage, default 0.0)
   -a      array width requirement (default 4)
   -ma     minimum array width (default 0)
   -j      junk threshold (percentage, default 0.1)
   -e      Print entropy tokens (default false)
   -lex    prefix of the lex config to be used (default "vanilla")
   -au     run only the golden file

   For example, 

   >scripts/learn examples/data/crashreporter.log

   will generate in the infer/gen directory a Ty which contains the IR,
   and crashreporter.log.p which is the PADS description. In addition,
   a number of tools available for this data source are generated in the
   same directory in the form of .c files. For data source XYZ,
   XYZ-accum.c is the accumulator tool, XYZ-xml.c is the XML tool,
   XYZ-fmt.c is the formatting tool, XYZ-graph is the grapher tool, etc.
   For example, to build the accumulator tool for crashreporter.log, do:
   
   >cd gen/
   >make crashreporter.log-accum

   To build the xml tool, do:
   >make crashreporter.log-xml

   To build the fmt tool, do:
   >make crashreporter.log-fmt
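
   The three build commands above follow one naming pattern, so the target
   names can be generated rather than typed. The helper below is a sketch;
   tool_targets is a hypothetical name, not part of the distribution, and
   it only prints the target names.

```shell
# Hypothetical helper: print the accum, xml, and fmt make targets
# for the data source named in the first argument.
tool_targets() {
    for tool in accum xml fmt; do
        printf '%s-%s\n' "$1" "$tool"
    done
}

# List the targets for the crashreporter.log example.
tool_targets crashreporter.log
```

   From inside gen/, the output could then be handed to make in one step,
   for instance with: make $(tool_targets crashreporter.log). This assumes
   the generated Makefile accepts several targets in one invocation, as
   make normally does.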

   The executable programs crashreporter.log-accum, crashreporter.log-xml,
   and crashreporter.log-fmt will be created in directory gen/ARCH, where
   ARCH is a string that represents the OS and the CPU architecture, such
   as darwin.ppc or linux.i386.

   To run the accumulator, xml-conversion, and formatting programs, do:
   >ARCH/crashreporter.log-accum ../examples/data/crashreporter.log
   >ARCH/crashreporter.log-xml ../examples/data/crashreporter.log
   >ARCH/crashreporter.log-fmt ../examples/data/crashreporter.log

   The grapher tool is a generated Perl script. To use the
   grapher tool, you need to have a working gnuplot (you can get a copy
   from http://www.gnuplot.info). To run the grapher, first build
   the formatting tool as above:
   >make crashreporter.log-fmt

   and then execute
   >./crashreporter.log-graph

   to see detailed usage of the grapher. An example of use is:
   >./crashreporter.log-graph -d ../examples/data/crashreporter.log \
    -x 2 -y 5 -s impulses -t %H:%M:%S

5. The directory examples contains the subdirectory data with many
   sample data sources.  The examples directory contains a README file
   explaining how to run the inference tool on these data sources and
   put the results in the results directory, one sub-directory per
   data source.