This directory contains the PADS Inference System, which infers a
description of newline-separated ASCII data.

1. Requirements:

   You must have sml/nj 110.64 or later, available from www.smlnj.org.
   You must have PADS 2.00 or later, available from www.padsproj.org.

2. You must set the LEARN_HOME environment variable to the root of the
   PADS Inference System distribution.

3. To compile the learning program, type "make" in the $LEARN_HOME
   directory. This compiles the source files in the src directory and
   puts the sml/nj heap image in the lib directory. The script that
   invokes the system is in the scripts directory.

   The first time you compile the system, it may take anywhere from
   1 to 10 minutes, depending on the speed of your processor. This is
   due to the compilation of the fairly large regular expressions
   specified in the token definitions.

   You might want to add $LEARN_HOME/scripts to your search path.

4. The "learn" program takes a data file as input, produces an
   intermediate representation (IR), and prints out the PADS
   description. For details of the usage of "learn", type:

   $LEARN_HOME> scripts/learn --help
   PADS Learning System 1.0
   learn [-d <string>] [-n <string>] [-maxdepth <int>] [-lineNos] [-ids]
         [-h <float>] [-s <float>] [-noise <float>] [-a <int>] [-ma <int>]
         [-j <float>] [-e] [-lex <string>] [-au <string> ...] [--help]
         files...
     -d         output directory (default gen/)
     -n         name of output file (default generatedDescription)
     -maxdepth  maximum depth for exploration (default 50)
     -lineNos   print line numbers in output contexts (default false)
     -ids       print ids in type and tokens matching base types (default false)
     -h         histogram comparison tolerance (percentage, default 0.01)
     -s         struct determination tolerance (percentage, default 0.1)
     -noise     noise level (percentage, default 0.0)
     -a         array width requirement (default 4)
     -ma        minimum array width (default 0)
     -j         junk threshold (percentage, default 0.1)
     -e         print entropy tokens (default false)
     -lex       prefix of the lex config to be used (default "vanilla")
     -au        run only the golden file

   For example,

   >scripts/learn examples/data/crashreporter.log

   will generate in the infer/gen directory a file Ty, which contains
   the IR, and crashreporter.log.p, which is the PADS description.

   In addition, a number of tools for this data source are generated
   in the same directory in the form of .c files. For data source XYZ,
   XYZ-accum.c is the accumulator tool, XYZ-xml.c is the XML tool,
   XYZ-fmt.c is the formatting tool, XYZ-graph is the grapher tool, etc.

   For example, to build the accumulator tool for crashreporter.log, do:

   >cd gen/
   >make crashreporter.log-accum

   To build the xml tool, do:

   >make crashreporter.log-xml

   To build the fmt tool, do:

   >make crashreporter.log-fmt

   The executable programs crashreporter.log-accum,
   crashreporter.log-xml, and crashreporter.log-fmt will be created in
   the directory gen/ARCH, where ARCH is a string that represents the OS
   and the CPU architecture, such as darwin.ppc or linux.i386.

   To run the accumulator, xml-conversion, and formatting programs, do:

   >ARCH/crashreporter.log-accum ../examples/data/crashreporter.log
   >ARCH/crashreporter.log-xml ../examples/data/crashreporter.log
   >ARCH/crashreporter.log-fmt ../examples/data/crashreporter.log

   The grapher tool is a generated Perl script. To use the grapher
   tool, you need to have a working gnuplot (you can get a copy from
   http://www.gnuplot.info).
   To run the grapher, first build the formatting tool as above:

   >make crashreporter.log-fmt

   and then execute

   >./crashreporter.log-graph

   to see detailed usage of the grapher. An example of use is:

   >./crashreporter.log-graph -d ../examples/data/crashreporter.log \
      -x 2 -y 5 -s impulses -t %H:%M:%S

5. The examples directory contains the subdirectory data, which holds
   many sample data sources. The examples directory also contains a
   README file explaining how to run the inference tool on these data
   sources and put the results in the results directory, one
   sub-directory per data source.
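
The steps in sections 3 and 4 can be strung together in a small shell
helper. The following is only a sketch, not part of the distribution:
the function names run and pads_workflow and the DRY_RUN variable are
made up for illustration, and it assumes that "learn" is run from
$LEARN_HOME with the default output directory gen/. By default it only
prints the commands it would execute.

```shell
#!/bin/sh
# Sketch of the README workflow for one data source. The helper names
# (run, pads_workflow) and the DRY_RUN variable are illustrative
# assumptions, not part of the PADS Inference System distribution.

# Print the command when DRY_RUN is 1 (the default); otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: pads_workflow DATA_FILE ARCH
#   DATA_FILE  path relative to $LEARN_HOME,
#              e.g. examples/data/crashreporter.log
#   ARCH       OS/CPU string used under gen/, e.g. linux.i386
pads_workflow() {
    data_file=$1
    arch=$2
    name=$(basename "$data_file")

    if [ -z "${LEARN_HOME:-}" ]; then
        echo "LEARN_HOME is not set" >&2
        return 1
    fi

    run cd "$LEARN_HOME" || return 1
    run make || return 1                        # section 3: compile
    run scripts/learn "$data_file" || return 1  # section 4: infer
    run cd gen || return 1                      # assumed output directory

    # Build and run the accumulator, XML, and formatting tools.
    for tool in accum xml fmt; do
        run make "$name-$tool" || return 1
        run "$arch/$name-$tool" "../$data_file" || return 1
    done
}
```

With LEARN_HOME set, calling
"pads_workflow examples/data/crashreporter.log linux.i386" prints the
command sequence for the crashreporter.log example; setting DRY_RUN=0
executes the commands instead of printing them.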