============================================= | Embedded Relational Learning | ============================================= This program produces SVG graphics describing the relationships between queries and interpretations in a relational learning setting. Prerequisites --------------------------------------------- for the c++ program - a current c++ compiler (>=4.1) - boost >= 1.35 - for boost.ublas to work, you need to install the LAPACK backend. - boost-numeric-bindings This is a header-only library you can get from: http://mathema.tician.de/software/boost-bindings just drop it in /usr/local/include/boost-numeric-bindings - matio for matlab output for the xml to svg conversion script - perl >= 5.1 - The Perl modules SVG, Convert::Color, File::Util. Building --------------------------------------------- $ tar xzvf erl.tar.gz $ cd erl $ mkdir build $ cd build $ cmake ../src $ make -j2 Installing --------------------------------------------- not done yet, just run from build directory. Running --------------------------------------------- try: $ ./erl --help $ ./erl {action} [options] there are two main actions: - count this is a Molfea implementation for SDF files count does currently not compile on 64bit systems due to known bugs in boost. - code this is a specific RPROP-based CODE implementation for visualization purposes. Configuration File --------------------------------------------- You can put all options in the configuration file as well. The configuration file defaults to config.dat and can be changed using the -c parameter. Configuration files have a INI-syntax. The options translate as follows: the option: --count.out_base="something" becomes [count] out_base = something Examples -- count --------------------------------------------- you need SDF-files (the LONG format!!!) the output of the count stage is cached, therefore you need to specify "-r" if you want the program to re-count and re-read everything. an example call could be: $ ./erl count --count.out_base=base --SDFReader.files molecules_a.sdf:1,molecules_i.sdf:0 --output-dir=`pwd` -r -v Explanation: This count action generates two csv-files in output-dir, base.csv and base-names.csv needed for the code action (see below). It uses two SDF-files, molecules_a.sdf and molecules_i.sdf, with class 1/0, respectively. The output files (and the cache) is written to the current directory, re-counting is forced. The program is verbose. Examples -- code --------------------------------------------- you need two files: the matrix file and the names file. matrix file - contains the description of an interpretation in each line - a line consists of - a name for the interpretation (currently ignored) - for every query, a number of co-occurrences (usually 1/0) - a class label (should be an int) names file - contains a list of names for the queries, one per line. use this name scheme for the files: Matrix file: BASENAME.csv Names file: BASENAME-names.csv you can then run erl using the following command $ ./erl code --code.input_file BASENAME --output-dir=`pwd` currently, the output-dir parameter is _mandatory_. good luck!