25/4/14 (kl2)

The viv.pl script executes a set of commands and manages the flow of data between them, currently by using a combination of files and named pipes (FIFOs). It is configured with a JSON file which describes a connected directed graph. The format of the config is described below, and there are some sample config files in the examples directory for reference.

Usage
=====

viv.pl [-s] [-x] [-v <verbose_level>] [-o <logname>] <config.json>

Flags:
  -s : strict; that is, fail if any of the executed commands exits with a non-zero status
  -x : execute; by default, the script will just parse the config file and report in the log what processes it would have created
  -v <verbose_level> : specify how chatty the log messages should be. Currently, verbosity levels range from 0 to 3
  -o <logname> : specify the log file name (default stdout)

config.json : a JSON formatted file specifying a directed graph. The config is a hash array with two keys:

  1) "nodes" - a list of nodes, which are hash arrays with keys:
       "id" - a unique identifier for the node, used in the edges to specify the "from" and "to" nodes
       "type" - possible values INFILE, OUTFILE, RAFILE and EXEC (see below for more detail)
       "name" - for a file node {INFILE, OUTFILE, RAFILE}, this specifies the name of the file
       "cmd" - for an EXEC node, specifies the command to be executed. Port names can be specified (currently these are arbitrary embedded strings which are replaced by FIFO names generated at execution time)

  2) "edges" - a list of edges, which are hash arrays with keys:
       "id" - a unique identifier for the edge
       "from" - contains an id value for a node, followed by an optional ':' and a port name. When a port name is specified, occurrences of that port name in the "cmd" of the EXEC node are replaced by the names of FIFOs generated by the script to direct I/O between the nodes
       "to" - contains an id value for a node, followed by an optional ':' and a port name.
       When a port name is specified, the same substitution process described above (under "from") is applied.

Node types, attributes and behaviour
====================================

INFILE
  specifies a file on the file system for reading

OUTFILE
  specifies a file on the file system for writing

RAFILE
  specifies an intermediate file on the file system for reading and writing. Used to move data between EXEC nodes when it is decided not to use the default pipe behaviour. When this node type is used, downstream nodes will not be launched until execution of any directly upstream nodes has completed. This node can have a "subtype" attribute with value "DUMMY" to indicate that the viv script should not take responsibility for creating the file (via output redirection); it should just coordinate execution of the connected EXEC nodes.

EXEC
  specifies a command to be executed. The actual command appears in the "cmd" attribute. Direct data transfer between EXEC nodes is done via named pipes (FIFOs), which are automatically created by the script. Input/output defaults to stdin/stdout respectively, unless the "to" or "from" attributes include a port specification.

TODO
====

1. Improve logging. There is currently only one log file; it might be useful to have separate execution logs for each EXEC node, or at least clearer separation and labelling in the master log file.

2. Validate the graph. Currently there is little checking to see if the graph specification makes sense. For example, if an edge refers to an input or output port in a node, no checks are done to see if such a port actually exists. No checks are done to ensure that nodes specified in edges actually exist. The id for a node should also be unique for a given graph, but this is not checked. There is also no check to make sure that the graph is connected, so it is easy to specify orphan EXEC nodes which wait forever for input. So for the time being, be careful.
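
Until such validation exists in viv.pl, the checks listed in item 2 can be approximated with a small standalone pre-check run before viv.pl. The following is a minimal Python sketch, not part of viv.pl. The names validate_config and split_endpoint, the sample config, and the port name __SIDE_OUT__ are all hypothetical. It assumes that a port "exists" if its name appears as a substring of the node's "cmd", which follows the description of ports as embedded strings but may not match what viv.pl actually does. The orphan check only flags nodes attached to no edge, which is weaker than a full connectivity check.

```python
import json

# Hypothetical sample config in the format described above:
# an input file is copied by "tee" to stdout and to a named port.
SAMPLE_CONFIG = json.loads("""
{
  "nodes": [
    {"id": "src",   "type": "INFILE",  "name": "input.txt"},
    {"id": "split", "type": "EXEC",    "cmd": "tee __SIDE_OUT__"},
    {"id": "main",  "type": "OUTFILE", "name": "main_copy.txt"},
    {"id": "side",  "type": "OUTFILE", "name": "side_copy.txt"}
  ],
  "edges": [
    {"id": "e1", "from": "src",                "to": "split"},
    {"id": "e2", "from": "split",              "to": "main"},
    {"id": "e3", "from": "split:__SIDE_OUT__", "to": "side"}
  ]
}
""")

VALID_TYPES = {"INFILE", "OUTFILE", "RAFILE", "EXEC"}


def split_endpoint(value):
    """Split 'node_id[:port]' into (node_id, port); port is None if absent."""
    node_id, _, port = value.partition(":")
    return node_id, (port or None)


def validate_config(config):
    """Return a list of problem descriptions; an empty list means no problem found."""
    problems = []

    # Node ids must be unique and node types must be known.
    nodes = {}
    for node in config.get("nodes", []):
        node_id = node.get("id")
        if node_id in nodes:
            problems.append(f"duplicate node id: {node_id!r}")
        nodes[node_id] = node
        if node.get("type") not in VALID_TYPES:
            problems.append(f"node {node_id!r}: unknown type {node.get('type')!r}")

    # Every edge endpoint must name an existing node and, if given, a known port.
    attached = set()
    for edge in config.get("edges", []):
        for key in ("from", "to"):
            node_id, port = split_endpoint(edge.get(key, ""))
            node = nodes.get(node_id)
            if node is None:
                problems.append(f"edge {edge.get('id')!r}: unknown {key} node {node_id!r}")
                continue
            attached.add(node_id)
            # Assumption: a port exists if its name is embedded in the EXEC cmd.
            if port is not None and port not in node.get("cmd", ""):
                problems.append(
                    f"edge {edge.get('id')!r}: port {port!r} not found in cmd of node {node_id!r}"
                )

    # Orphan check (weaker than connectivity): nodes that no edge refers to.
    for node_id in nodes:
        if node_id not in attached:
            problems.append(f"node {node_id!r} is not attached to any edge")

    return problems


if __name__ == "__main__":
    for problem in validate_config(SAMPLE_CONFIG):
        print(problem)
```

For the sample config above, validate_config returns an empty list, so the script prints nothing. Collecting all problems in a list rather than stopping at the first one is a design choice that lets a single run report every issue found in the config.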