Watches an incoming directory for new files, processes them and puts the processing results into an output directory.
Uses inoticoming, which is based on the inotify library. Mincer provides a few useful features not found in the inoticoming tool:
- a friendlier and more convenient command line interface that lets you start and stop the Mincer daemon;
- simple but flexible configuration file with a lot of reasonable defaults;
- can save processed files (optionally gzipping them) in a separate directory;
- improved logging of what is going on within the processor.
Stacking multiple instances of Mincer lets you build complex queued data processing conveyors. To do so, just set the OUTPUT_DIR of your first Mincer instance to the INCOMING_DIR of the second one.
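As a rough illustration of such a chain, the fragment below shows only the variables that link two instances. It assumes the configuration file uses shell-style VAR=value assignments and that the second instance lives in a hypothetical /var/mincer2; check mincer.conf.example for the real syntax and defaults.

```shell
# /var/mincer1/mincer.conf (first stage, hypothetical values)
ROOT_DIR=/var/mincer1
OUTPUT_DIR=/var/mincer2/incoming    # results land in the second stage's queue

# /var/mincer2/mincer.conf (second stage, hypothetical values)
# ROOT_DIR=/var/mincer2
# INCOMING_DIR=/var/mincer2/incoming
```

Each file still needs its own FILENAME_REGEXP; it is omitted here to keep the fragment focused on the link between stages.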
The package does not provide a system daemon, only a few scripts that can be used to build your own daemons.
- Create a root directory for your Mincer instance, say /var/mincer1;
- Create a callback script callback.sh inside the root directory. Take callback.sh.skeleton as a starting point. It must take the absolute file name of the incoming data file as its first argument. It must create the result file (or files) inside its current directory (this can be overridden; see the examples);
- Create your own Mincer configuration file based on mincer.conf.example, provided inside the upstream tarball. You have to define at least the ROOT_DIR and FILENAME_REGEXP variables. Save the created file as /var/mincer1/mincer.conf;
- Start the Mincer daemon with mincer /var/mincer1/mincer.conf. Note that only the very first command starts the daemon; all subsequent invocations exit gracefully.
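To make the callback contract concrete, here is a minimal sketch of what a callback.sh could look like. It is not the shipped callback.sh.skeleton: the function name process_incoming and the ".count" result suffix are invented for illustration. It reads the incoming file given as the first argument and writes a line count to a result file in the current directory, without touching the incoming file itself.

```shell
#!/bin/sh
# Hypothetical callback.sh sketch; not the upstream skeleton.
set -eu

# process_incoming ABSOLUTE_PATH
# Writes the number of lines in the incoming file to NAME.count
# in the current directory.
process_incoming() {
    incoming="$1"                          # absolute path of the incoming file
    result="$(basename "$incoming").count" # result file in the current directory
    # Only read the incoming file; never move, change or remove it.
    wc -l < "$incoming" | tr -d ' ' > "$result"
}

# Run the processing only when called with an argument.
if [ "$#" -ge 1 ]; then
    process_incoming "$1"
fi
```

Keeping the work in a function makes the script easy to try by hand on a sample file before wiring it into a running instance.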
Now you can:
- add your data files to the /var/mincer1/incoming directory;
- examine what is going on in /var/mincer1/processing.log;
- stop the daemon with mincer /var/mincer1/mincer.conf stop.
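When adding files by hand or from another program, it may help to borrow the same two-stage convention Mincer uses for its own output: write under a '.tmp' name first, then rename. The helper below is a hypothetical producer-side sketch, not part of Mincer. It assumes your FILENAME_REGEXP does not match '.tmp' names and that the temporary and final names are on the same filesystem.

```shell
#!/bin/sh
# Hypothetical producer-side helper; not part of Mincer.
set -eu

# deliver_file SRC INCOMING_DIR
# Copies SRC to INCOMING_DIR/NAME.tmp, then renames it to INCOMING_DIR/NAME.
deliver_file() {
    src="$1"
    incoming_dir="$2"
    name="$(basename "$src")"
    cp "$src" "$incoming_dir/$name.tmp"
    # Rename inside one directory, so the final name appears only
    # once the whole file has been written.
    mv "$incoming_dir/$name.tmp" "$incoming_dir/$name"
}

# Example: deliver_file /tmp/report.dat /var/mincer1/incoming
```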
- files are processed one-by-one;
- when the incoming queue is empty, processing of a file begins immediately after it is closed for writing;
- when moving processing results to the output directory (or directories), mincer works in two stages: first it copies each file to $NAME.tmp, then it renames $NAME.tmp to $NAME. This makes delivery all-or-nothing;
- write your FILENAME_REGEXP so that it honors the special meaning of the '.tmp' extension, otherwise mincer will start processing files earlier than it should. This means you should never use regexps like '.*' or '.tmp$';
- the callback script is always started in the configured WORK_DIR. All files in WORK_DIR are removed before the callback starts, but directories are not, so you can save some state between callback invocations;
- the callback script is always invoked with exactly one argument: the absolute path of the incoming file;
- the callback script should never move, change or remove the incoming file, because that would break mincer's processing logic;
- empty incoming files are not processed: they are removed immediately and the callback is not started;
- incoming files without read permission are immediately moved to FAILED_DIR and the callback is not started;
- all configuration options except ROOT_DIR, FILENAME_REGEXP, LOGFILE and INCOMING_DIR can be safely changed at runtime without any additional effort;
- to change any of ROOT_DIR, FILENAME_REGEXP, LOGFILE or INCOMING_DIR, you must first stop the running mincer instance, then change the variables, and then start mincer again;
- symlinks are NOT processed. If you want to process files located elsewhere in the filesystem, use hardlinks or tiny files containing the target file path.
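The note about WORK_DIR directories surviving between runs can be used to keep simple state. The sketch below is a hypothetical callback fragment that keeps a counter in a "state" subdirectory of the current directory. It assumes that files inside subdirectories of WORK_DIR are left alone, which the note above implies but which you should verify for your version.

```shell
#!/bin/sh
# Hypothetical callback fragment: persist a counter across invocations.
set -eu

# bump_counter
# Increments the number stored in ./state/counter and prints the new value.
bump_counter() {
    state_dir="state"                # subdirectory of the current directory
    mkdir -p "$state_dir"
    count=0
    if [ -f "$state_dir/counter" ]; then
        count="$(cat "$state_dir/counter")"
    fi
    count=$((count + 1))
    printf '%s\n' "$count" > "$state_dir/counter"
    printf '%s\n' "$count"
}
```

A callback could call bump_counter once per run, for example to number its result files.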
Examine mincer.conf.example for further information about available configuration options.
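Before putting a FILENAME_REGEXP into the configuration file, it can be worth checking that it rejects '.tmp' names. The sketch below uses a hypothetical pattern for names like data-01.csv and tests it with grep -E. It assumes POSIX extended regular expressions; the dialect Mincer actually uses may differ.

```shell
#!/bin/sh
# Hypothetical check of a FILENAME_REGEXP candidate; not part of Mincer.
set -eu

# Anchored at both ends, so a trailing ".tmp" can never match.
FILENAME_REGEXP='^[A-Za-z0-9_-]+\.csv$'

# matches NAME
# Succeeds if NAME matches FILENAME_REGEXP.
matches() {
    printf '%s\n' "$1" | grep -Eq "$FILENAME_REGEXP"
}

for name in data-01.csv data-01.csv.tmp; do
    if matches "$name"; then
        echo "$name: would be processed"
    else
        echo "$name: ignored"
    fi
done
```

Anchoring the pattern with ^ and $ is what keeps names with an extra '.tmp' suffix out of the queue.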