/femto

Sequence Indexing and Search

Primary LanguageC++GNU Lesser General Public License v3.0LGPL-3.0

ABOUT

FEMTO is an indexing and search system for queries on sequences of bytes. FEMTO stands for the FM-index for External Memory with Throughput Optimizations. This tool supports building large indexes in parallel with MPI and then searching large indexes with a multithreaded server.

REQUIREMENTS

FEMTO requires a 64-bit machine to build and test. 32-bit machines are supported for search only. FEMTO is known to build with GCC for Linux/x86-64.

To build FEMTO from a release tarball, you will need a C++ compiler, libssl-dev, and optionally MPI. When building from source, you will also need flex, bison, autotools, and libtool. It has worked with GNU bison 2.5 and 2.4.1.

MPI is required for parallel index construction. Note that MPI runs across machines of different endianness are not supported.

If you'd like to use MPI-parallel index construction, you'll need to install a version of MPI which supports threads. We have used OpenMPI 1.8.8, configured in the following way:

 ./configure --prefix=/opt/openmpi1.8.8 --enable-mpirun-prefix-by-default --enable-mpi-thread-multiple --with-threads
 make
 make install # on all compute nodes
 # To make sure mpirun and mpicc are in the path for use with FEMTO
 export PATH=$PATH:/opt/openmpi1.8.8/bin
 export LD_LIBRARY_PATH=/opt/openmpi1.8.8/lib

COMPILATION

Make sure you have met the requirements first!

We recommend starting with a FEMTO release tarball from https://github.com/femto-dev/femto/releases .

If you prefer to use a source checkout, there are additional build dependencies.

If you're starting out with a source checkout as with

 git clone https://github.com/femto-dev/femto.git
 cd femto

you will need to also generate the configure script:

 sh autogen.sh

To build FEMTO, issue the following commands:

 ./configure
 make

You'll see lots of warnings that things are declared/defined but not used; this is normal and not a problem. If you get errors and compilation fails, you may not have all the required dev libraries installed. (e.g. if it is running g++ and fails to find -lssl, that would indicate you need to install libssl)

To run the included unit tests, use

 make check

INSTALLATION

To install FEMTO in a particular place, be sure to include --prefix in your configure line, as in

 ./configure --prefix ~/femto_install

As usual,

 make install

will install the FEMTO tools to the destination specified by ./configure.

You can also run the commands from the build directory.

See src/mod_femto/README for information about installing the FEMTO apache module.

USAGE

To build an index, run

 femto/src/dcx_cc/femto_index --tmp /path/to/tmp_dir \
                             --outfile index.femto \
                             files_or_directories_to_index

Then, to query the index, use femto_search. To count the number of occurrences (quickly!), use:

 femto/src/main_cc/femto_search /path/to/index_dir --count 'pattern'

To report documents that matched (time depends on # reported), use:

 femto/src/main_cc/femto_search /path/to/index_dir 'pattern'

To report documents and offsets that matched (time depends on # reported), use:

 femto/src/main_cc/femto_search /path/to/index_dir --offsets 'pattern'

To learn more about what kinds of patterns you can use, see femto/src/main/QUERY_FORMAT.txt

THIRD PARTY ACKNOWLEDGEMENTS

FEMTO source includes the Google RE2 package, jQuery, jQuery SlickGrid, and jQuery SVG.