/uproc

Tools for ultra-fast protein sequence classification.

Primary LanguageCGNU Lesser General Public License v3.0LGPL-3.0

UProC Version 1.2.0 2015-03-09 README

Please send bug reports, suggestions or questions to uproc@gobics.de or create an issue at UProC on GitHub.

UProC uses semantic versioning. This means that the version number has the format MAJOR.MINOR.PATCH and an increment in one of the parts has the following meaning:

  1. MAJOR: The user interface (input, output and/or command-line arguments) of the programs or the format of the supplementary files was changed in a backwards-incompatible manner.
  2. MINOR: Some functionality was added.
  3. PATCH: One or more bugs were fixed.

If MINOR or PATCH are incremented, all modifications are backwards-compatible. See the semantic versioning homepage for details.

The code for the UProC executables is available under the GNU GPLv3 (see the COPYING file). However, most of the functionality is separated into a library called libuproc, using the GNU LGPLv3 (see the COPYING.LESSER file).

Please visit the UProC homepage to make sure you're using the latest release. If you are interested in contributing, please get a clone of UProC on GitHub and base your work off the devel branch.

We currently offer binary packages for Windows XP and later, for 32 bit (i686) and 64 bit (x86-64, also known as amd64) architectures. To install, just download and extract the corresponding zip file. Currently there is no graphical user interface, you need to run UProC from a PowerShell or cmd.exe.

Please note that UProC is primarily developed for (and works best on) GNU/Linux.

UProC uses the GNU Autotools, so installation should work on any system that has a POSIX compatible /bin/sh, make and a C99 compiler.

For additional features, UProC has the following optional dependencies:

zlib (highly recommended)
Used for On-the-fly gzip (de-)compression of files. This is probably already installed on your computer, but you need to make sure that the development headers are also available.
OpenMP
UProC uses OpenMP for parallelization, if the compiler suports it. The configure script automatically detects whether this is the case.
Doxygen (for developers, version 1.8.0 or later recommended)
If Doxygen is available, it wil be used to generate the the API documentation of libuproc (which is also available at the UPRoC homepage.

To install UProC from a source tarball, simply extract it and run the commands

./configure
make
make install  # requires appropriate privileges in PREFIX (see below)

By default, the executables are built in a way that you can omit the make install step and just run them from wherever you wish or copy them to a desired location (it is recommended to utilize the --prefix option mentioned below, though).

You can pass the following options (and more) to the configure script:

--prefix=PREFIX
 Install all files below PREFIX (see the Installation Names section the INSTALL file for details).
--disable-openmp
 Do not use OpenMP (i.e. disable multi-threading). This will usually cause longer running times. You probably don't want this.
--disable-shared
 Link all execututables statically and do not build and install libuproc as a shared library.
--enable-mmap Enable the mmap database format. This is enabled by default only on GNU/Linux, as it can have a negative impact on the performance on other OSes (we experienced this while running UProC on OSX 10) if the database is stored on a HDD. If you have an SSD, enabling this feature can reduce startup time significantly.

See the INSTALL file for a more detailed description of the installation process for projects using GNU Autotools.

On a fresh clone of the git repo, run autoreconf -i to generate the configure script and then follow the instructions in the previous section.

UProC needs certain files at runtime. These files are split into two categories, usually available as two distinct directories in the file system.

The database consists of files representing a set of known protein subsequences that map to given families, e.g. extracted from PFAM.

There are two ways to obtain a database:

  1. You can download a database from the UProC homepage and import it with the uproc-import program.
  2. Alternatively, you can create your own database with the uproc-makedb program.

Detailed instructions for these programs can be found by passing the -h option when running them.

The model consists of files containing certain parameters that are not tied to a particular database. You can download the newest model files from the UProC homepage.

UProC consists of the following command-line programs:

uproc-prot
Protein sequence classifier.
uproc-dna
DNA/RNA sequence classifier.
uproc-orf
Command-line interface to the ORF translation mechanism used by uproc-dna.
uproc-import
Import database.
uproc-export
Export database.
uproc-makedb
Create a new database.

You can pass the -h option to find out how they are used.