
Java Utility for Class Hidden Markov Models and Extensions

JUCHMME, an acronym for Java Utility for Class Hidden Markov Models and Extensions, is a tool developed for biological sequence analysis.

For documentation, see http://docs.google.com/viewer?a=v&pid=sites&srcid=Y29tcGdlbi5vcmd8bGFifGd4OjJkNWM5ZDRkYWIwYzQ0YTA

The overall aim of this work has been to develop a software tool that offers a large collection of standard algorithms for Hidden Markov Models (HMMs), together with a number of extensions, and to evaluate these models on various biological problems. The JUCHMME framework is characterized by the following features.

Flexibility: JUCHMME is easy to use and to customize for different problems. The user can create models of any architecture and with any alphabet (DNA, protein or other) without any programming; all settings are made through a configuration file.

Training methods: JUCHMME integrates a wide range of training algorithms for HMMs of labeled sequences. Models of this kind are often called “class HMMs” and are commonly trained with the Maximum Likelihood (ML) criterion to model within-class data distributions. The tool supports the Baum-Welch algorithm [1-3] and its extension for labeled data [4]. Alternatives are also supported, namely the gradient-descent algorithm proposed by Baldi and Chauvin [5] and Viterbi training (also known as “segmental k-means”) [6]. Additionally, JUCHMME supports the Conditional Maximum Likelihood (CML) criterion, which corresponds to discriminative training. CML training can be performed only with gradient-based algorithms; to this end, a fast and robust algorithm for individual learning-rate adaptation has been implemented [7]. The same algorithm is available for training Hidden Neural Networks (HNNs, see below).
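
To make the ML/CML distinction concrete, the following Java sketch (a toy illustration, not JUCHMME's source) assigns a label to each state of a two-state HMM and runs the forward algorithm twice: once over all state paths, giving P(x), and once "clamped" to the paths whose state labels match the observed labels y, giving P(x, y). ML training of a class HMM maximizes log P(x, y), while CML maximizes log P(y | x) = log P(x, y) - log P(x). The class name, the parameters and the label alphabet are hypothetical.

```java
/**
 * Toy class HMM (hypothetical, not JUCHMME code): each state carries a label.
 * forward(x, null) sums over all state paths and returns P(x);
 * forward(x, y) sums only over paths whose state labels match y and returns P(x, y).
 */
public class ClassHmmLikelihood {

    // Made-up parameters for two states and the symbols A (0) and B (1).
    static final double[] START = {0.5, 0.5};
    static final double[][] TRANS = {{0.9, 0.1}, {0.2, 0.8}};
    static final double[][] EMIT = {{0.7, 0.3}, {0.1, 0.9}};
    static final char[] STATE_LABEL = {'i', 'o'};

    static int symbol(char c) {
        return c == 'A' ? 0 : 1;
    }

    // A state is allowed at position t if no labels are given or its label matches.
    static boolean allowed(int state, String labels, int t) {
        return labels == null || STATE_LABEL[state] == labels.charAt(t);
    }

    // Unscaled forward algorithm; adequate only for short toy sequences.
    static double forward(String seq, String labels) {
        int n = START.length;
        double[] alpha = new double[n];
        for (int k = 0; k < n; k++) {
            if (allowed(k, labels, 0)) {
                alpha[k] = START[k] * EMIT[k][symbol(seq.charAt(0))];
            }
        }
        for (int t = 1; t < seq.length(); t++) {
            double[] next = new double[n];
            for (int l = 0; l < n; l++) {
                if (!allowed(l, labels, t)) continue; // clamped: probability stays 0
                double sum = 0.0;
                for (int k = 0; k < n; k++) sum += alpha[k] * TRANS[k][l];
                next[l] = sum * EMIT[l][symbol(seq.charAt(t))];
            }
            alpha = next;
        }
        double total = 0.0;
        for (double a : alpha) total += a;
        return total;
    }

    public static void main(String[] args) {
        String x = "AAB";
        String y = "iio";
        double logPx = Math.log(forward(x, null));  // free pass: log P(x)
        double logPxy = Math.log(forward(x, y));    // clamped pass: log P(x, y), the ML objective
        System.out.printf("log P(x)   = %.4f%n", logPx);
        System.out.printf("log P(x,y) = %.4f%n", logPxy);
        System.out.printf("log P(y|x) = %.4f (CML objective)%n", logPxy - logPx);
    }
}
```

For this toy model P(x) = 0.08775 and P(x, y) = 0.019845. Real implementations scale the forward variables or work in log space to avoid underflow on long sequences; the sketch skips this for brevity.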

Decoding: JUCHMME integrates a wide range of decoding algorithms, such as Viterbi, N-best [8], posterior-Viterbi [9] and the Optimal Accuracy Posterior Decoder [10]. Moreover, all algorithms can decode partially labeled data, which allows experimental information to be incorporated [11].
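
The Java sketch that follows (hypothetical class and toy parameters, not JUCHMME's API) shows one straightforward way Viterbi decoding can honour partial labels: at positions where a label is known, states carrying a different label are excluded, while '.' marks an unlabeled position. The method of [11] and the other decoders may handle such constraints differently.

```java
import java.util.Arrays;

/**
 * Toy Viterbi decoder (hypothetical, not JUCHMME code) that accepts partial labels:
 * '.' marks an unknown position; any other character fixes the label there.
 */
public class PartialLabelViterbi {

    // Made-up two-state model: symbols A (0) and B (1), state labels 'i' and 'o'.
    static final double[] START = {0.5, 0.5};
    static final double[][] TRANS = {{0.9, 0.1}, {0.2, 0.8}};
    static final double[][] EMIT = {{0.7, 0.3}, {0.1, 0.9}};
    static final char[] STATE_LABEL = {'i', 'o'};

    static int symbol(char c) {
        return c == 'A' ? 0 : 1;
    }

    static boolean allowed(int state, String known, int t) {
        return known == null || known.charAt(t) == '.' || known.charAt(t) == STATE_LABEL[state];
    }

    // Returns the label string of the most probable state path consistent with `known`.
    static String decode(String seq, String known) {
        int n = START.length;
        int len = seq.length();
        double[][] v = new double[len][n];   // log-probability of the best partial path
        int[][] back = new int[len][n];      // back-pointers for the traceback
        for (double[] row : v) Arrays.fill(row, Double.NEGATIVE_INFINITY);
        for (int k = 0; k < n; k++) {
            if (allowed(k, known, 0)) {
                v[0][k] = Math.log(START[k]) + Math.log(EMIT[k][symbol(seq.charAt(0))]);
            }
        }
        for (int t = 1; t < len; t++) {
            for (int l = 0; l < n; l++) {
                if (!allowed(l, known, t)) continue; // excluded states keep -infinity
                for (int k = 0; k < n; k++) {
                    double score = v[t - 1][k] + Math.log(TRANS[k][l]);
                    if (score > v[t][l]) {
                        v[t][l] = score;
                        back[t][l] = k;
                    }
                }
                v[t][l] += Math.log(EMIT[l][symbol(seq.charAt(t))]);
            }
        }
        int best = 0;
        for (int k = 1; k < n; k++) {
            if (v[len - 1][k] > v[len - 1][best]) best = k;
        }
        char[] labels = new char[len];
        for (int t = len - 1; t >= 0; t--) {
            labels[t] = STATE_LABEL[best];
            best = back[t][best];
        }
        return new String(labels);
    }

    public static void main(String[] args) {
        System.out.println(decode("AAB", null));  // prints iii
        System.out.println(decode("AAB", "..o")); // prints iio (last label fixed to 'o')
    }
}
```

With this model the unconstrained path is "iii"; fixing only the last label forces the decoder onto the best path that ends in an 'o' state, "iio".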

Training Procedures: JUCHMME contains built-in model creation and evaluation procedures, such as the independent test, self-consistency test, jackknife test, k-fold cross-validation and early stopping. All prediction algorithms also compute the corresponding reliability measures that have been proposed [12] (correlation coefficient, Q, SOV).
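
For reference, two of these per-residue measures can be computed as in this Java sketch, assuming Q is the fraction of correctly predicted positions and the correlation coefficient is the Matthews correlation coefficient for one chosen label. SOV, which scores overlapping segments, is more involved and is omitted. Class and method names are illustrative, not JUCHMME's.

```java
/** Per-residue accuracy measures (illustrative sketch, not JUCHMME code). */
public class AccuracyMeasures {

    // Q: fraction of positions where the predicted label equals the observed one.
    static double q(String observed, String predicted) {
        int correct = 0;
        for (int i = 0; i < observed.length(); i++) {
            if (observed.charAt(i) == predicted.charAt(i)) correct++;
        }
        return (double) correct / observed.length();
    }

    // Matthews correlation coefficient, treating `positive` as the positive class.
    static double mcc(String observed, String predicted, char positive) {
        long tp = 0, tn = 0, fp = 0, fn = 0;
        for (int i = 0; i < observed.length(); i++) {
            boolean obs = observed.charAt(i) == positive;
            boolean pred = predicted.charAt(i) == positive;
            if (obs && pred) tp++;
            else if (!obs && !pred) tn++;
            else if (pred) fp++;
            else fn++;
        }
        double denom = Math.sqrt((double) (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        return denom == 0 ? 0.0 : (tp * tn - fp * fn) / denom;
    }

    public static void main(String[] args) {
        String observed = "iiiooo";
        String predicted = "iioooo";
        System.out.printf("Q = %.3f%n", q(observed, predicted));
        System.out.printf("C = %.3f%n", mcc(observed, predicted, 'i'));
    }
}
```

For observed "iiiooo" and predicted "iioooo", Q = 5/6 (about 0.833) and the correlation coefficient for label 'i' is 6/sqrt(72) (about 0.707).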

HMM Extensions: To overcome the limitations of standard HMMs, a number of extensions have been developed or implemented, such as segmental k-means for both Maximum Likelihood (ML) and Conditional Maximum Likelihood (CML) [6], Hidden Neural Networks [13], models that condition on previous observations [14] and a method for semi-supervised learning of HMMs that can incorporate labeled, unlabeled and partially labeled data [15].
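
Viterbi training (segmental k-means) alternates between decoding each training sequence with the Viterbi algorithm and re-estimating the parameters from counts along the decoded paths. The Java sketch shows only the ML re-estimation step, assuming sequences and state paths are given as integer arrays; pseudocounts keep unused transitions and emissions from becoming zero. For a class HMM the paths would come from Viterbi decoding restricted to label-consistent states. This is an illustration with hypothetical names, not JUCHMME's implementation, and it does not cover the CML variant.

```java
import java.util.Arrays;

/** Re-estimation step of Viterbi training (illustrative sketch, not JUCHMME code). */
public class ViterbiTrainingStep {

    static final class Params {
        final double[][] trans;
        final double[][] emit;

        Params(double[][] trans, double[][] emit) {
            this.trans = trans;
            this.emit = emit;
        }
    }

    // seqs[s][t] is a symbol index, paths[s][t] the state assigned to it (e.g. by Viterbi).
    // `pseudo` must be > 0 so that every row has a non-zero sum.
    static Params reestimate(int[][] seqs, int[][] paths, int nStates, int nSymbols, double pseudo) {
        double[][] trans = new double[nStates][nStates];
        double[][] emit = new double[nStates][nSymbols];
        for (double[] row : trans) Arrays.fill(row, pseudo);
        for (double[] row : emit) Arrays.fill(row, pseudo);
        for (int s = 0; s < seqs.length; s++) {
            for (int t = 0; t < seqs[s].length; t++) {
                emit[paths[s][t]][seqs[s][t]] += 1.0;
                if (t > 0) trans[paths[s][t - 1]][paths[s][t]] += 1.0;
            }
        }
        normalizeRows(trans);
        normalizeRows(emit);
        return new Params(trans, emit);
    }

    // Turns counts into probabilities by dividing each row by its sum.
    static void normalizeRows(double[][] m) {
        for (double[] row : m) {
            double sum = 0.0;
            for (double x : row) sum += x;
            for (int j = 0; j < row.length; j++) row[j] /= sum;
        }
    }

    public static void main(String[] args) {
        int[][] seqs = {{0, 0, 1}};
        int[][] paths = {{0, 0, 1}};
        Params p = reestimate(seqs, paths, 2, 2, 1.0);
        System.out.println("transitions: " + Arrays.deepToString(p.trans));
        System.out.println("emissions:   " + Arrays.deepToString(p.emit));
    }
}
```

For the single sequence {0, 0, 1} with path {0, 0, 1} and a pseudocount of 1, state 0 ends up with emission probabilities 0.75 and 0.25. Start probabilities are left out to keep the sketch short.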

Getting started

JUCHMME is a Java application that is run from the command line. It requires a 32-bit or 64-bit Java runtime environment, version 7 or later, freely available from http://www.java.org. The Windows and MacOS X installers contain a suitable Java runtime environment that will be used if none can be found on the computer.

Download the program from http://www.compgen.org/tools/juchmme or from GitHub at https://github.com/pbagos/juchmme.

If you find JUCHMME useful in your research, please consider citing the reference that describes this work:

JUCHMME: a Java Utility for Class Hidden Markov Models and Extensions for biological sequence analysis. Tamposis A. Ioannis, Tsirigos D. Konstantinos, Theodoropoulou C. Margarita, Kontou I. Panagiota, Tsaousis N. Giorgos, Sarantopoulou Dimitra, Litou I. Zoi and Pantelis G. Bagos. PMID: 31250907 (https://www.ncbi.nlm.nih.gov/pubmed/31250907)

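To compile JUCHMME from source, run the following commands from the directory that contains src/ (the compiled classes are written to ./bin):
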
javac -XDignore.symbol.file -sourcepath src/ -d ./bin src/hmm/Juchmme.java

javac -XDignore.symbol.file -sourcepath src/ -d ./bin src/hmm/RandomSeq.java

javac -XDignore.symbol.file -sourcepath src/ -d ./bin src/nn/Main.java

Command Line

The juchmme program is controlled by the following command-line options:

-V: print JUCHMME version and exit

-a: the free emission parameter file. This parameter file is required.

-e: the free transition parameter file

-i: the input sequence file in three-line format. This file stores the input sequences for the decoding or training algorithms.

-f: the input sequence FASTA file. This file stores the input sequences for the decoding algorithms in FASTA format.

-A: the input multiple sequence alignment (MSA) FASTA file. This file stores the input MSA for the decoding algorithms in single-line FASTA format (each sequence on one line).

-m: the model file. This parameter file is required.

-x: the HNN encoding file

-t: training option

-c: the configuration file

-v cluster size: k-fold cross-validation mode, using an integer greater than 0 as the cluster size (for instance, clusterSize=175)

-k number of clusters: k-fold cross-validation mode, using an integer greater than 0 as k (for instance, k=10)

-s: self-consistency test

-j: jackknife test

-p: show plot

-P: directory for graph plots
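
Assuming that -k sets the number of folds, -v the number of sequences per fold, and that the jackknife test is leave-one-out (one fold per sequence), fold assignment could look like the following Java sketch. It splits sequence indices in input order; JUCHMME's own assignment (for example, whether it shuffles sequences or groups related ones) may differ. The self-consistency test needs no split, since the model is trained and tested on the same set. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Fold assignment for cross-validation (illustrative sketch, not JUCHMME code). */
public class FoldSplits {

    // k folds whose sizes differ by at most one; k == n gives the jackknife (leave-one-out).
    static List<List<Integer>> byFoldCount(int n, int k) {
        if (k <= 0 || k > n) throw new IllegalArgumentException("need 0 < k <= n");
        List<List<Integer>> folds = new ArrayList<>();
        int start = 0;
        for (int f = 0; f < k; f++) {
            int size = n / k + (f < n % k ? 1 : 0);
            List<Integer> fold = new ArrayList<>();
            for (int i = start; i < start + size; i++) fold.add(i);
            folds.add(fold);
            start += size;
        }
        return folds;
    }

    // Consecutive folds of clusterSize sequences; the last fold may be smaller.
    static List<List<Integer>> byClusterSize(int n, int clusterSize) {
        if (clusterSize <= 0) throw new IllegalArgumentException("clusterSize must be > 0");
        List<List<Integer>> folds = new ArrayList<>();
        for (int start = 0; start < n; start += clusterSize) {
            List<Integer> fold = new ArrayList<>();
            for (int i = start; i < Math.min(n, start + clusterSize); i++) fold.add(i);
            folds.add(fold);
        }
        return folds;
    }

    public static void main(String[] args) {
        // Each fold serves once as the test set while the model is trained on the others.
        System.out.println(byFoldCount(10, 3));   // [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
        System.out.println(byClusterSize(10, 4)); // [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
        System.out.println(byFoldCount(4, 4));    // jackknife: [[0], [1], [2], [3]]
    }
}
```

Fixing the number of folds keeps their count constant as the data set grows, whereas fixing the fold size keeps each test set the same size; the sketch offers both so either reading of the options can be tried.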


