INSTALLATION
------------

# Requirements #

1. Operating system:
   * Linux / UNIX

2. Required software:
   * MATLAB (v6.5 or later)
   * Bayes Net Toolbox (BNT); can be downloaded from
     http://bnt.sourceforge.net/

3. Optional software:
   * NCBI BLAST suite; can be downloaded from
     http://www.ncbi.nlm.nih.gov/

NOTE: If you have not installed BLAST locally, you can still use DBNN:
run BLAST and construct your own PSSM file elsewhere (e.g. with online
BLAST), and then provide the PSSM file to DBNN.

# Compilation #

Type the following commands in the home directory of DBNN to compile
and install the package:

    make
    vi rundbnn    ( modify the first FOUR variables in the rundbnn
                    file, i.e. homedir, matlabdir, blastdir, and
                    dbname, according to your own configuration,
                    and exit )
    ./rundbnn     # the first run checks the installation of the
                  # required software

If you want to launch DBNN from a directory other than the home
directory of DBNN, place the rundbnn program (or a symbolic link to it)
in a directory on the system search path, and add the home directory of
DBNN to the MATLAB search path.

RUN
---

# Quick start #

The simplest way to use DBNN is to type:

    ./rundbnn seq.fasta output

where seq.fasta contains your amino acid sequences in FASTA format, and
output is the prefix of the prediction files. Typically, two prediction
files are generated:

    output.fasta
    output.raw

The former contains the predictions in FASTA format, and the latter
contains the secondary structure scores for each residue site.
Launching rundbnn also generates an additional file:

    query.pssm

which is the PSSM file corresponding to your seq.fasta.

By default, rundbnn reads parameters from eight files that come with
the DBNN package:

    default.M1.mat
    default.M2.mat
    default.M3.mat
    default.M4.mat
    default.nn1.lin
    default.nn1.sig
    default.nn2.lin
    default.nn2.sig

You can re-train the DBNN (see below) and provide your own parameter
files when launching rundbnn.
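
Because a parameter set consists of eight files sharing one prefix, it
can be convenient to confirm that the set is complete before launching
rundbnn. The following is only a sketch: check_params is a hypothetical
helper that is not part of the DBNN package, and it assumes the
parameter files are located in the current directory.

```shell
# Hypothetical helper (not part of DBNN): report which of the eight
# parameter files for a given prefix are missing from the current
# directory. Returns 0 if all are present, 1 otherwise.
check_params() {
    prefix="$1"
    missing=0
    for suffix in M1.mat M2.mat M3.mat M4.mat \
                  nn1.lin nn1.sig nn2.lin nn2.sig; do
        if [ ! -f "${prefix}.${suffix}" ]; then
            echo "missing: ${prefix}.${suffix}"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the default parameter set.
check_params default || echo "the default parameter set is incomplete"
```

The same function can be called with any user-defined prefix in place
of "default".
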
If your parameter files are named with the prefix "newpara" (i.e.
newpara.M1.mat, ... newpara.nn1.lin, ...), you can pass them to rundbnn
by typing:

    ./rundbnn seq.fasta output newpara

If you have not installed BLAST, but have run BLAST and constructed the
PSSM file elsewhere (e.g. online), you can use DBNN by typing:

    ./rundbnn yourPSSM output [ prefix-of-parameter-files ]

# Run DBN and NN separately #

1. Run DBN

   Type:

       ./dbnpred PSSM output [ prefix-of-parameter-files ]

   In using dbnpred, you should always provide the PSSM file and the
   prefix of the parameter files explicitly. Two files will be
   generated:

       output.fasta
       output.raw

   where output.fasta contains the predicted secondary structure in
   FASTA format, and output.raw contains the predicted posterior
   probability distribution for each residue.

2. Run NN

   Type:

       ./nnpred PSSM output parameters

   Two files, output.fasta and output.raw, will again be generated;
   this time output.raw contains the raw outputs of the neural network.

# Training of DBN and NN #

1. Training of DBN

   Type:

       ./dbntrain PSSM secstr.fasta parameters

   where secstr.fasta is the secondary structure annotation for your
   proteins (in FASTA format) and "parameters" is a user-defined prefix
   for the parameter files that will be generated. Four files will be
   generated:

       parameters.M1.mat
       parameters.M2.mat
       parameters.M3.mat
       parameters.M4.mat

2. Training of NN

   Type:

       ./nntrain PSSM secstr.fasta parameters

   Four files will be generated:

       parameters.nn1.lin
       parameters.nn1.sig
       parameters.nn2.lin
       parameters.nn2.sig

=================
Xin-Qiu Yao
Jan. 7, 2007