Sequence Analysis and Clustering of CarboHydrate Active enzymes for Rapid Informed prediction of Specificity (SACCHARIS) is a Perl-based pipeline designed to improve functional predictions of uncharacterized sequences for any CAZyme or CBM family currently maintained on the CAZy website or within user-defined datasets.
When using SACCHARIS, please cite the following paper:
Jones DR, Thomas DK, Alger N, Ghavidel A, Inglis GD, Abbott DW. SACCHARIS: An automated pipeline to inform discovery of new carbohydrate active enzyme activities within polyspecific families and de novo sequence datasets. Biotechnology for Biofuels, 11 (2018), p. 27, DOI: 10.1186/s13068-018-1027-x.
This software is distributed under the terms of the GPL, version 2 or later, excepting that:
- The third party programs and scripts used by SACCHARIS are covered by the terms of their respective licenses
With this package I have included a copy of:
- fasta_rmSmall.pl
- This script screens a FASTA file and removes any sequence shorter than a user-defined minimum length
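To make the minimum-length screening concrete, here is a minimal sketch of the same idea written with awk. It is not the bundled fasta_rmSmall.pl script and may differ from it in details; the function name `filter_fasta_min_len` is hypothetical, and the sketch assumes that each kept sequence can be written on a single line.

```shell
#!/bin/sh
# Hypothetical helper (not part of SACCHARIS): keep only FASTA records whose
# sequence length is at least MIN_LEN residues.
# Usage: filter_fasta_min_len MIN_LEN < input.fasta > output.fasta
filter_fasta_min_len() {
    awk -v min="$1" '
        # Print the buffered record if it is long enough.
        function emit() {
            if (header != "" && length(seq) >= min + 0) {
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }   # a new record starts
        { gsub(/[ \t\r]/, ""); seq = seq $0 }          # append a sequence line
        END { emit() }
    '
}

# Example (file names are placeholders):
#   filter_fasta_min_len 100 < family.fasta > family_min100.fasta
```

Wrapped sequence lines are joined before the length is measured, so a record split across several lines is judged on its full length.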
- Perl Libraries
- Bio::Seq, Bio::SeqIO
- Date::Calc
- File::chdir
- Getopt::Long
- HTML::TagParser (see note below)
- List::Util
- LWP::Simple
- threads
- Third Party Software
- dbCAN
- HMMER 3.1
- MUSCLE
- ProtTest 3
- RAxML
- FastTree (Requires version 2.1.10 or greater)
- Fasta_subsample.pl
- Notices
- If you experience an error pertaining to an uninitialized $esearch value, confirm that you have the following packages installed:
- libwww-perl (Linux)
- LWP::Protocol::https (OSX)
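One way to confirm that the Perl modules listed above are installed is to ask Perl to load each one. The following is a minimal sketch; the function name `check_perl_modules` is hypothetical and is not shipped with SACCHARIS.

```shell
#!/bin/sh
# Hypothetical helper: report which Perl modules fail to load.
# Returns 0 only if every module named on the command line loads.
check_perl_modules() {
    rc=0
    for module in "$@"; do
        # "perl -MName -e 1" loads the module and then runs a no-op program.
        if perl -M"$module" -e 1 >/dev/null 2>&1; then
            echo "ok       $module"
        else
            echo "MISSING  $module"
            rc=1
        fi
    done
    return "$rc"
}

# Example:
#   check_perl_modules Bio::SeqIO Date::Calc File::chdir LWP::Simple LWP::Protocol::https
```

Any module reported as MISSING can then be installed through your system package manager or CPAN before running the pipeline.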
- Install all Requirements
- Clone Repository
git clone
- Copy Scripts to a location in the PATH
- Download HMMER
- Extract archive
- Copy or move the folder to /usr/local/hmmer
- Add binaries directory to your Path
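Adding a directory to the PATH can be sketched as follows. The helper name `prepend_to_path` is hypothetical, and the `binaries` subdirectory under /usr/local/hmmer is an assumption about how the extracted HMMER archive is laid out; adjust it to match your installation.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
prepend_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already present, nothing to do
        *) PATH="$1:$PATH"; export PATH ;;
    esac
}

# Assumed location of the HMMER executables (adjust as needed).
prepend_to_path /usr/local/hmmer/binaries
```

To make the change permanent, the same lines can be placed in your shell start-up file, for example ~/.profile.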
mkdir /usr/local/dbcan
- Download dbCAN
- dbCAN-fam-HMMs.txt
- hmmscan-parser.sh
- Format HMM db
hmmpress dbCAN-fam-HMMs.txt
- Clone Repository
git clone
- Move the directory to /usr/local/prottest3
- Download MUSCLE
- Copy the muscle binary to a location in the PATH, such as /usr/local/bin
- Install as per directions Here
- Clone Repository
git clone
- Follow the directions in the README to create the executables
- Move the executables to a location in the PATH
- Script is included with SACCHARIS
- Script was written by Timothy L. Bailey and William Noble
- Script is part of the MEME Suite
- TagParser.pm throws an error on line 236
- Fix: alter the line to
bless $self, ref($package) || $package;
- NCBI E-utilities registration is required to run the cazy_extract.pl script
- Steps Involved
- Send an email to eutilities@ncbi.nlm.nih.gov that includes the desired values for email address and tool name
- e.g. tool = SaccharisTool, email = your.name@domain
- Create an account on NCBI (https://www.ncbi.nlm.nih.gov/account/), then create an API key on the Settings page
- Uncomment and modify lines 31-33 of cazy_extract.pl, using the tool name, email address, and API key from the two steps above
- cazy_extract.pl will not run without this information
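To illustrate how a registered tool name, email address, and API key are passed to NCBI E-utilities, here is a minimal sketch that builds an esearch request URL. The function name `build_esearch_url`, the choice of the protein database, and the search term are assumptions for illustration; cazy_extract.pl may construct its requests differently. The sketch also assumes the values contain no characters that need URL encoding.

```shell
#!/bin/sh
# Hypothetical helper: build an E-utilities esearch URL.
# $1 = search term, $2 = tool name, $3 = email address, $4 = API key
build_esearch_url() {
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=%s&tool=%s&email=%s&api_key=%s\n' \
        "$1" "$2" "$3" "$4"
}

# Example (all values are placeholders):
#   build_esearch_url YOUR_ACCESSION SaccharisTool your.name@domain YOUR_API_KEY
```

The `tool`, `email`, and `api_key` query parameters are the ones E-utilities uses to identify a registered client.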
- Run the following in a terminal window with <insert_here> replaced with your base file name
perl -pe 's/\>/$& . U . sprintf("%08d", ++$n) . " "/ge' <insert_here>.fasta > <insert_here>_mod.fasta
- In a terminal, follow the usage given by Saccharis.pl or Saccharis.pl -m
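The header-renaming one-liner given earlier can be tried on a small sample before it is applied to a real file. The sketch below wraps the same Perl substitution in a shell function; the function name `add_user_ids` and the sample sequences are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner: every ">" is followed by a
# sequential ID of the form U00000001, U00000002, ... and a space.
add_user_ids() {
    perl -pe 's/\>/$& . U . sprintf("%08d", ++$n) . " "/ge' "$@"
}

# Example with two made-up records:
#   printf '>seq1 first example\nMKTAYIAK\n>seq2\nMSLLT\n' | add_user_ids
```

For the example input, the headers become `>U00000001 seq1 first example` and `>U00000002 seq2`, while the sequence lines are left unchanged. Because the substitution applies to every `>` character, a header description that itself contains `>` would also receive an ID.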