Schedule: Using containers in bioinformatics
Workshop: Singularity Containers for Bioinformatics | Introduction to Containers | Basics of Singularity | Build a container image | video1: Containerising a pipeline | video2: Setting Up Graphical Applications | video3: Building Containers
conda singularity installation
# Install Singularity on CentOS 7
# Install Dependencies
$ sudo yum update -y && \
sudo yum groupinstall -y 'Development Tools' && \
sudo yum install -y \
openssl-devel \
libuuid-devel \
libseccomp-devel \
wget \
squashfs-tools \
cryptsetup
# Install go
Either do it like this:
$ export VERSION=1.15.6 OS=linux ARCH=amd64 && \
wget https://dl.google.com/go/go$VERSION.$OS-$ARCH.tar.gz && \
sudo tar -C /usr/local -xzvf go$VERSION.$OS-$ARCH.tar.gz && \
rm go$VERSION.$OS-$ARCH.tar.gz
or
#-- What I did
wget https://golang.org/dl/go1.15.6.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.15.6.linux-amd64.tar.gz
rm go1.15.6.linux-amd64.tar.gz
# set path
echo 'export GOPATH=${HOME}/go' >> ~/.bashrc && \
echo 'export PATH=/usr/local/go/bin:${PATH}:${GOPATH}/bin' >> ~/.bashrc && \
source ~/.bashrc
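Running the two `echo ... >> ~/.bashrc` lines more than once leaves duplicate entries in the file. A minimal sketch of an idempotent alternative follows; `append_once` is a hypothetical helper name, not part of Go or Singularity.

```shell
# append_once LINE FILE: append LINE to FILE only if no identical line exists yet.
# (append_once is a hypothetical helper written for this note.)
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (same two lines as above, safe to re-run):
# append_once 'export GOPATH=${HOME}/go' ~/.bashrc
# append_once 'export PATH=/usr/local/go/bin:${PATH}:${GOPATH}/bin' ~/.bashrc
# source ~/.bashrc
```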
# Download
$ export VERSION=3.7.0 && # adjust this as necessary \
wget https://github.com/hpcng/singularity/releases/download/v${VERSION}/singularity-${VERSION}.tar.gz && \
tar -xzf singularity-${VERSION}.tar.gz && \
cd singularity
# compile and install
./mconfig && \
make -C ./builddir && \
sudo make -C ./builddir install
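Before configuring anything, it may help to confirm that both `go` and `singularity` are actually found on the PATH. The sketch below uses two hypothetical helpers, `have_tool` and `report_tools`, built on the shell's own `command -v`.

```shell
# have_tool NAME: succeed if NAME resolves to a command on the current PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# report_tools NAME...: print one "ok: NAME" or "missing: NAME" line per tool.
report_tools() {
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
        fi
    done
}

# Usage after the install steps:
# report_tools go singularity
# go version && singularity --version
```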
# Configure
sudo sed -i 's/^ *mount *home *=.*/mount home = no/g' /home/kplee/program/singularity/builddir/singularity.conf
echo ". /home/kplee/program/singularity/builddir/etc/bash_completion.d/singularity" >> $(eval echo ~"$USER")/.bashrc
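The sed expression above can be previewed without touching the file. The sketch below prints what the `mount home` line would become; `preview_mount_home` is a hypothetical helper. Note, as an assumption to verify on your own machine: with the default install prefix, the configuration that Singularity reads after `make install` is usually /usr/local/etc/singularity/singularity.conf rather than the copy in builddir.

```shell
# preview_mount_home CONF: show the "mount home" line as it would look after
# applying the sed expression used above. CONF itself is not modified.
preview_mount_home() {
    sed 's/^ *mount *home *=.*/mount home = no/' "$1" | grep '^mount home'
}

# Usage (path is an assumption, adjust to your install prefix):
# preview_mount_home /usr/local/etc/singularity/singularity.conf
```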
##### Dfam TE Tools: see https://github.com/Dfam-consortium/TETools for instructions on running it with Singularity
$ curl -sSLO https://github.com/Dfam-consortium/TETools/raw/master/dfam-tetools.sh
$ chmod +x dfam-tetools.sh
$ ./dfam-tetools.sh
Wait for the SIF file to be created, then run:
BuildDatabase --help
RepeatModeler --help
RepeatMasker --help
runcoseg.pl --help
Each of these should print its help text.
Then use commands like these for your own run:
$ BuildDatabase -name genome1 genome1.fa
$ RepeatModeler -database genome1 [-LTRStruct] [-pa 8]
$ RepeatMasker genome1.fa [-lib library.fa] [-pa 8]
$ runcoseg.pl -d -m 50 -c ALU.cons -s ALU.seqs -i ALU.ins
or use the Docker container:
docker pull quay.io/biocontainers/maker:<tag>
(see maker/tags for valid values for <tag>)
cd /home/kplee/program/maker
singularity pull docker://quay.io/biocontainers/maker:2.31.10--pl526h61907ee_17
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
Copying blob 929ec067dd64 done
Copying blob 6d9e7a283257 done
Copying blob d483744c9fb6 done
Copying config 69f4134944 done
Writing manifest to image destination
Storing signatures
2020/12/22 23:11:41 info unpack layer: sha256:929ec067dd64d4ee237275e2222e04358db954ee38be44fb29100330d280bd6c
2020/12/22 23:11:43 warn rootless{usr/local/man} ignoring (usually) harmless EPERM on setxattr "user.rootlesscontainers"
2020/12/22 23:11:43 warn rootless{dev/full} creating empty file in place of device 1:7
2020/12/22 23:11:43 warn rootless{dev/zero} creating empty file in place of device 1:5
2020/12/22 23:11:43 warn rootless{dev/tty} creating empty file in place of device 5:0
2020/12/22 23:11:43 warn rootless{dev/null} creating empty file in place of device 1:3
2020/12/22 23:11:43 warn rootless{dev/random} creating empty file in place of device 1:8
2020/12/22 23:11:43 warn rootless{dev/urandom} creating empty file in place of device 1:9
2020/12/22 23:11:43 info unpack layer: sha256:6d9e7a2832576734b34ee23343008d6fe444844cb3d25321a7452389c40f24cb
2020/12/22 23:11:44 info unpack layer: sha256:d483744c9fb62de8a93fcce07e536abe5f289b377b474fe70a4c4e8ec8440473
INFO: Creating SIF file...
## check
ls
got this
maker_2.31.10--pl526h61907ee_17.sif
So the MAKER image is now on the system.
Let us try to run it:
singularity exec ./maker_2.31.10--pl526h61907ee_17.sif maker --help
I got
Possible precedence issue with control flow operator at /usr/local/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
MAKER version 2.31.10
Usage:
maker [options] <maker_opts> <maker_bopts> <maker_exe>
Description:
MAKER is a program that produces gene annotations in GFF3 format using
evidence such as EST alignments and protein homology. MAKER can be used to
produce gene annotations for new genomes as well as update annotations
from existing genome databases.
The three input arguments are control files that specify how MAKER should
behave. All options for MAKER should be set in the control files, but a
few can also be set on the command line. Command line options provide a
convenient machanism to override commonly altered control file values.
MAKER will automatically search for the control files in the current
working directory if they are not specified on the command line.
Input files listed in the control options files must be in fasta format
unless otherwise specified. Please see MAKER documentation to learn more
about control file configuration. MAKER will automatically try and
locate the user control files in the current working directory if these
arguments are not supplied when initializing MAKER.
It is important to note that MAKER does not try and recalculated data that
it has already calculated. For example, if you run an analysis twice on
the same dataset you will notice that MAKER does not rerun any of the
BLAST analyses, but instead uses the blast analyses stored from the
previous run. To force MAKER to rerun all analyses, use the -f flag.
MAKER also supports parallelization via MPI on computer clusters. Just
launch MAKER via mpiexec (i.e. mpiexec -n 40 maker). MPI support must be
configured during the MAKER installation process for this to work though
Options:
-genome|g <file> Overrides the genome file path in the control files
-RM_off|R Turns all repeat masking options off.
-datastore/ Forcably turn on/off MAKER's two deep directory
nodatastore structure for output. Always on by default.
-old_struct Use the old directory styles (MAKER 2.26 and lower)
-base <string> Set the base name MAKER uses to save output files.
MAKER uses the input genome file name by default.
-tries|t <integer> Run contigs up to the specified number of tries.
-cpus|c <integer> Tells how many cpus to use for BLAST analysis.
Note: this is for BLAST and not for MPI!
-mpi Forces MAKER to use MPI (overriding MPI autodetection).
Useful if MAKER doesn't correctly detect that it was
launched using an MPI process manager, or if running
in a Singularity container.
-force|f Forces MAKER to delete old files before running again.
This will require all blast analyses to be rerun.
-again|a recaculate all annotations and output files even if no
settings have changed. Does not delete old analyses.
-quiet|q Regular quiet. Only a handlful of status messages.
-qq Even more quiet. There are no status messages.
-dsindex Quickly generate datastore index file. Note that this
will not check if run settings have changed on contigs
-nolock Turn off file locks. May be usful on some file systems,
but can cause race conditions if running in parallel.
-TMP Specify temporary directory to use.
-CTL Generate empty control files in the current directory.
-OPTS Generates just the maker_opts.ctl file.
-BOPTS Generates just the maker_bopts.ctl file.
-EXE Generates just the maker_exe.ctl file.
-MWAS <option> Easy way to control mwas_server for web-based GUI
options: STOP
START
RESTART
-version Prints the MAKER version.
-help|? Prints this usage statement.
It seems there is a BioPerl issue (the warning on the first line comes from Bio/DB/IndexedBase.pm in the Perl 5.26.2 library).
I installed BioPerl on the system using:
sudo yum install perl-devel
sudo yum install cpanminus
sudo cpanm Bio::Perl # install the BioPerl tools
sudo cpanm --force Bio::Perl # force the install if the first attempt fails
singularity exec ./maker_2.31.10--pl526h61907ee_17.sif maker -CTL
This produced the control files:
[kplee@localhost maker]$ ll
total 884144
-rwxrwxr-x. 1 kplee kplee 905347072 Dec 22 23:14 maker_2.31.10--pl526h61907ee_17.sif
-rw-rw-r--. 1 kplee kplee 1413 Dec 27 00:30 maker_bopts.ctl
-rw-rw-r--. 1 kplee kplee 1214 Dec 27 00:30 maker_exe.ctl
-rw-rw-r--. 1 kplee kplee 4458 Dec 27 00:30 maker_opts.ctl
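Typing the full `singularity exec <image> maker ...` prefix each time gets tedious. A small wrapper function is one way around it; this is a sketch, and `maker_sif`, `MAKER_SIF` and `SINGULARITY` are names invented here, not part of MAKER or Singularity.

```shell
# Path to the image pulled above; adjust if the .sif lives elsewhere.
MAKER_SIF=${MAKER_SIF:-/home/kplee/program/maker/maker_2.31.10--pl526h61907ee_17.sif}
# Container runtime; overridable so the wrapper can be dry-run with "echo".
SINGULARITY=${SINGULARITY:-singularity}

# maker_sif ARGS...: run maker inside the container with the given arguments.
maker_sif() {
    "$SINGULARITY" exec "$MAKER_SIF" maker "$@"
}

# Usage:
# maker_sif -CTL
# maker_sif --help
# SINGULARITY=echo maker_sif -CTL    # dry run: only prints the command line
```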
Test with real datasets
cd /home/kplee/analysis/0002_mr4003_maker_annotation
singularity exec /home/kplee/program/maker/maker_2.31.10--pl526h61907ee_17.sif maker -CTL
Input files for this run:
/home/kplee/analysis/0001_mr4003_repeat_analysis/sesamiMR4003.contigs.fasta.masked
/home/kplee/datafiles/003_fos.MR4003.configs/original/transcripts.fa
/home/kplee/datafiles/003_fos.MR4003.configs/original/lyc.fasta
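The input paths have to be written into the MAKER options file before the run. The sketch below sets a `key=value` line in a control file with sed. `set_ctl_value` is a hypothetical helper. Two things are assumptions here because the notes do not state them: which input goes into which key (`genome=` and `est=` are standard keys in maker_opts.ctl), and that round1_maker_opts.ctl is a copy of the generated maker_opts.ctl.

```shell
# set_ctl_value KEY VALUE FILE: set "KEY=VALUE" on the line that starts with
# "KEY=", keeping any trailing "#comment". "|" is the sed delimiter so that
# paths containing "/" work; VALUE must not contain "|", "&" or backslashes.
set_ctl_value() {
    key=$1
    value=$2
    file=$3
    tmp=$(mktemp)
    sed "s|^${key}=[^#]*|${key}=${value} |" "$file" > "$tmp" || return 1
    cat "$tmp" > "$file"   # overwrite in place, preserving file permissions
    rm -f "$tmp"
}

# Usage (key/file pairing is an assumption, check it against your data):
# cp maker_opts.ctl round1_maker_opts.ctl
# set_ctl_value genome /home/kplee/analysis/0001_mr4003_repeat_analysis/sesamiMR4003.contigs.fasta.masked round1_maker_opts.ctl
# set_ctl_value est /home/kplee/datafiles/003_fos.MR4003.configs/original/transcripts.fa round1_maker_opts.ctl
```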
singularity exec /home/kplee/program/maker/maker_2.31.10--pl526h61907ee_17.sif maker -cpus 20 -base mr4003_rnd1 round1_maker_opts.ctl maker_bopts.ctl maker_exe.ctl
got this
Possible precedence issue with control flow operator at /usr/local/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
STATUS: Parsing control files...
df: Warning: cannot read table of mounted file systems: No such file or directory
ERROR: Could not determine if RepBase is installed
--> rank=NA, hostname=localhost.localdomain
####### Now I want to install some other images for the second annotation pipeline, including PASA, GeneMarkHMM, FGENESH, Augustus, SNAP, GlimmerHMM, GeneWise and EVM
PASA tutorial (3 October 2020) | PASA tutorial
cd /home/kplee/program/pasa
singularity pull docker://quay.io/biocontainers/pasa:2.4.1--he1b5a44_0
log file
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
Copying blob a3ed95caeb02 done
Copying blob b0dc45cd432d done
Copying blob 9466b3513669 done
Copying blob ddd482ea7b54 done
Copying blob 4d69f833b9d8 done
Copying blob e7c454e5167d done
Copying blob e38092b005c0 done
Copying blob a3ed95caeb02 skipped: already exists
Copying blob a3ed95caeb02 skipped: already exists
Copying blob f879b42dfe2b done
Copying blob a3ed95caeb02 skipped: already exists
Copying blob 30194d518e49 done
Copying config e3fe1b11c0 done
Writing manifest to image destination
Storing signatures
2020/12/23 00:38:09 info unpack layer: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
2020/12/23 00:38:09 info unpack layer: sha256:b0dc45cd432d14fb6df7d3239dc15d09c63906f8e7bfd373a4647b107fc3746c
2020/12/23 00:38:09 warn rootless{dev/console} creating empty file in place of device 5:1
2020/12/23 00:38:09 info unpack layer: sha256:9466b3513669459396338a74673ede0166f534ab64923f66ecca58176d1ffe5e
2020/12/23 00:38:09 info unpack layer: sha256:ddd482ea7b54727ff2b6748188290b75f6441ba4091d15a5e62d2a0ed47c81dd
2020/12/23 00:38:09 info unpack layer: sha256:4d69f833b9d8db2f02cc784f9bab7317317e4062129cafe65660c9eb2cd1c115
2020/12/23 00:38:09 info unpack layer: sha256:e7c454e5167ddab5101c53d104c90c0a037c3ee3930b9a74a14da40c852134cc
2020/12/23 00:38:09 info unpack layer: sha256:e38092b005c08dab4ae80615d29944107dc3d07307e37c4558914998e0e61827
2020/12/23 00:38:09 info unpack layer: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
2020/12/23 00:38:09 info unpack layer: sha256:f879b42dfe2b253544b0dc6834eff097a2a994cecda76b3c2950465fef9bfdfc
2020/12/23 00:38:09 info unpack layer: sha256:30194d518e4994c6b155e452a6a4d2a362ddecc40c2c4f24efb06e6def3e0de6
INFO: Creating SIF file...
ls #check the presence of the image
got
pasa_2.4.1--he1b5a44_0.sif
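In the pulls shown here, `singularity pull` names the output file `<image>_<tag>.sif`. Assuming that naming holds, a repeated pull can be skipped when the file already exists. `sif_name` and `pull_if_missing` below are hypothetical helpers, offered as a sketch.

```shell
# sif_name URI: print the file name observed above for a docker:// URI,
# i.e. <image>_<tag>.sif.
sif_name() {
    ref=${1##*/}                      # strip registry/namespace -> image:tag
    echo "${ref%%:*}_${ref#*:}.sif"
}

# pull_if_missing URI: pull only when the expected .sif is not already present
# in the current directory.
pull_if_missing() {
    sif=$(sif_name "$1")
    if [ -f "$sif" ]; then
        echo "already present: $sif"
    else
        singularity pull "$1"
    fi
}

# Usage:
# pull_if_missing docker://quay.io/biocontainers/pasa:2.4.1--he1b5a44_0
```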
cd /home/kplee/program/EVM
singularity pull docker://quay.io/biocontainers/evidencemodeler:1.1.1--2
log
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
Copying blob 1dbcab28ce46 done
Copying blob cfb1ba34637d done
Copying blob ace2d8a63dd5 done
Copying blob 75c080ef15eb done
Copying blob 316957f8baaf done
Copying blob dbd31e1d863d done
Copying blob 2f8531d5a6ec done
Copying blob 1dbcab28ce46 skipped: already exists
Copying blob 2e178fd72baf done
Copying blob fd59fdaa7753 done
Copying config dff695e838 done
Writing manifest to image destination
Storing signatures
2020/12/23 01:05:13 info unpack layer: sha256:1dbcab28ce46b65c0174e5e82658492107396fead31e9144c343e6bc96e471c7
2020/12/23 01:05:13 info unpack layer: sha256:cfb1ba34637d96787f6e780192bc5f09c04fe0b40c4e1acdcbc88953bce25b5b
2020/12/23 01:05:13 warn rootless{dev/console} creating empty file in place of device 5:1
2020/12/23 01:05:13 info unpack layer: sha256:ace2d8a63dd59063206661355f22da9e1c75c56d6cae5116d91db7a65f640667
2020/12/23 01:05:13 info unpack layer: sha256:75c080ef15ebb28daeadc91eeda0ca95a893fe1f4e55845da3feb58069d42fdc
2020/12/23 01:05:13 info unpack layer: sha256:316957f8baafcefd4c9f72b7e3f3fafb7fcb3ca805dc69d242aef656e0736490
2020/12/23 01:05:13 info unpack layer: sha256:dbd31e1d863d4a41565c08ad1d419c3a3538abfdd4a7971de63b8c5c8a43caec
2020/12/23 01:05:13 info unpack layer: sha256:2f8531d5a6ece34772cf99a2d30e3057c636f1ad05cdec0c8009eab401e616a7
2020/12/23 01:05:13 info unpack layer: sha256:1dbcab28ce46b65c0174e5e82658492107396fead31e9144c343e6bc96e471c7
2020/12/23 01:05:13 info unpack layer: sha256:2e178fd72baf28bc587b5d274574f1a3d3e25ac802fdaf154af258a3c5891831
2020/12/23 01:05:13 info unpack layer: sha256:fd59fdaa7753c713e42c24f11e061867f8bb0f2d1778bbb9bef208f8cdc13f4a
INFO: Creating SIF file...
singularity exec ./evidencemodeler_1.1.1--2.sif evidence_modeler.pl --help
log
WARNING: Skipping mount /usr/local/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
################# Evidence Modeler ##############################
--search_long_introns when set, reexamines long introns (can find nested genes, but also can result in FPs) (default: 0 (off))
--re_search_intergenic when set, reexamines intergenic regions of minimum length (can add FPs) (default: 0 (off))
--terminal_intergenic_re_search reexamines genomic regions outside of the span of all predicted genes (default: 10000)
#################################################################################################################################
The rest is here
I went here
I filled in the form and was redirected to download the GeneMark suite and the GeneMark key
Now the installation on the server
gunzip gmes_linux_64.tar.gz
tar -xvf gmes_linux_64.tar
cd gmes_linux_64
./check_install.bash
[kplee@localhost gmes_linux_64]$ ./check_install.bash
Checking GeneMark-ES installation
Can't locate Hash/Merge.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /
BEGIN failed--compilation aborted.
Error, Perl CPAN library Hash::Merge not found
So I need to install the missing Hash::Merge dependency
I did it like this
sudo cpanm Hash::Merge
and got this log
[sudo] password for kplee:
--> Working on Hash::Merge
Fetching http://www.cpan.org/authors/id/H/HE/HERMES/Hash-Merge-0.302.tar.gz ... OK
Configuring Hash-Merge-0.302 ... OK
==> Found dependencies: Clone::Choose
--> Working on Clone::Choose
Fetching http://www.cpan.org/authors/id/H/HE/HERMES/Clone-Choose-0.010.tar.gz ... OK
Configuring Clone-Choose-0.010 ... OK
==> Found dependencies: Test::Without::Module
--> Working on Test::Without::Module
Fetching http://www.cpan.org/authors/id/C/CO/CORION/Test-Without-Module-0.20.tar.gz ... OK
Configuring Test-Without-Module-0.20 ... OK
Building and testing Test-Without-Module-0.20 ... OK
Successfully installed Test-Without-Module-0.20
Building and testing Clone-Choose-0.010 ... OK
Successfully installed Clone-Choose-0.010
Building and testing Hash-Merge-0.302 ... OK
Successfully installed Hash-Merge-0.302
3 distributions installed
./check_install.bash
Got this
Checking GeneMark-ES installation
Can't locate Logger/Simple.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .).
BEGIN failed--compilation aborted.
Error, Perl CPAN library Logger::Simple not found
Solve this one as well
sudo cpanm Logger::Simple
Got this
[kplee@localhost gmes_linux_64]$ sudo cpanm Logger::Simple
--> Working on Logger::Simple
Fetching http://www.cpan.org/authors/id/T/TS/TSTANLEY/Logger-Simple-2.0.tar.gz ... OK
Configuring Logger-Simple-2.0 ... OK
==> Found dependencies: Test::Pod, Object::InsideOut
--> Working on Test::Pod
Fetching http://www.cpan.org/authors/id/E/ET/ETHER/Test-Pod-1.52.tar.gz ... OK
Configuring Test-Pod-1.52 ... OK
Building and testing Test-Pod-1.52 ... OK
Successfully installed Test-Pod-1.52
--> Working on Object::InsideOut
Fetching http://www.cpan.org/authors/id/J/JD/JDHEDDEN/Object-InsideOut-4.05.tar.gz ... OK
Configuring Object-InsideOut-4.05 ... OK
Building and testing Object-InsideOut-4.05 ... OK
Successfully installed Object-InsideOut-4.05
Building and testing Logger-Simple-2.0 ... OK
Successfully installed Logger-Simple-2.0
3 distributions installed
check again
./check_install.bash
got this
Checking GeneMark-ES installation
Can't locate Parallel/ForkManager.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .).
BEGIN failed--compilation aborted.
Error, Perl CPAN library Parallel::ForkManager not found
Solve this by running
sudo cpanm Parallel::ForkManager
got this
[kplee@localhost gmes_linux_64]$ sudo cpanm Parallel::ForkManager
--> Working on Parallel::ForkManager
Fetching http://www.cpan.org/authors/id/Y/YA/YANICK/Parallel-ForkManager-2.02.tar.gz ... OK
Configuring Parallel-ForkManager-2.02 ... OK
==> Found dependencies: Moo::Role, Moo
--> Working on Moo::Role
Fetching http://www.cpan.org/authors/id/H/HA/HAARG/Moo-2.004004.tar.gz ... OK
Configuring Moo-2.004004 ... OK
==> Found dependencies: Role::Tiny, Sub::Defer, Sub::Quote, Class::Method::Modifiers
--> Working on Role::Tiny
Fetching http://www.cpan.org/authors/id/H/HA/HAARG/Role-Tiny-2.001004.tar.gz ... OK
Configuring Role-Tiny-2.001004 ... OK
Building and testing Role-Tiny-2.001004 ... OK
Successfully installed Role-Tiny-2.001004
--> Working on Sub::Defer
Fetching http://www.cpan.org/authors/id/H/HA/HAARG/Sub-Quote-2.006006.tar.gz ... OK
Configuring Sub-Quote-2.006006 ... OK
Building and testing Sub-Quote-2.006006 ... OK
Successfully installed Sub-Quote-2.006006
--> Working on Class::Method::Modifiers
Fetching http://www.cpan.org/authors/id/E/ET/ETHER/Class-Method-Modifiers-2.13.tar.gz ... OK
Configuring Class-Method-Modifiers-2.13 ... OK
Building and testing Class-Method-Modifiers-2.13 ... OK
Successfully installed Class-Method-Modifiers-2.13
Building and testing Moo-2.004004 ... OK
Successfully installed Moo-2.004004
Building and testing Parallel-ForkManager-2.02 ... OK
Successfully installed Parallel-ForkManager-2.02
5 distributions installed
check again (./check_install.bash)
Checking GeneMark-ES installation
Can't locate MCE/Mutex.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .).
BEGIN failed--compilation aborted.
Error, Perl CPAN library MCE::Mutex not found
solve
sudo cpanm MCE::Mutex
got this
--> Working on MCE::Mutex
Fetching http://www.cpan.org/authors/id/M/MA/MARIOROY/MCE-1.874.tar.gz ... OK
Configuring MCE-1.874 ... OK
Building and testing MCE-1.874 ... OK
Successfully installed MCE-1.874
1 distribution installed
check
Do the same thing for Math::Utils (sudo cpanm Math::Utils)
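The check, install, check again cycle above can be sketched as a loop. This is only a sketch: `missing_module` and `install_missing_modules` are hypothetical helpers, the error wording is taken from the check_install.bash output pasted above, and the loop assumes it is run from the gmes_linux_64 directory with cpanm already installed.

```shell
# missing_module: read check_install.bash output on stdin and print the name
# of the missing Perl module, if any (no output means no module error).
missing_module() {
    sed -n 's/^Error, Perl CPAN library \(.*\) not found$/\1/p'
}

# install_missing_modules: repeat check -> install until no module is reported.
# Stops with an error if the same module is still missing after an install.
install_missing_modules() {
    prev=
    while :; do
        mod=$(./check_install.bash 2>&1 | missing_module | head -n 1)
        [ -n "$mod" ] || break
        if [ "$mod" = "$prev" ]; then
            echo "still missing after install: $mod" >&2
            return 1
        fi
        sudo cpanm "$mod" || return 1
        prev=$mod
    done
}

# Usage:
# cd gmes_linux_64 && install_missing_modules
```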
check again
[kplee@localhost gmes_linux_64]$ ./check_install.bash
Checking GeneMark-ES installation
GeneMark.hmm eukaryotic 3
License key ".gm_key" not found.
This file is neccessary in order to use GeneMark.hmm eukaryotic 3.
Error, installation key is missing or expired
So I need to set the key
# extract the key file
zcat gm_key_64.gz > ~/program/GeneMark/gmes_linux_64/.gm_key
check again
./check_install.bash
I got this
Checking GeneMark-ES installation
All required components for GeneMark-ES were found
So it is okay now. The GeneMark suite and its dependencies are correctly installed.
Let's check by running a help command
./gmes_petap.pl
got this
# -------------------
Usage: ./gmes_petap.pl [options] --sequence [filename]
GeneMark-ES Suite version 4.62_lic
Suite includes GeneMark.hmm, GeneMark-ES, GeneMark-ET and GeneMark-EP algorithms.
Input sequence/s should be in FASTA format.
Select one of the gene prediction algorithm
To run GeneMark-ES self-training algorithm
--ES
To run GeneMark-ET with hints from transcriptome splice alignments
--ET [filename]; file with intron coordinates from RNA-Seq read splice alignment in GFF format
--et_score [number]; default 10; minimum score of intron in initiation of the ET algorithm
To run GeneMark-EP with hints from protein splice alignments
--EP
--dbep [filename]; file with protein database in FASTA format
--ep_score [number,number]; default 4,0.25; minimum score of intron in initiation of the EP algorithm
or
--EP [filename]; file with intron coordinates from protein splice alignment in GFF format
To run GeneMark.hmm predictions using previously derived model
--predict_with [filename]; file with species specific gene prediction parameters
To run ES, ET or EP with branch point model. This option is most useful for fungal genomes
--fungus
To run hmm, ES, ET or EP in PLUS mode (prediction with hints)
--evidence [filename]; file with hints in GFF format
Masking option
--soft_mask [number] or [auto]; default auto; to indicate that lowercase letters stand for repeats;
masks only lowercase repeats longer than specified length
In 'auto' mode length is adjusted based on the size of the input genome
Run options
--cores [number]; default 1; to run program with multiple threads
--pbs to run on cluster with PBS support
--v verbose
Optional sequence pre-processing parameters
--max_contig [number]; default 5000000; will split input genomic sequence into contigs shorter then max_contig
--min_contig [number]; default 50000 (10000 fungi); will ignore contigs shorter then min_contig in training
--max_gap [number]; default 5000; will split sequence at gaps longer than max_gap
Letters 'n' and 'N' are interpreted as standing within gaps
--max_mask [number]; default 5000; will split sequence at repeats longer then max_mask
Letters 'x' and 'X' are interpreted as results of hard masking of repeats
Optinal algorithm parameters
--max_intron [number]; default 10000 (3000 fungi); maximum length of intron
--max_intergenic [number]; default 50000 (10000 fungi); maximum length of intergenic regions
--min_contig_in_predict [number]; default 500; minimum allowed length of contig in prediction step
--min_gene_in_predict [number]; default 300 (120 fungi); minimum allowed gene length in prediction step
--gc_donor [value]; default 0.001; transition probability to GC donor in the range 0..1; 'auto' mode detects probability from training; 'off' switches GC donor model OFF
Developer options
--gc3 [number]; GC3 cutoff in training for grasses
--training to run only training step of algorithms; applicable to ES, ET or EP
--prediction to run only prediction step of algorithms using species parameters from previously executed training; applicable to ES, ET or EP
--usr_cfg [filename]; use custom configuration from this file
--ini_mod [filename]; use this file with parameters for algorithm initiation
--test_set [filename]; to evaluate prediction accuracy on the given test set
--key_bin
--debug
# -------------------
Awesome! It works fine!
Now add it to the PATH
# Go to the home directory
cd
# Open the bash profile
vi .bash_profile
Put the full path in the PATH variable
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$HOME/program/GeneMark/gmes_linux_64
export PATH
Press ESC, type :wq, and then run
source .bash_profile
Test if it is working well
gmes_petap.pl
Got this result
# -------------------
Usage: /home/kplee/program/GeneMark/gmes_linux_64/gmes_petap.pl [options] --sequence [filename]
(the rest of the help text is identical to the ./gmes_petap.pl output shown earlier)
# -------------------
Fantabulous! Working very well!
Note: for the key management I followed this online resource.
Try a run with a real dataset
gmes_petap.pl --cores 2 --sequence sesamiMR4003.contigs.fasta.masked --ES --fungus &> log &
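The `&> log &` form is bash-specific and does not keep the process ID around. As a sketch, a small helper can start a long run in the background, send both output streams to a log file, and remember the PID; `run_bg` and `BG_PID` are names invented for this note.

```shell
# run_bg LOGFILE CMD [ARGS...]: start CMD in the background with stdout and
# stderr written to LOGFILE, and record its process ID in BG_PID.
# nohup keeps the job alive if the terminal session ends.
run_bg() {
    logfile=$1
    shift
    nohup "$@" > "$logfile" 2>&1 &
    BG_PID=$!
}

# Usage:
# run_bg log gmes_petap.pl --cores 2 --sequence sesamiMR4003.contigs.fasta.masked --ES --fungus
# tail -f log        # follow progress
# wait "$BG_PID"     # block until the run finishes (same shell only)
```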
I went here
wget https://github.com/guigolab/geneid/archive/v1.4.5.tar.gz
gunzip -d v1.4.5.tar.gz
tar -xvf v1.4.5.tar
cd geneid-1.4.5/
make
cd bin
./geneid -h
Got this
NAME
geneid - a program to annotate genomic sequences
SYNOPSIS
geneid [-bdaefitnxszru]
[-TDAZU]
[-p gene_prefix]
[-G] [-3] [-X] [-M] [-m]
[-WCF] [-o]
[-j lower_bound_coord]
[-k upper_bound_coord]
[-N numer_nt_mapped]
[-O <gff_exons_file>]
[-R <gff_annotation-file>]
[-S <gff_homology_file>]
[-P <parameter_file>]
[-E exonweight]
[-V evidence_exonweight]
[-Bv] [-h]
<locus_seq_in_fasta_format>
RELEASE
geneid v 1.4
OPTIONS
-b: Output Start codons
-d: Output Donor splice sites
-a: Output Acceptor splice sites
-e: Output Stop codons
-f: Output Initial exons
-i: Output Internal exons
-t: Output Terminal exons
-n: Output introns
-s: Output Single genes
-x: Output all predicted exons
-z: Output Open Reading Frames
-T: Output genomic sequence of exons in predicted genes
-D: Output genomic sequence of CDS in predicted genes
-A: Output amino acid sequence derived from predicted CDS
-p: Prefix this value to the names of predicted genes, peptides and CDS
-G: Use GFF format to print predictions
-3: Use GFF3 format to print predictions
-X: Use extended-format to print gene predictions
-M: Use XML format to print gene predictions
-m: Show DTD for XML-format output
-j <coord>: Begin prediction at this coordinate
-k <coord>: End prediction at this coordinate
-N <num_reads>: Millions of reads mapped to genome
-W: Only Forward sense prediction (Watson)
-C: Only Reverse sense prediction (Crick)
-U: Allow U12 introns (Requires appropriate U12 parameters to be set in the parameter file)
-r: Use recursive splicing
-F: Force the prediction of one gene structure
-o: Only running exon prediction (disable gene prediction)
-O <exons_filename>: Only running gene prediction (not exon prediction)
-Z: Activate Open Reading Frames searching
-R <exons_filename>: Provide annotations to improve predictions
-S <HSP_filename>: Using information from protein sequence alignments to improve predictions
-u: Turn on UTR prediction. Only valid with -S option: HSP/EST/short read ends are used to determine UTR ends
-E: Add this value to the exon weight parameter (see parameter file)
-V: Add this value to the score of evidence exons
-P <parameter_file>: Use other than default parameter file (human)
-B: Display memory required to execute geneid given a sequence
-v: Verbose. Display info messages
-h: Show this help
AUTHORS
geneid_v1.4 has been developed by Enrique Blanco, Tyler Alioto and Roderic Guigo.
Parameter files have been created by Genis Parra and Tyler Alioto. Any bug or suggestion
can be reported to geneid@crg.es
Fantabulous! Works well! The installation instructions came from here
I went here
Then
wget https://ccb.jhu.edu/software/glimmerhmm/dl/GlimmerHMM-3.0.4.tar.gz
tar -xzf GlimmerHMM-3.0.4.tar.gz
Check the binary in the bin directory
./glimmerhmm_linux_x86_64
got this
USAGE: ./glimmerhmm_linux_x86_64 <genome1-file> <training-dir-for-genome1> [options]
Options:
-p file_name If protein domain searches are available, read them from file file_name
-d dir_name Training directory is specified by dir_name (introduced for compatibility with earlier versions)
-o file_name Print output in file_name; if n>1 for top best predictions, output is in file_name.1, file_name.2, ... , file_name.n f
-n n Print top n best predictions
-g Print output in gff format
-v Don't use svm splice site predictions
-f Don't make partial gene predictions
-h Display the options of the program
Test ./glimmhmm.pl
./glimmhmm.pl
got this
Usage: glimmhmm.pl <glimmerhmm_program> <fasta_file> <train_dir> <options>
Works fine!
wget https://github.com/KorfLab/SNAP/archive/master.zip
unzip master.zip
cd SNAP-master
make
./snap [options] <HMM file> <FASTA file> [options] # run SNAP
wget https://github.com/EVidenceModeler/EVidenceModeler/archive/v1.1.1.tar.gz
tar -xzvf v1.1.1.tar.gz
I just found some awesome genomics tutorials on Jianshu: 1 | 2 | 3 | 4 | 5 | 6
######## PASA tutorial 1 ######## PASA tutorial 2 ######## PASA tutorial 3
I think I need to set up MySQL first.
Awesome tutorial on GO analysis | GO analysis
Enable the MySQL 8.0 repository:
sudo yum localinstall https://dev.mysql.com/get/mysql80-community-release-el7-1.noarch.rpm
Install MySQL 8.0 package with yum:
sudo yum install mysql-community-server
Starting MySQL
sudo systemctl start mysqld
During the installation process, a temporary password is generated for the MySQL root user. Locate it in the mysqld.log with this command
sudo grep 'temporary password' /var/log/mysqld.log
I got this
2021-01-02T23:07:27.740244Z 6 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: WH-YNg7iSU9y
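A minimal sketch for pulling just the password out of that log line, so it can be copied without the timestamp noise. The function name `mysql_temp_password` is made up for these notes, and it assumes the password is the last whitespace-separated field of the line, as in the output above.

```shell
# Hypothetical helper (not part of MySQL): reads log text on stdin and
# prints the last field of the most recent "temporary password" line.
mysql_temp_password() {
    awk '/temporary password/ { pw = $NF } END { if (pw != "") print pw }'
}

# Usage on the server (the log is normally readable by root only):
#   sudo cat /var/log/mysqld.log | mysql_temp_password
```

If the log holds several such lines (for example after a reinstall), only the last one is printed.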
Configuring MySQL: run the security script:
sudo mysql_secure_installation
I got
[sudo] password for kplee:
I entered my own user password (this prompt comes from sudo, not from MySQL)
got this
Securing the MySQL server deployment.
Enter password for user root:
I entered the temporary password
got this
The existing password for the user account root has expired. Please set a new password.
New password:
Note: Enter a new 12-character password that contains at least one uppercase letter, one lowercase letter, one number and one special character. Re-enter it when prompted.
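A small local pre-check for the rule in the note above can save a few rejected attempts. This is only a sketch: `meets_password_rule` is a hypothetical helper that mirrors the wording of the note (12+ characters, upper case, lower case, digit, special character). The server's real policy is decided by its validate_password settings and may differ.

```shell
# Hypothetical helper: returns 0 if the candidate satisfies the rule
# described in the note, 1 otherwise. It does not talk to MySQL at all.
meets_password_rule() {
    pw="$1"
    [ "${#pw}" -ge 12 ] || return 1                              # length
    case "$pw" in *[[:upper:]]*) ;; *) return 1 ;; esac         # upper case
    case "$pw" in *[[:lower:]]*) ;; *) return 1 ;; esac         # lower case
    case "$pw" in *[[:digit:]]*) ;; *) return 1 ;; esac         # digit
    case "$pw" in *[![:alnum:]]*) ;; *) return 1 ;; esac        # special character
    return 0
}

# Example with a throwaway string (do not reuse it as a real password):
if meets_password_rule 'Example-Passw0rd'; then
    echo "candidate satisfies the rule"
else
    echo "candidate does not satisfy the rule"
fi
```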
Got this
The 'validate_password' component is installed on the server.
The subsequent steps will run with the existing configuration
of the component.
Using existing password for root.
Estimated strength of the password: 100
Change the password for root ? ((Press y|Y for Yes, any other key for No) : *No*
... skipping.
By default, a MySQL installation has an anonymous user,
allowing anyone to log into MySQL without having to have
a user account created for them. This is intended only for
testing, and to make the installation go a bit smoother.
You should remove them before moving into a production
environment.
Remove anonymous users? (Press y|Y for Yes, any other key for No) : *Yes*
Success.
Normally, root should only be allowed to connect from
'localhost'. This ensures that someone cannot guess at
the root password from the network.
Disallow root login remotely? (Press y|Y for Yes, any other key for No) : *Yes*
Success.
By default, MySQL comes with a database named 'test' that
anyone can access. This is also intended only for testing,
and should be removed before moving into a production
environment.
Remove test database and access to it? (Press y|Y for Yes, any other key for No) : *Yes*
- Dropping test database...
Success.
- Removing privileges on test database...
Success.
Reloading the privilege tables will ensure that all changes
made so far will take effect immediately.
Reload privilege tables now? (Press y|Y for Yes, any other key for No) : *Yes*
Success.
All done!
Connecting to MySQL
To log in to the MySQL server as the root user type:
mysql -u root -p
got this
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 13
Server version: 8.0.22 MySQL Community Server - GPL
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
I FOLLOWED THIS TUTO FOR THE INSTALLATION
Install mysql-connector
wget https://dev.mysql.com/get/mysql80-community-release-el7-2.noarch.rpm # Download latest mysql80-community-release-el7 rpm
sudo rpm -Uvh mysql80-community-release-el7-2.noarch.rpm # Install mysql80-community-release rpm
sudo yum install mysql-connector-c++-devel # Install mysql-connector
Now let us install openssl
Updating System Packages on CentOS
sudo yum update
check the version of OpenSSL installed
openssl version -a
got this
OpenSSL 1.0.2k-fips 26 Jan 2017
built on: reproducible build, date unspecified
platform: linux-x86_64
options: bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,16,int) idea(int) blowfish(idx)
compiler: gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DKRB5_MIT -m64 -DL_ENDIAN -Wall -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Wa,--noexecstack -DPURIFY -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DRC4_ASM -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
OPENSSLDIR: "/etc/pki/tls"
engines: rdrand dynamic
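To decide whether the system OpenSSL is too old, the version string can be compared instead of eyeballed. A rough sketch, assuming `sort -V` is available (it is in GNU coreutils); both function names are hypothetical, and the 1.1.1 threshold is simply the version installed in the steps of these notes.

```shell
# Hypothetical helper: reads the first line of `openssl version` output on
# stdin and prints the numeric part, e.g. "1.0.2k-fips" becomes "1.0.2".
openssl_numeric_version() {
    awk 'NR == 1 { v = $2; sub(/[^0-9.].*$/, "", v); print v }'
}

# Hypothetical helper: succeeds when version $1 is at least version $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example using the version line recorded above:
installed=$(printf '%s\n' 'OpenSSL 1.0.2k-fips 26 Jan 2017' | openssl_numeric_version)
if version_at_least "$installed" 1.1.1; then
    echo "OpenSSL $installed is recent enough"
else
    echo "OpenSSL $installed is older than 1.1.1"
fi
```

On a live machine the input would come from `openssl version | openssl_numeric_version`. The letter suffix (k, c, ...) is dropped, so patch letters are not compared.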
Step 1: Install dependencies
sudo yum install perl-core zlib-devel -y
Step 2: Download OpenSSL
cd /usr/local/src/
sudo wget https://www.openssl.org/source/openssl-1.1.1c.tar.gz
sudo tar -xf openssl-1.1.1c.tar.gz
cd openssl-1.1.1c
Step 3: Install OpenSSL
sudo ./config --prefix=/usr/local/ssl --openssldir=/usr/local/ssl shared zlib
got this
Operating system: x86_64-whatever-linux2
Configuring OpenSSL version 1.1.1c (0x1010103fL) for linux-x86_64
Using os-specific seed configuration
Creating configdata.pm
Creating Makefile
**********************************************************************
*** ***
*** OpenSSL has been successfully configured ***
*** ***
*** If you encounter a problem while building, please open an ***
*** issue on GitHub <https://github.com/openssl/openssl/issues> ***
*** and include the output from the following command: ***
*** ***
*** perl configdata.pm --dump ***
*** ***
*** (If you are new to OpenSSL, you might want to consult the ***
*** 'Troubleshooting' section in the INSTALL file first) ***
*** ***
**********************************************************************
then
sudo make
sudo make test
sudo make install
Step 4: Configure OpenSSL Shared Libraries
cd /etc/ld.so.conf.d/
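sudo vi openssl-1.1.1c.conf
#Put the library directory on a single line in this file. With --prefix=/usr/local/ssl that is:
#/usr/local/ssl/lib
#Ensure to save before you exit.
#Next, reload the dynamic linker cache by issuing the command below: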
sudo ldconfig -v
Step 5: Configure OpenSSL Binary
First, back up the default OpenSSL binary.
sudo mv /bin/openssl /bin/openssl.backup
Next, create new environment files for OpenSSL:
sudo nano /etc/profile.d/openssl.sh
Enter the following:
#Set OPENSSL_PATH
OPENSSL_PATH="/usr/local/ssl/bin"
export OPENSSL_PATH
PATH=$PATH:$OPENSSL_PATH
export PATH
Ensure to save before you exit.
Next, make the openssl.sh file executable by issuing the command below:
sudo chmod +x /etc/profile.d/openssl.sh
Next, reload the OpenSSL environment and check the PATH bin directory using commands below:
source /etc/profile.d/openssl.sh
echo $PATH
We can now check and verify our installation of the latest stable version of OpenSSL using the command below:
which openssl
openssl version -a
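One thing worth checking here: openssl.sh appends /usr/local/ssl/bin to the end of PATH, so the new binary is only picked up because /bin/openssl was moved away earlier. A quick sketch to confirm the directory really is on PATH in the current shell. `path_has_dir` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current PATH).
path_has_dir() {
    dir="$1"
    path_list="${2:-$PATH}"
    case ":$path_list:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_has_dir /usr/local/ssl/bin; then
    echo "/usr/local/ssl/bin is on PATH"
else
    echo "/usr/local/ssl/bin is not on PATH; try: source /etc/profile.d/openssl.sh"
fi
```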
And voilà! Inspirational tutorial here
sudo yum install "perl(DBD::mysql)" # tuto [here](https://metacpan.org/pod/distribution/DBD-mysql/lib/DBD/mysql/INSTALL.pod)
Now check that it works:
perl -MDBD::mysql -e 'print "Hello there!";'
Install perl-DB_File
sudo yum install perl-DB_File # [source](https://metacpan.org/pod/distribution/DBD-mysql/lib/DBD/mysql/INSTALL.pod)
Setting up MySQL for PASA use
I will do this later
Install Tom Wu's GMAP cDNA alignment utility.
$ wget http://research-pub.gene.com/gmap/src/gmap-gsnap-2020-12-17.tar.gz
$ tar zxvf gmap-gsnap-2020-12-17.tar.gz
$ cd gmap-2020-12-17
$ ./configure --prefix=$PWD
$ make -j 8
$ sudo make install
Install Jim Kent's BLAT aligner
$ sudo yum install libpng-devel
$ wget http://hgwdev.cse.ucsc.edu/~kent/src/blatSrc35.zip
$ unzip blatSrc35.zip
$ cd blatSrc
$ MACHTYPE=x86_64-redhat-linux-gnu
$ export MACHTYPE
$ mkdir -p ~/bin/x86_64-redhat-linux-gnu
$ make
That failed, so I used the fix from this [post](https://www.biostars.org/p/291102/)
$ echo $MACHTYPE
x86_64-pc-linux-gnu
$ MACHTYPE=x86_64
$ export MACHTYPE
$ echo $MACHTYPE
x86_64
$ export PATH=~/bin/x86_64:$PATH
$ make
I refreshed the PATH variable with source ~/.bashrc and then I ran
blat
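The fix boils down to shortening MACHTYPE to its first component. A sketch of doing that without hard-coding x86_64; `short_machtype` is a hypothetical helper. The commented lines assume the build installs into ~/bin/$MACHTYPE, as the earlier mkdir step suggests, so that directory has to exist for the short name too.

```shell
# Hypothetical helper: keep only the part before the first "-" of a
# GNU-style machine triplet (values without a "-" pass through unchanged).
short_machtype() {
    printf '%s\n' "${1%%-*}"
}

short_machtype x86_64-pc-linux-gnu

# Possible use before running make in blatSrc:
#   MACHTYPE=$(short_machtype "$MACHTYPE"); export MACHTYPE
#   mkdir -p ~/bin/"$MACHTYPE"
#   export PATH=~/bin/"$MACHTYPE":$PATH
```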
Install Bill Pearson's FASTA general sequence alignment utility
$ wget https://faculty.virginia.edu/wrpearson/fasta/fasta3/fasta-36.3.8h.tar.gz
$ tar zxvf fasta-36.3.8h.tar.gz
$ cd fasta-36.3.8h
$ cd src
$ make -f ../make/Makefile.linux_sse2 all
$ cd ..
$ ln -s $PWD/bin/fasta36 ~/bin/fasta
Install pasa
$ wget https://github.com/PASApipeline/PASApipeline/releases/download/pasa-v2.4.1/PASApipeline.v2.4.1.FULL.tar.gz
$ tar zxvf PASApipeline.v2.4.1.FULL.tar.gz
$ cd PASApipeline.v2.4.1
$ make
MySQL configuration
$ mysql -u root -p
mysql> create user 'pasa_write'@'localhost' identified by 'pasa_write_pwd';
mysql> CREATE USER 'kplee'@'localhost' IDENTIFIED BY 'kpleeSESAME2_270190'; #I used this
mysql> grant all privileges on `%_pasa`.* to 'pasa_write'@'localhost';
mysql> GRANT ALL PRIVILEGES ON `%_pasa`.* TO 'kplee'@'localhost'; # I used this
mysql> GRANT ALL PRIVILEGES ON *.* TO 'kplee'@'localhost'; # I changed to this later
mysql> CREATE USER 'pasa'@'localhost' IDENTIFIED BY 'kpleeSESAME_270190';
I think I need to create a database first
mysql -u root -p
mysql> CREATE DATABASE my_db_pasa;
check available databases
SHOW DATABASES;
I got
+--------------------+
| Database |
+--------------------+
| information_schema |
| my_db_pasa |
| mysql |
| performance_schema |
| sys |
+--------------------+
5 rows in set (0.00 sec)
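The database and grant steps can be written out as SQL text and reviewed before anything is sent to the server. A sketch only: `pasa_setup_sql` is a hypothetical helper, the user is assumed to exist already, and the inputs are not escaped, so it is meant for simple names like the ones used here. The `%_pasa` pattern is the one from the grant above; my understanding is that `%` and `_` act as wildcards in the database name of a GRANT, so it covers my_db_pasa.

```shell
# Hypothetical helper: prints SQL for a PASA database and the matching grant.
# $1 = existing MySQL user, $2 = database name (should end in "_pasa").
pasa_setup_sql() {
    user="$1"
    db="$2"
    cat <<EOF
CREATE DATABASE IF NOT EXISTS \`$db\`;
GRANT ALL PRIVILEGES ON \`%_pasa\`.* TO '$user'@'localhost';
EOF
}

# Print the statements for review:
pasa_setup_sql kplee my_db_pasa

# To apply them:
#   pasa_setup_sql kplee my_db_pasa | mysql -u root -p
```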
so I think I can set the alignment config file with mysql
cd /home/kplee/program/PASApipeline.v2.4.1/pasa_conf
cp pasa.CONFIG.template conf.txt
vi conf.txt
Do that
#####################################
#### MANDATORY SETTINGS #############
#####################################
# This file is not used if SQLITEDB is set in the alignment assembly configuration file
## MySQL settings:
# server actively running MySQL
# MYSQLSERVER=server.com
MYSQLSERVER=localhost
# Pass socket connections through Perl DBI syntax e.g. MYSQLSERVER=mysql_socket=/tmp/mysql.sock
# read-write username and password
MYSQL_RW_USER=myid
MYSQL_RW_PASSWORD=mypassword
#### END OF MANDATORY SETTINGS ######
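MYSQL_RW_USER and MYSQL_RW_PASSWORD above still hold the template placeholders, so they need the real MySQL account. Instead of editing by hand in vi, the values can be set with a small function. This is a sketch under my own assumptions: `set_conf_value` is a hypothetical helper, and it only rewrites uncommented lines that start with `KEY=`.

```shell
# Hypothetical helper: set_conf_value FILE KEY VALUE
# Replaces the uncommented "KEY=..." line in FILE and leaves every other
# line, including commented examples, untouched.
set_conf_value() {
    file="$1"
    tmp="$file.tmp.$$"
    # Pass key and value through the environment so awk does not
    # interpret backslashes in them.
    CONF_KEY="$2" CONF_VALUE="$3" awk '
        index($0, ENVIRON["CONF_KEY"] "=") == 1 {
            print ENVIRON["CONF_KEY"] "=" ENVIRON["CONF_VALUE"]
            next
        }
        { print }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage, from the pasa_conf directory:
#   set_conf_value conf.txt MYSQL_RW_USER kplee
#   set_conf_value conf.txt MYSQL_RW_PASSWORD 'the password chosen for that user'
```

Since the file ends up holding a password in plain text, restricting it with `chmod 600 conf.txt` seems sensible.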