A curated list of awesome Bioinformatics software, tools and resources.
Some universities and research institutes also maintain lists of software tools, for example:
- https://wiki.gacrc.uga.edu/wiki/Main_Page
- https://wiki.rc.ufl.edu/doc/Category:Software
- http://www.vcru.wisc.edu/simonlab/bioinformatics/programs/index.html
Some forums host similar discussion threads, e.g. http://seqanswers.com/wiki/Software
I personally recommend this site, which documents a large number of tools: https://omictools.com/
- FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). FastQC usage notes: http://www.plob.org/2013/07/16/5987.html
- Fastx-toolkit (http://hannonlab.cshl.edu/fastx_toolkit/)
- PrinSeq (http://prinseq.sourceforge.net/)
- FastUniq (https://sourceforge.net/projects/fastuniq/): merges multiple FASTQ files into two files while removing duplicate sequences (duplicates). Note that FastUniq cannot read gzip-compressed FASTQ files; decompress them first. Other tools that remove duplicates without alignment to a reference genome include fastx_collapser in the FASTX-Toolkit (single-end), Fulcrum, CD-HIT-DUP and GPU-DupRemoval. For background on duplicate removal, see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123249/
- QUASR (https://sourceforge.net/projects/quasr/): QUASR is a lightweight pipeline written to process and analyse next-generation sequencing (NGS) data from Illumina, 454, and Ion Torrent platforms.
- RSeQC: the RSeQC package provides a set of useful utilities for evaluating high-throughput sequencing data, RNA-seq data in particular. Its basic modules check sequence quality, nucleotide composition bias, PCR bias and GC content bias; its RNA-seq-specific modules evaluate sequencing saturation, mapped read distribution, coverage uniformity, strand specificity, transcript-level RNA integrity, and more. https://www.jianshu.com/p/edb9a5c3ecb0
- VecScreen, for screening vectors, adapters, linkers, and PCR primers: https://www.ncbi.nlm.nih.gov/tools/vecscreen/
- Cutadapt: https://github.com/marcelm/cutadapt or http://cutadapt.readthedocs.io/en/stable/index.html (removes adapter sequences)
- Trimmomatic (http://www.usadellab.org/cms/?page=trimmomatic)
- NGSQC toolkit (http://www.nipgr.res.in/ngsqctoolkit.html). NGSQC toolkit usage notes: http://blog.csdn.net/shmilyringpull/article/details/9225195
- SolexaQA (http://solexaqa.sourceforge.net/ or https://sourceforge.net/projects/solexaqa/files/src/)
- Trim Galore: http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/ or https://github.com/FelixKrueger/TrimGalore
- Platanus_trim: http://platanus.bio.titech.ac.jp/?page_id=30 (does not support gzip-compressed FASTQ files)
- Seqtk: https://github.com/lh3/seqtk
- TagCleaner (https://sourceforge.net/projects/tagcleaner/files/): remove tag sequences (e.g. WTA or MID tags) from metagenomic datasets.
- BioPieces: http://code.google.com/p/biopieces/
- seq_crumbs (https://bioinf.comav.upv.es/seq_crumbs/) (a Python 2 program; not recommended)
- seqcln (https://sourceforge.net/projects/seqclean/) (FASTA format only; not recommended)
Comparison of QC tools: https://zhuanlan.zhihu.com/p/28924793
NGS quality control, see: http://www.cnblogs.com/ZHshuang463508120/p/3606871.html
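FastUniq and the similar tools listed above remove duplicates without aligning reads to a reference genome. The basic idea can be sketched in Python: two read pairs count as duplicates when both sequences are identical, and only the first copy is kept. This is a minimal sketch, not FastUniq's actual algorithm; the function name `collapse_duplicate_pairs` and the toy read pairs are made up for illustration.

```python
def collapse_duplicate_pairs(pairs):
    """Keep the first occurrence of each (read1, read2) sequence pair.

    `pairs` is an iterable of (name, seq1, seq2) tuples (a hypothetical
    in-memory representation of paired-end reads).
    """
    seen = set()
    unique = []
    for name, seq1, seq2 in pairs:
        key = (seq1, seq2)
        if key not in seen:
            seen.add(key)
            unique.append((name, seq1, seq2))
    return unique


pairs = [
    ("r1", "ACGT", "TTGA"),
    ("r2", "ACGT", "TTGA"),  # exact duplicate of r1
    ("r3", "ACGT", "TTGC"),  # same read 1, different read 2: kept
]
unique_pairs = collapse_duplicate_pairs(pairs)
print([name for name, _, _ in unique_pairs])
```

Holding every pair in a set is fine for small inputs; real tools use more memory-efficient strategies for full sequencing runs.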
Tools for read error correction include SOAPec and ErrorCorrection; both were developed by BGI and can be downloaded from http://soap.genomics.org.cn/soapdenovo.html
- SOAPec_v2.01.tar.gz, a correction tool for SOAPdenovo: http://sourceforge.net/projects/soapdenovo2/files/ErrorCorrection/SOAPec_v2.01.tar.gz/download
- ErrorCorrection.tgz, another correction tool for SOAPdenovo: http://sourceforge.net/projects/soapdenovo2/files/ErrorCorrection/ErrorCorrection.tgz/download
- Correction tool: http://soap.genomics.org.cn/down/correction.tar.gz
- SOAPdenovo: http://soap.genomics.org.cn/down/SOAPdenovo-v1.04.tgz
More read correction tools: https://omictools.com/error-correction-category
Recommended read correction programs: for HiSeq data, BLESS, Musket, RACER and SGA; for MiSeq data, RACER; for human data, Musket, RACER and SGA. https://sourceforge.net/projects/musket/
Other similar tools:
- ECHO (http://uc-echo.sourceforge.net/). Paper: http://genome.cshlp.org/content/21/7/1181.full
- CORAL (https://www.cs.helsinki.fi/u/lmsalmel/coral/). Paper: https://academic.oup.com/bioinformatics/article/27/11/1455/217071/Correcting-errors-in-short-reads-by-multiple
- Quake installation guide: https://www.plob.org/article/1635.html
- EC: an efficient error correction algorithm for short reads
- QuorUM: An Error Corrector for Illumina Reads. For human data, the best tools are Lighter and the latest BLESS; the older BLESS version evaluated in the paper did not perform well. Paper: https://academic.oup.com/bib/article/16/4/588/347932/Correcting-Illumina-data (Read error correction is usually performed after trimming.)
- Sprai (http://zombie.cb.k.u-tokyo.ac.jp/sprai/): Sprai (single-pass read accuracy improver) is a tool to correct sequencing errors in single-pass reads for de novo assembly. It was originally designed for correcting sequencing errors in single-molecule DNA sequencing reads, especially Continuous Long Reads (CLRs) generated by PacBio RS sequencers.
K-mer estimation:
- velvetK (http://www.vicbioinformatics.com/software.velvetk.shtml): computes the most suitable k-mer size
- KmerGenie (http://kmergenie.bx.psu.edu/): estimates the best k-mer length for genome de novo assembly.
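The k-mer estimation tools above all start from k-mer abundances in the reads. As a rough illustration of that underlying quantity, the following Python sketch counts every k-mer in a handful of reads. It is not how velvetK or KmerGenie are implemented; the function name `count_kmers` and the example reads are assumptions made for this example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping substring of length k across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length L contains L - k + 1 k-mers (none if L < k).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


reads = ["ACGTAC", "CGTACG"]
kmer_counts = count_kmers(reads, k=3)
print(kmer_counts.most_common())
```

Repeating the count for several values of k and comparing the number of distinct k-mers gives a feel for why the choice of k matters; dedicated counters such as Jellyfish are needed for real data volumes.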
De novo assembly:
- VelvetOptimiser (http://www.vicbioinformatics.com/software.velvetoptimiser.shtml): batch assembly across multiple k-mer sizes
- SPAdes (http://bioinf.spbau.ru/spades): suitable for Illumina and PacBio data (supports gzip-compressed FASTQ files) and also for metagenomes. In practice, however, it is not well suited to viruses.
- Shovill (https://github.com/tseemann/shovill): faster SPAdes assembly of Illumina reads.
- SOAPdenovo (http://soap.genomics.org.cn/soapdenovo.html or https://github.com/aquaskyline/SOAPdenovo2): developed by BGI for assembling large genomes
- ABySS (http://www.bcgsc.ca/platform/bioinfo/software/abyss): based on the de Bruijn graph algorithm; suitable for large genomes.
- ALLPATHS-LG (http://software.broadinstitute.org/allpaths-lg/blog/): suited to assembling short-read data. Blog post on using ALLPATHS-LG: http://blog.sciencenet.cn/blog-303373-717174.html
- Celera Assembler (no longer maintained) (http://wgs-assembler.sourceforge.net/wiki/index.php?title=Main_Page), (https://sourceforge.net/projects/wgs-assembler/): suitable for Illumina, 454, PacBio and other data.
- CABOG (http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Main_Page): CABOG (Celera Assembler with Best Overlap Graph) is an extension of the Celera Assembler software (no longer maintained).
- Canu (http://canu.readthedocs.io/en/stable/# or http://canu.readthedocs.io/en/latest/): for PacBio RSII or Oxford Nanopore MinION data
- Platanus (http://platanus.bio.titech.ac.jp/?p=1): designed specifically for assembling highly heterozygous genomes; also suitable for DNA viruses.
- MetaPlatanus (http://platanus.bio.titech.ac.jp/?page_id=174): de novo assembly and sequence clustering of metagenomic data (metagenome assembly)
- RepARK (https://github.com/PhKoch/RepARK): de novo creation of repeat consensuses from whole-genome NGS reads
- RepARK paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4027187/
- Novoalign (http://www.novocraft.com/products/novoalign/): mapping of short reads onto a reference genome
- Falcon (https://github.com/PacificBiosciences/FALCON): based on the string graph algorithm; commonly used as a diploid assembler for PacBio data.
- Arachne & AllPath (https://www.broadinstitute.org/scientific-community/software)
- VISTA tools, including AVID (http://pipeline.lbl.gov/run5details.shtml)
- MIRA (https://sourceforge.net/p/mira-assembler/wiki/Home/): a whole genome shotgun and EST sequence assembler for Sanger, 454, Solexa (Illumina), IonTorrent and PacBio data.
- gsAssembler/GS De Novo Assembler/runAssembly (command-line based) and gsMapper (command-line based) (http://www.454.com/products/analysis-software/): assembly of 454 data
- Newbler: the core algorithm of gsAssembler/GS De Novo Assembler, now integrated into GS De Novo Assembler
- MetaVelvet (http://metavelvet.dna.bio.keio.ac.jp/): a short read assembler for metagenomics
- MaSuRCA (ftp://ftp.genome.umd.edu/pub/MaSuRCA/). How to assemble with MaSuRCA: https://www.plob.org/article/7853.html
- RAMPART (https://github.com/TGAC/RAMPART or http://www.earlham.ac.uk/rampart/): a pipeline for de novo assembly of DNA sequence data.
- SHORTY (http://www3.cs.stonybrook.edu/~skiena/shorty/): SHORTY assembles sequences produced by ABI SOLiD. It can now also be used for Illumina data, but the reads must first be converted to FASTA format.
- iCORN2 (http://icorn.sourceforge.net/): correct PacBio assemblies of Bacteria and Eukaryotes.
- FaBox (http://users-birc.au.dk/biopv/php/fabox/): an online FASTA sequence toolbox for converting formats and extracting sequences
Reference-guided assembly:
- IDBA (http://i.cs.hku.hk/~alse/hkubrg/projects/idba_hybrid/index.html)
- Chromosomer (https://github.com/gtamazian/Chromosomer). Chromosomer paper: https://link.springer.com/article/10.1186/s13742-016-0141-6
- Scaffold_Builder (https://sourceforge.net/projects/scaffold-b/): combining de novo and reference-guided assembly with Scaffold_builder. Paper: http://scfbm.biomedcentral.com/articles/10.1186/1751-0473-8-23
- AlignGraph (https://github.com/baoe/AlignGraph)
- Ragout (https://github.com/fenderglass/Ragout)
- SyMap (http://www.agcol.arizona.edu/software/symap/): a turnkey synteny system with application to plant genomes; it works for eukaryotic genomes in general.
- RACA()
- AMOScmp (https://sourceforge.net/projects/amos/?source=directory)
- Medusa (https://github.com/combogenomics/medusa)
- CONTIGuator (http://contiguator.sourceforge.net/)
- Multi-CAR (http://140.114.85.168/Multi-CAR/index.php)
- refGuidedDeNovoAssembly_pipelines: https://bitbucket.org/HeidiLischer/refguideddenovoassembly_pipelines Paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5681816/ Note: refGuidedDeNovoAssembly_pipelines is better suited to large (eukaryotic) genomes and requires multiple libraries, including mate-pair (large-insert) libraries.
Ordering contigs against a reference:
- Mauve (http://darlinglab.org/mauve/mauve.html): from the Tools menu, select ‘Move Contigs’.
- ABACAS (http://abacas.sourceforge.net/index.html). Example: perl abacas.1.3.1.pl -r ../../ref_data/NC_022082.fasta -q ../genomes/NJXKYY22.genome.fasta -p "nucmer" -i 70 -c -m -b -o test_sorted.fasta More usage notes: http://abacas.sourceforge.net/Manual.html
- GAP5 (http://www.sanger.ac.uk/science/tools/gap5 or https://sourceforge.net/projects/staden/): Gap5 is a DNA sequence assembly visualiser and editing tool. The GAP5 manual ships with the Staden Package under share/doc/staden/manual/gap5_toc.html in the installation directory.
Virus assembly:
- VirAmp (http://docs.viramp.com/en/latest/index.html): a Galaxy-based viral genome assembly pipeline. https://github.com/kdaily/viramp-project http://viramp.readthedocs.io/en/latest/ VirAmp paper: https://gigascience.biomedcentral.com/articles/10.1186/s13742-015-0060-y
- V-FAT (https://www.broadinstitute.org/viral-genomics/v-fat): V-FAT is an automated finishing, annotation, and QA tool for de novo viral assemblies.
- Viral-ngs (http://viral-ngs.readthedocs.io/en/latest/index.html): for RNA viruses
- IVA (https://github.com/sanger-pathogens/iva): IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high and variable depth.
- VIGA (https://github.com/EGTortuero/viga): a sensitive, precise and automatic de novo viral genome annotator.
Other virus-related tools:
(1) Virus integration detection
- BSVF (https://github.com/BioInfoTools/BSVF): Bisulfite Sequencing Virus integration Finder
- VirusFinder (https://bioinfo.uth.edu/VirusFinder/)
- VirusSeq (http://odin.mdacc.tmc.edu/%7Exsu1/VirusSeq.html): detecting known viruses and their integration sites in the human genome using next-generation sequencing data.
- ViralFusionSeq (VFS) (https://sourceforge.net/projects/viralfusionseq/): discovering viral integration events and reconstructing fusion transcripts at single-base resolution.
- Vy-PER (http://www.ikmb.uni-kiel.de/vy-per/ ): Virus integration detection bY Paired End Reads
- seeksv (https://github.com/qiukunlong/seeksv): an accurate tool for structural variation and virus integration detection.
(2) Viruses in metagenomic data
- VirMet (https://github.com/ozagordi/VirMet): a set of tools for viral metagenomics
- VirFinder (https://github.com/jessieren/VirFinder): R package for identifying viral sequences from metagenomic data using sequence signatures.
- METAVIR (http://metavir-meb.univ-bpclermont.fr/): METAVIR is a web server designed to annotate viral metagenomic sequences (raw reads or assembled contigs).
- haploclique (https://github.com/armintoepfer/haploclique): viral SNP and indel detection
- Kronos (http://kronos.readthedocs.io/en/latest/): a workflow assembler for cancer genome analytics and informatics.
More assembly tools: http://www.mybiosoftware.com/assembly-tools
The scaffolds of an assembled draft genome still contain gaps that need to be closed in a further step. Software for this includes:
- SOAPdenovo GapCloser (http://sourceforge.net/projects/soapdenovo2/files/GapCloser/)
- IMAGE (https://sourceforge.net/projects/image2/): Iterative Mapping and Assembly for Gap Elimination.
- GapFiller (https://www.baseclear.com/services/bioinformatics/basetools/gapfiller/). Blog post on using GapFiller: https://www.plob.org/article/6182.html
- A different tool also named GapFiller (https://sourceforge.net/projects/gapfiller/)
- FinIS (https://sourceforge.net/projects/finis/)
- FGAP (https://sourceforge.net/projects/fgap/): uses BLAST to align contig sequences to the draft genome, finds the best sequences overlapping each gap region, and uses them to fill the gaps. FGAP paper: https://www.researchgate.net/publication/263207973_FGAP_An_automated_gap_closing_tool or http://bmcresnotes.biomedcentral.com/articles/10.1186/1756-0500-7-371 Blog post on using FGAP: http://www.chenlianfu.com/?p=2333
- icorn (http://icorn.sourceforge.net/): enables errors in the consensus sequence to be corrected by iteratively mapping reads to the current assembly (sequence correction).
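Before and after running a gap closer it is useful to know where the gaps are. Gaps in scaffolds are conventionally written as runs of N, so they can be located with a regular expression. The Python sketch below does only that; the function name `find_gaps` and the toy scaffold are assumptions for illustration, and none of the tools above is implied to work this way.

```python
import re


def find_gaps(scaffold):
    """Return half-open (start, end) coordinates of each run of N or n."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", scaffold)]


scaffold = "ACGTNNNNACGTNNAC"
gaps = find_gaps(scaffold)
gap_bases = sum(end - start for start, end in gaps)
print(gaps, gap_bases)
```

Comparing the gap count and total gap length before and after gap closing gives a quick measure of how much a tool actually filled.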
Bandage (https://rrwick.github.io/Bandage/): assembly graph visualisation
Software for microbial genome workflows: https://holtlab.net/2015/02/25/tools-for-bacterial-comparative-genomics/
Assessing errors in genome assemblies:
- REAPR (Recognition of Errors in Assemblies using Paired Reads) uses paired reads to identify errors in a genome sequence. It can then break the sequence at erroneous gaps or replace erroneous sequence with Ns, and it reports statistics on the errors found. REAPR website: http://www.sanger.ac.uk/science/tools/reapr Installing REAPR requires R and the following Perl modules: File::Basename, File::Copy, File::Spec, File::Spec::Link, Getopt::Long, List::Util. Blog post on using REAPR: http://www.chenlianfu.com/?p=2329
- QUAST (http://bioinf.spbau.ru/quast or http://quast.sourceforge.net/quast): genome assembly quality assessment tool. QUAST manual: http://quast.bioinf.spbau.ru/manual.html
- Miller Lab: http://www.bx.psu.edu/miller_lab/
- Mauve assembly metrics (http://code.google.com/p/ngopt/wiki/How_To_Score_Genome_Assemblies_with_Mauve)
- InGAP-SV (http://ingap.sourceforge.net/): InGAP is also useful for finding structural variants between genomes from read mappings.
merge-gbk-records (https://github.com/kblin/merge-gbk-records): merge multiple GenBank records using a defined spacer sequence
Assembly workflow references: http://vlsci.github.io/lscc_docs/tutorials/assembly/assembly-protocol/#section-2-assembly http://onlinelibrary.wiley.com/doi/10.1111/eva.12178/full https://en.wikipedia.org/wiki/Sequence_assembly
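Assembly assessment tools such as QUAST report contiguity statistics, the best known being N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The Python sketch below computes it from a list of contig lengths. It is a minimal illustration rather than QUAST's code; the function name `n50` and the example lengths are made up.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


contig_lengths = [100, 80, 50, 30, 20, 10, 10]
assembly_n50 = n50(contig_lengths)
print(assembly_n50)
```

N50 measures contiguity only; a misassembled genome can have a high N50, which is why it is usually read together with error reports from tools like REAPR.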
- iAssembler (http://bioinfo.bti.cornell.edu/tool/iAssembler/): uses MIRA and CAP3 to assemble transcriptome (EST) data from 454 and Sanger sequencing into contigs. Reference: Yi Zheng, Liangjun Zhao, Junping Gao and Zhangjun Fei (2011) iAssembler: a package for de novo assembly of Roche-454/Sanger transcriptome sequences.
- BLAST+ (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/)
- clustalx/clustalw (http://www.clustal.org/): ClustalX is the graphical version of ClustalW; the former is used under Windows, the latter from the DOS command line. ClustalW format: http://web.mit.edu/meme_v4.9.0/doc/clustalw-format.html
- MAFFT (Multiple Alignment using Fast Fourier Transform) (http://mafft.cbrc.jp/alignment/software/)
- MUSCLE (MUltiple Sequence Comparison by Log-Expectation) (http://www.drive5.com/muscle/)
- T-Coffee (http://www.tcoffee.org/Projects/tcoffee/index.html)
- LAGAN & Shuffle-LAGAN (http://lagan.stanford.edu/lagan_web/index.shtml)
- amos (http://sourceforge.net/projects/amos/files/): minimus2 is a component of the AMOS assembly package that merges two sets of contigs, extending contig length and reducing the number of contigs. AMOS stands for A Modular, Open-Source whole genome assembler and aims to be a foundational software framework for assembly tools. minimus2 uses an algorithm based on nucmer overlap detection, which is faster than the Smith-Waterman hash-overlap algorithm. More information: http://amos.sourceforge.net/wiki/index.php/AMOS
- circlator (http://sanger-pathogens.github.io/circlator/): a tool to circularize genome assemblies
- ACT (Artemis Comparison Tool) (http://www.sanger.ac.uk/science/tools/artemis-comparison-tool-act)
- GMAP (http://research-pub.gene.com/gmap/ or https://wiki.gacrc.uga.edu/wiki/Gmap-gsnap-Sapelo): a Genomic Mapping and Alignment Program for mRNA and EST sequences
- MSA (https://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/msa.html)
- msa (http://www.bioconductor.org/packages/release/bioc/html/msa.html): an R package for multiple sequence alignment.
- MSAProbs (https://sourceforge.net/projects/msaprobs/ or http://msaprobs.sourceforge.net/homepage.htm#latest)
- MergeAlign (http://www.stevekellylab.com/software/mergealign)
A brief comparison of MUSCLE, ClustalW and T-Coffee: https://www.plob.org/article/4104.html More alignment software: https://en.wikipedia.org/wiki/List_of_sequence_alignment_software http://www.ebi.ac.uk/Tools/msa/
Multiple sequence alignment formats: http://www.cnblogs.com/tsingke/p/3940074.html Multiple sequence alignment on Wikipedia: https://en.wikipedia.org/wiki/Multiple_sequence_alignment http://www.docin.com/p-812012331.html
Global alignment tool GASSST: http://www.irisa.fr/symbiose/projects/gassst/ Example: Gassst -d tmp.fna -i gene_primer_out/Microcystis_aeruginosa.eryG_2.Microcystis_aeruginosa.eryG_2.p3_seqs.fa -o test.gassout -p 80 -m 8 -n 10
Converting a protein multiple sequence alignment into a nucleotide alignment: pal2nal (http://www.bork.embl.de/pal2nal/)
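The conversion of a protein alignment into a nucleotide (codon) alignment can be sketched as follows: every aligned residue is replaced by the next codon of the corresponding coding sequence, and every gap by three gap characters. This is a simplified illustration of the idea, not pal2nal's implementation; it does not check that each codon actually encodes its residue, and the function name `thread_codons` and the sequences are assumptions.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Map an ungapped coding sequence onto one gapped protein alignment row."""
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("CDS is shorter than 3 x the number of residues")
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)          # one protein gap = three nucleotide gaps
        else:
            out.append(cds[pos:pos + 3])  # consume the next codon
            pos += 3
    return "".join(out)


aligned = {"seq1": "MK-L", "seq2": "M-RL"}
cds = {"seq1": "ATGAAACTG", "seq2": "ATGCGTCTG"}
codon_alignment = {name: thread_codons(aligned[name], cds[name]) for name in aligned}
for name, row in codon_alignment.items():
    print(name, row)
```

Codon alignments built this way keep the reading frame intact, which is what downstream analyses on coding sequences generally expect.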
- Bowtie (http://bowtie-bio.sourceforge.net/index.shtml)
- BWA (http://bio-bwa.sourceforge.net)
- MAQ (http://maq.sourceforge.net/)
- subread (http://subread.sourceforge.net/)
- BBMap (https://sourceforge.net/projects/bbmap/): BBMap short read aligner, and other bioinformatic tools.
- BBtools (http://jgi.doe.gov/data-and-tools/bbtools/). Using BBMap: http://seqanswers.com/forums/showthread.php?t=58221 and http://seqanswers.com/forums/showthread.php?t=44494
- Stampy (http://www.well.ox.ac.uk/project-stampy): fast and sensitive
- Stampy paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3106326/
- samblaster (https://github.com/GregoryFaust/samblaster): a tool to mark duplicates and extract discordant and split reads from SAM files.
- sambamba (https://github.com/biod/sambamba or http://lomereiter.github.io/sambamba/): tools for working with SAM/BAM data (recommended).
- ELAND
- Novoalign
- SMALT (http://www.sanger.ac.uk/science/tools/smalt-0 or https://sourceforge.net/projects/smalt/): SMALT aligns DNA sequencing reads with genomic reference sequences.
- BEDTools (https://code.google.com/p/bedtools/)
- Dindel (http://sites.google.com/site/keesalbers/soft/dindel): discovery of small insertions/deletions
- Pindel (http://gmt.genome.wustl.edu/packages/pindel/): discovery of small insertions/deletions
- Samtools (http://samtools.sourceforge.net or http://www.htslib.org/): tools for analysing data after mapping
- bcftools (http://www.htslib.org/download/)
- VarScan (http://massgenomics.org/varscan or http://dkoboldt.github.io/varscan/)
- scalpel (https://sourceforge.net/projects/scalpel/?source=directory): genetic variant discovery and indel detection. scalpel paper: http://www.nature.com/nmeth/journal/v11/n10/full/nmeth.3069.html Usage guide: http://www.bio-info-trainee.com/2341.html
- ScanIndel (https://github.com/cauyrd/ScanIndel)
- Snippy (https://github.com/tseemann/snippy): bacterial SNP and indel calling
- Picard (http://broadinstitute.github.io/picard/ or https://github.com/broadinstitute/picard): a Java program
- SpeedSeq (https://github.com/hall-lab/speedseq): developed by researchers at Washington University School of Medicine and other institutions. Using low-cost servers, it completes alignment, variant detection and functional annotation of a 50x human genome in as little as 13 hours, which addresses a current bottleneck in WGS bioinformatics. It can be applied to WGS, WES and panel sequencing data. SpeedSeq paper: http://www.nature.com/nmeth/journal/v12/n10/full/nmeth.3505.html See also: http://www.biotrainee.com/thread-338-1-1.html
- Sequence Variant Analyzer (http://www.svaproject.org): displays variants in their genomic context
- HugeSeq (https://github.com/StanfordBioinformatics/HugeSeq): a pipeline for structural variation. See: http://blog.csdn.net/alex6plus7/article/details/50236375
- KvarQ (https://github.com/kvarq/kvarq): targeted and direct variant calling in FastQ reads of bacterial genomes.
- nesoni (https://github.com/Victorian-Bioinformatics-Consortium/nesoni): a toolkit for NGS SNP calling / RNA-Seq DGE / read cleaning.
- RedDog (https://github.com/katholt/RedDog): a workflow pipeline for short read length sequencing analysis, including the read mapping task, through to variant detection, followed by analyses (SNPs only). Single nucleotide polymorphisms (SNPs) with Phred quality score ≥30 were identified in each isolate using SAMTools.
- LUMPY (https://github.com/arq5x/lumpy-sv): a general probabilistic framework for structural variant discovery.
- MetaSV (http://bioinform.github.io/metasv/): an accurate and integrative structural-variant caller.
- MetaSV paper: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btv204
- FindSV (https://github.com/dnil/FindSV)
- SomaticSniper (http://gmt.genome.wustl.edu/packages/somatic-sniper/ or https://github.com/genome/somatic-sniper): SNV detection
FindTranslocations, CNVnator and fermikit
SV and CNV:
- SV-Autopilot (https://github.com/ALLBio/allbiotc2)
- GASV: http://compbio.cs.brown.edu/projects/gasv/ or https://github.com/ZhihaoXie/GASV_ GASV documentation: https://vcru.wisc.edu/simonlab/bioinformatics/programs/gasv/GASV_UserGuide.pdf
- srGASV: https://github.com/dstorch/srGASV
- MultiBreak-SV: http://compbio.cs.brown.edu/projects/multibreaksv/ or https://github.com/raphael-group/multibreak-sv
- SVDetect: https://sourceforge.net/projects/svdetect/
- PEMer: detecting SVs from paired-end reads. http://sv.gersteinlab.org/pemer/ or https://github.com/BIGLabHYU/PEMer
- VariationHunter: a tool for identifying structural variations from paired-end WGS data. https://sourceforge.net/projects/variationhunter/
- vaquita (https://github.com/seqan/vaquita): identification of structural variations. Note: the reference sequence file passed to vaquita must have the .fa suffix.
- svmerge (https://sourceforge.net/projects/svmerge/): a tool for SV analysis that integrates calls from several existing SV callers.
- breakway (https://sourceforge.net/projects/breakway/): identification of genomic breakpoints
- CNT-MD: Copy-Number Tree Mixture Deconvolution. http://compbio.cs.brown.edu/projects/cnt-md/ or https://github.com/raphael-group/CNT-MD
- CNT-ILP: Copy-Number Tree. http://compbio.cs.brown.edu/projects/cnt-ilp/ or https://github.com/raphael-group/CNT-ILP
- Whole Exome Sequencing Analysis Pipeline: http://metamoodics.org/wiki/index.php?title=Whole_Exome_Sequencing_Analysis_Pipeline
- BSseeker2 (https://github.com/BSSeeker/BSseeker2): a versatile aligning pipeline for bisulfite sequencing data.
More tools: http://www.knowgene.com/question/8855
Related tools: https://omictools.com/indel-detection-category
- PopSV (https://github.com/jmonlong/PopSV): human copy number variant detection
- Sniffles (https://github.com/fritzsedlazeck/Sniffles): Sniffles is a structural variation caller using third generation sequencing (PacBio or Oxford Nanopore).
- NGMLR (https://github.com/philres/ngmlr): NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) reads to a reference genome, with a focus on reads that span structural variations.
Review of genetic variation software: https://academic.oup.com/bib/article/15/2/256/210976/A-survey-of-tools-for-variant-analysis-of-next A list of software tools: http://seqanswers.com/forums/showthread.php?t=43
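- Findpeaks (http://vancouvershortr.sourceforge.net)
- Cufflinks (http://cufflinks.cbcb.umd.edu): estimates transcript abundance
- TopHat (http://ccb.jhu.edu/software/tophat/index.shtml): splice junction mapping
- Trinity (https://github.com/trinityrnaseq/trinityrnaseq/wiki)
- Oases (http://www.ebi.ac.uk/~zerbino/oases/): assembly from transcriptome data
- Trans-ABySS (http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss): transcriptome assembly
- HISAT (http://ccb.jhu.edu/software/hisat/index.shtml): spliced alignment of RNA-seq reads, used as the mapping step in differential expression workflows
- StringTie (http://ccb.jhu.edu/software/stringtie/): assembles transcripts and estimates their expression levels
- Ballgown (https://github.com/alyssafrazee/ballgown): differential expression analysis of RNA-seq data
Further reading: a step-by-step guide to transcriptome differential expression analysis with TopHat and Cufflinks.
More RNA-related software: http://www.mybiosoftware.com/rna-analysis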
- Integrated Genome Browser (http://www.bioviz.org/igb/)
- Integrative Genomics Viewer (http://www.broadinstitute.org/software/igv/)
- Artemis (http://www.sanger.ac.uk/science/tools/artemis)
- CLC BioWorkbench (https://www.qiagenbioinformatics.com/products/clc-genomics-workbench/)
- Geneious (http://www.geneious.com/ or http://www.geneious.com/features/assembly-mapping)
- IGV (www.broadinstitute.org/igv/)
- hemi (http://hemi.biocuckoo.org/index.php): graphical heatmap drawing
- clusterProfiler (https://github.com/GuangchuangYu/clusterProfiler): statistical analysis and visualization of functional profiles for genes and gene clusters
- circos (http://circos.ca)
- BioCircos: http://bioinfo.ibp.ac.cn/biocircos/index.php
- BRIG (http://brig.sourceforge.net/). Documentation: http://brig.sourceforge.net/brig-tutorial-1-whole-genome-comparisons/ https://sourceforge.net/projects/brig/files/
- OGDRAW (http://ogdraw.mpimp-golm.mpg.de/index.shtml): draws circular maps of organelle genomes
- DNAPlotter (http://www.sanger.ac.uk/science/tools/dnaplotter)
- Glimmer (http://ccb.jhu.edu/software/glimmer/index.shtml): gene prediction for bacteria, archaea and viruses
- GeneMarkS (http://topaz.gatech.edu/GeneMark/): gene prediction for bacteria, archaea, viruses, phages and transcriptomes
- MetaGeneMark: a GeneMark variant for gene prediction in metagenomes
- Prodigal (http://prodigal.ornl.gov/): gene prediction for prokaryotes (works on high-GC genomes); also applicable to metagenomes, but not suited to predicting RNA genes or viral genes.
- MetaGene Annotator (MetaGeneAnnotator) (http://metagene.cb.k.u-tokyo.ac.jp/): a gene-finding program for prokaryotes and phages; also applicable to metagenomes.
- FragGeneScan (https://github.com/COL-IU/FragGeneScan.git): it can be applied to predict prokaryotic genes in incomplete assemblies or complete genomes.
- Orphelia (http://orphelia.gobics.de/): Orphelia is a metagenomic ORF finding tool for the prediction of protein coding genes in short, environmental DNA sequences with unknown phylogenetic origin.
- GenScan (http://genes.mit.edu/GENSCAN.html): gene prediction tool for vertebrates, Arabidopsis and maize
- Pfam_Scan (http://pfam.xfam.org/): protein domain prediction. PfamScan tool: ftp://ftp.ebi.ac.uk/pub/databases/Pfam/Tools/
- tRNAscan-SE (http://lowelab.ucsc.edu/tRNAscan-SE/): tRNA prediction
- ARAGORN (http://130.235.46.10/ARAGORN/ or http://mbio-serv2.mbioekol.lu.se/ARAGORN/Downloads/): ARAGORN detects tRNA, mtRNA, and tmRNA genes.
- Barrnap (http://www.vicbioinformatics.com/software.barrnap.shtml or https://github.com/tseemann/barrnap): rRNA prediction
- snoGPS (http://lowelab.ucsc.edu/snoGPS/): search for H/ACA snoRNA genes in a genomic sequence
- Snoscan (http://lowelab.ucsc.edu/snoscan/): search for C/D box methylation guide snoRNA genes in a genomic sequence
- OrfM (https://github.com/wwood/OrfM): simple and not slow ORF caller.
- getorf (http://emboss.sourceforge.net/apps/cvs/emboss/apps/getorf.html): find and extract open reading frames (ORFs).
- checktrans (http://emboss.open-bio.org/rel/rel6/apps/checktrans.html): reports STOP codons and ORF statistics of a protein.
- plotorf (http://emboss.sourceforge.net/apps/release/6.0/emboss/apps/plotorf.html): plot potential open reading frames in a nucleotide sequence.
- ORFfinder: ftp://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/ORFfinder/linux-i64/ ORF Finder (online tool): http://www.bioinformatics.org/sms2/orf_find.html
- AntiFam (ftp://ftp.ebi.ac.uk/pub/databases/Pfam/AntiFam/): identifies spurious ORFs. AntiFam paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308159/ http://xfam.org/ How to run AntiFam: hmmsearch --domtblout test_vs_antifam.out --tblout test_vs_antifam.out2 --domE 1e-10 --cpu 12 ../AntiFam.hmm test.faa
- Manatee (http://manatee.sourceforge.net/igs/index.shtml): Manatee is a web-based tool used to perform manual functional annotation.
- Ergatis (http://ergatis.sourceforge.net/index.html), (https://sourceforge.net/projects/ergatis/)
- RAST (http://www.nmpdr.org/FIG/wiki/view.cgi/FIG/RapidAnnotationServer or http://rast.nmpdr.org/): annotating bacterial and archaeal genomes (online)
- prokka (http://www.vicbioinformatics.com/software.prokka.shtml): annotation for prokaryotes
- Annotationtools (https://github.com/rbotts/Annotationtools): Python script for annotating sequences from a FASTA file (bacterial). Uses GeneMarkS and BioPython. (For prokaryotes.)
- RATT (Rapid Annotation Transfer Tool) (http://ratt.sourceforge.net/): rapid gene function annotation based on a reference genome. RATT is now part of PAGIT.
- PAGIT (Post Assembly Genome Improvement Toolkit) (http://www.sanger.ac.uk/science/tools/pagit)
- assembly-stats (https://github.com/sanger-pathogens/assembly-stats)
- assembly-stats (https://github.com/rjchallis/assembly-stats)
- assemblyStatics (https://github.com/WenchaoLin/assemblyStatics)
- velvet-stats (https://github.com/ajmazurie/velvet-stats)
- seqStats (https://github.com/peteashton/seqStats): two figures are produced: one contains the length distribution histogram and a cumulative length plot, the other plots GC vs sequence length.
- TBtools (https://github.com/CJ-Chen/TBtools)
- GCE (ftp://ftp.genomics.org.cn/pub/gce/): BGI's software for genome evaluation
- GCE paper: https://www.researchgate.net/publication/255722390_Estimation_of_genomic_characteristics_by_analyzing_k-mer_frequency_in_de_novo_genome_projects Usage blog: https://www.plob.org/article/9388.html
- KmerGenie (http://kmergenie.bx.psu.edu/)
- Jellyfish (http://www.genome.umd.edu/jellyfish.html). Jellyfish usage notes: http://www.chenlianfu.com/?p=806
- KmerFreq
- CNV detection software: CoNIFER (http://conifer.sourceforge.net/)
- SNP annotation software: annovar (http://annovar.openbioinformatics.org/en/latest/)
- blast2go (https://www.blast2go.com/)
- GO_Annotation_Plot (https://github.com/ZhihaoXie/GO_Annotation_Plot.git)
- Sibelia: a comparative genomics tool (http://bioinf.spbau.ru/en/sibelia)
- TreeBest (https://github.com/lh3/treebest or http://treesoft.sourceforge.net/)
- Using TreeBest: http://blog.sina.com.cn/s/blog_620b35790100mcp6.html
- RAxML (https://sco.h-its.org/exelixis/web/software/raxml/index.html): maximum likelihood (ML) tree tool
- PhyML (http://www.atgc-montpellier.fr/phyml/): online tool for building ML trees; it can also be run locally
- profileNJ (https://github.com/maclandrol/profileNJ): corrects gene trees using the species tree and NJ trees
- Figtree (http://tree.bio.ed.ac.uk/software/figtree/): a graphical viewer of phylogenetic trees and a program for producing publication-ready figures.
- Dendroscope (http://dendroscope.org/): software for visualizing phylogenetic trees and rooted networks.
- PATRIC (https://www.patricbrc.org/): Phylogenetic Tree Builder
- TempEst (http://tree.bio.ed.ac.uk/software/tempest/): TempEst is a tool for investigating the temporal signal and 'clocklikeness' of molecular phylogenies.
- liftover (http://hgdownload.cse.ucsc.edu/admin/exe/): converts coordinates between genome assembly versions (http://genome.ucsc.edu/). See: http://www.plob.org/article/9541.html
- Splign: an NCBI tool for aligning cDNA to a genome; it makes it easy to locate the individual exons of a cDNA. See: http://www.plob.org/article/7361.html
(1) Metagenome assembly tools
Assemblers that can be used: SOAPdenovo, SPAdes, IDBA, MetaPlatanus, ABySS, CABOG
- TruSPAdes (http://cab.spbu.ru/software/spades/): for metagenome assembly
- MEGAHIT (https://github.com/voutcn/megahit)
- Ray (https://github.com/sebhtml/ray or http://denovoassembler.sourceforge.net/): a de novo assembler using MPI 2.2. Ray Meta: scalable de novo metagenome assembly and profiling.
- Meraga()
- Minia (http://minia.genouest.org/)
- MetaVelvet (http://metavelvet.dna.bio.keio.ac.jp/): a short read assembler for metagenomics. See: http://blog.sina.com.cn/s/blog_670445240101lg2a.html
- MetAMOS (https://github.com/marbl/metAMOS): a metagenomic and isolate assembly and analysis pipeline built with AMOS.
- Subtractive Assembly (https://sourceforge.net/projects/subtractive-assembly/): compares metagenomes through assembly. Its main aim is to reduce the cost of metagenome assembly while focusing on differential species and genes: it first selects the reads that carry differential k-mers from the raw reads, then assembles only the selected reads. See: http://blog.sina.com.cn/s/blog_83f77c940102vvwr.html
(2) Other
- MG-RAST (http://metagenomics.anl.gov/) http://evomics.org/learning/genomics/metagenomics/mg-rast/
- GOTTCHA (https://github.com/LANL-Bioinformatics/GOTTCHA)
- MIDAS (https://github.com/snayfach/MIDAS): Metagenomic Intra-Species Diversity Analysis System. Our reference database of bacterial species and associated genomic data resources are available at http://lighthouse.ucsf.edu/MIDAS
- MIDAS paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5088602/
- checkM (https://github.com/Ecogenomics/CheckM)
(3) Taxonomic classification
- Kraken (http://ccb.jhu.edu/software/kraken/)
- Kaiju (http://kaiju.binf.ku.dk/ or https://github.com/bioinformatics-centre/kaiju): Kaiju is a program for sensitive taxonomic classification of high-throughput sequencing reads from metagenomic whole genome sequencing or metatranscriptomics experiments.
- sourmash (pip install -U https://github.com/dib-lab/sourmash/archive/master.zip)
- MetaPhlAn2 (http://segatalab.cibio.unitn.it/tools/metaphlan2/ or https://bitbucket.org/biobakery/metaphlan2/src/default/)
- mOTU (http://www.bork.embl.de/software/mOTU/)
- PanPhlAn (http://segatalab.cibio.unitn.it/tools/panphlan/)
- ConStrains (https://bitbucket.org/luo-chengwei/constrains): takes read data as input. Paper: http://www.nature.com/nbt/journal/v33/n10/full/nbt.3319.html
- Krona (https://github.com/marbl/Krona/wiki): taxonomy visualization
(4) Binning
- metaBAT: https://bitbucket.org/berkeleylab/metabat
- ESOM: http://databionic-esom.sourceforge.net/ or https://sourceforge.net/projects/databionic-esom/?source=directory
- CheckM: http://ecogenomics.github.io/CheckM/ or https://github.com/Ecogenomics/CheckM/releases
- MetaCluster: http://i.cs.hku.hk/~alse/MetaCluster/
- MetaBin: http://metabin.riken.jp/
(5) Other tools
- tetramerFreqs/Binning: https://github.com/tetramerFreqs/Binning
- Hawth's Analysis Tools for ArcGIS: http://www.spatialecology.com/htools/overview.php
Other: http://www.360doc.com/content/16/0815/17/35684706_583419969.shtml
Introduction to databases commonly used in microbial ecology research: http://www.cnblogs.com/nkwy2012/p/6396435.html
References: http://msb.embopress.org/content/9/1/666 (a review) http://www.ebiotrade.com/newsf/2014-8/2014814163301250.htm
TaxonKit (https://bioinf.shenwei.me/taxonkit/): efficient NCBI Taxonomy toolkit
- UNBIAS
- VSEARCH
- usearch
- NINJA
- SRA Toolkit:https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software http://ncbi.github.io/sra-tools/ https://github.com/ncbi/sra-tools How to convert SRA files to FASTQ (fq) format with fastq-dump: http://www.cnblogs.com/emanlee/archive/2013/04/15/3022328.html
- GFam(http://www.paccanarolab.org/gfam):GFam is a command-line tool for automatic annotation of gene families.
- SQANTI(https://bitbucket.org/ConesaLab/sqanti):a tool for discovering and annotating novel transcript structures from full-length transcriptome sequencing http://www.ngsgo.com/biology/1436.html
- eggNOG-mapper(http://eggnogdb.embl.de/#/app/emapper)
Reference: http://diyitui.com/content-1466484195.47288872.html
- ASpipe(https://sourceforge.net/projects/aspipe/):ASpipe is a pipeline to process GeneSeqer/GMAP alignments and identify alternative splicing (AS) events from the alignments. It requires unix bash, perl 5.0+ with DBI module and MySQL5.0+ to run properly.
- UCSC Genome Browser http://genome.ucsc.edu
- Ensembl Genome Browser http://www.ensembl.org
- NCBI Genome Browser http://www.ncbi.nlm.nih.gov/mapview
- GMOD GBrowse http://gmod.org
- UTGB http://utgenome.org/
- IGV (Broad) http://www.broadinstitute.org/igv/
- JBrowse (JavaScript) http://jbrowse.org/
- Argo Genome Browser (Broad) http://www.broadinstitute.org/annotation/argo/
- Gaggle Genome Browser http://gaggle.systemsbiology.net/docs/geese/genomebrowser/
- Celera Genome Browser http://sourceforge.net/projects/celeragb/files/
- Apollo Genome Annotation Curation Tool http://apollo.berkeleybop.org/current/index.html
Reference (Map Viewer user guide): http://www.dxy.cn/bbs/thread/1385361#1385361
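NCBI uses build numbers such as Build 36; UCSC uses assembly names such as hg18 and hg19 for the human genome; Ensembl has its own release numbers but adopts NCBI's assembly names for the data. The two naming styles map onto each other: for the human genome, hg19 = GRCh37, and Build 38 patch release 7 corresponds to GRCh38.p7.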
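The correspondence between the two naming styles can be sketched as a small lookup table. This is only an illustration: the function names `to_grc` and `to_ucsc` are hypothetical, the table covers just three human assemblies, and it treats a UCSC name and its GRC/NCBI counterpart as interchangeable labels, which may not hold for every sequence in an assembly.

```python
# UCSC human assembly names and the matching NCBI/GRC assembly names.
# Only a few human assemblies are listed; extend the table as needed.
UCSC_TO_GRC = {
    "hg18": "NCBI36",
    "hg19": "GRCh37",
    "hg38": "GRCh38",
}


def to_grc(name):
    """Return the NCBI/GRC name for a UCSC name; unknown names pass through."""
    return UCSC_TO_GRC.get(name.lower(), name)


def to_ucsc(name):
    """Return the UCSC name for an NCBI/GRC name, ignoring a patch suffix such as '.p7'."""
    base = name.split(".")[0].lower()
    for ucsc, grc in UCSC_TO_GRC.items():
        if grc.lower() == base:
            return ucsc
    return name
```

Patch releases such as GRCh38.p7 keep the coordinates of the primary assembly, which is why the sketch drops the `.p7` suffix before the lookup.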
Other tools:
- Roary(http://sanger-pathogens.github.io/Roary/):rapid large-scale prokaryote pan genome analysis. Roary paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4817141/
- BPGA(http://www.iicb.res.in/bpga/index.html or https://sourceforge.net/projects/bpgatool/):BPGA is an ultra-fast software package that provides comprehensive pan genome analysis of microorganisms (prokaryotes only). BPGA paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4829868/pdf/srep24373.pdf
- PanGP(https://pangp.ybzhao.com/):PanGP is a tool for quickly analyzing bacterial pan-genome profile (pan-genome profile analysis and profile curves).
- panOCT(https://sourceforge.net/projects/panoct/?source=directory)
- panOCT paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3526259/
- BSR(http://bsr.igs.umaryland.edu/) LS-BSR paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3976120/
- PGAP:pan-genomes analysis pipeline (automated software for prokaryotic pan-genome analysis). https://github.com/kastman/pgap-docker https://sourceforge.net/projects/pgap/ PGAP paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268234/ # PGAP is very time-consuming; use with caution!
- metaPGAP(https://github.com/mitul-patel/metaPGAP):metagenomic Pan Genome Analysis Pipeline
- AGAPE(https://github.com/yeastgenome/AGAPE):pan-genome analysis for yeast
- Parsnp(http://harvest.readthedocs.io/en/latest/content/parsnp.html or https://github.com/marbl/parsnp):Rapid core genome multi-alignment (bacterial genomes). Parsnp paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4262987/
- PGAP-X:https://pgapx.ybzhao.com/ PGAP-X is a microbial comparative genomic analysis platform with a graphical interface.
- LTR_retriever(https://github.com/oushujun/LTR_retriever):identifies LTR retrotransposons
Tools:
- abricate(https://github.com/tseemann/abricate):Mass screening of contigs for antimicrobial and virulence genes
- ARIBA:https://github.com/sanger-pathogens/ariba resistance gene detection (takes FASTQ reads as input)
- SRST2:https://github.com/katholt/srst2 or http://katholt.github.io/srst2/
- ARGs-OAP:https://github.com/biofuture/Ublastx_stageone and http://smile.hku.hk/SARGs ARGs-OAP paper: https://academic.oup.com/bioinformatics/article/32/15/2346/1743463 # Note: ARGs-OAP takes FASTQ files as input
- Meta-MARC:https://github.com/lakinsm/meta-marc antimicrobial resistance gene detection for metagenomes
- DeepARG:http://bench.cs.vt.edu/deeparg a deep learning approach for predicting antibiotic resistance genes from metagenomic data. DeepARG paper: https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-018-0401-z
Antimicrobial Resistance Gene Database:
- ARDB: http://ardb.cbcb.umd.edu/index.html
- BacMet (http://bacmet.biomedicine.gu.se/): Antibacterial biocide and metal resistance genes database # BacMet ships with a companion search and annotation tool, run for example as: perl /sdg/database/BacMet_v1.1/BacMet-Scan_v1.1.pl -i ./final.scaffold.fa -o E6.3 -d /sdg/database/BacMet_v1.1/BacMet_EXP.704 -blast -e 0.00001 -cpu 10 -columns all -p 20 -table -report -counts -v
- CARD:https://card.mcmaster.ca/
- Resfams:http://www.dantaslab.org/resfams
- NCBI Bacterial Antimicrobial Resistance Reference Gene Database:https://www.ncbi.nlm.nih.gov/bioproject/PRJNA313047
- ARG-ANNOT:http://en.mediterranee-infection.com/article.php?laref=283%26titre=arg-annot
- ResFinder:https://cge.cbs.dtu.dk/services/ResFinder/ ResFinder identifies acquired antimicrobial resistance genes and/or finds chromosomal mutations in total or partial sequenced isolates of bacteria. ResFinder:https://bitbucket.org/genomicepidemiology/resfinder
- EcOH:https://github.com/katholt/srst2/tree/master/data
- PlasmidFinder:https://cge.cbs.dtu.dk/services/PlasmidFinder/ PlasmidFinder identifies plasmids in total or partial sequenced isolates of bacteria. PlasmidFinder, which searches for matches in a replicon database, had the highest precision (1.0) but was restricted by the contents of its database and the contig length obtained from de novo assembly (recall = 0.33). PlasmidFinder database download link: https://cge.cbs.dtu.dk//services/data.php
- cBAR(http://csbl.bmb.uga.edu/~ffzhou/cBar/) recall and precision of 0.77 and 0.63.
- Recycler(https://github.com/Shamir-Lab/Recycler) It correctly predicted small plasmids but failed with long plasmids (recall = 0.12, precision = 0.28).
- PlasmidSPAdes(http://spades.bioinf.spbau.ru/plasmidSPAdes/)
- PLACNET(https://sourceforge.net/projects/placnet/)
- PLACNET2FASTA(https://github.com/tomdeman-bio/PLACNET2FASTA):Converts PLACNET output to a FASTA file containing plasmid contigs
- Nullarbor:https://github.com/tseemann/nullarbor Pipeline to generate complete public health microbiology reports from sequenced isolates.
Genome-to-Genome Distance Calculator (GGDC):http://ggdc.dsmz.de/distcalc2.php computes the calculated (in silico) DNA–DNA hybridization (DDH) value.
- MinCED:https://github.com/ctSkennerton/minced CRISPR detection
- CRT:http://www.room220.com/crt/ CRISPR Recognition Tool
- Piggy(https://github.com/harry-thorpe/piggy):Pipeline for analysing intergenic regions in bacteria
- ISMapper(https://github.com/jhawkey/IS_mapper)ISMapper finds locations of an IS query in short read data using a series of mapping steps.
- ncbi-genome-download(https://github.com/kblin/ncbi-genome-download):Scripts to download genomes from the NCBI FTP servers. Example: ~/.pyenv/versions/3.5.2/bin/ncbi-genome-download -F fasta -g Vibrio -o Vibrio_genomes -p 16 -r 15 bacteria
- PrimerMapper:https://github.com/dohalloran/PrimerMapper
- primer3(https://github.com/primer3-org/primer3)
- PrimerView(https://github.com/dohalloran/PrimerView)
- Some tools for protein function annotation and analysis: https://classes.soe.ucsc.edu/bme225/Fall07/BME225.serverlist.html https://classes.soe.ucsc.edu/bme225/Fall08/BME225.serverlist08.html
- COV2HTML:https://mmonot.eu/COV2HTML/connexion.php A Visualization and Analysis Tool of Bacterial Next Generation Sequencing (NGS) Data.
- Bismark(https://www.bioinformatics.babraham.ac.uk/projects/bismark/):A tool to map bisulfite converted sequence reads and determine cytosine methylation states (methylation calling).
- seqtools:http://www.sanger.ac.uk/science/tools/seqtools The SeqTools package contains three tools for visualising sequence alignments: Blixem, Dotter and Belvu.
- CLARI-TE:https://github.com/jdaron/CLARI-TE Predicts Transposable Elements (TEs) in complex genomes such as wheat.
- TRF:http://tandem.bu.edu/trf/trf.download.html
- Msatfinder:http://www.bioinformatics.org/project/?group_id=469 https://github.com/knirirr/Msatfinder Msatfinder is a simple Perl script that detects perfect microsatellite repeats (1-6 bp) in nucleic acid or protein sequences.
- MISA - MIcroSAtellite identification tool:http://pgrc.ipk-gatersleben.de/misa/
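- msatcommander:http://www.softpedia.com/get/Science-CAD/msatcommander.shtml (Windows)
Further notes:
(1)SSR/STR genotyping
The approach is as follows:
1. First, determine which species you are studying. For many species, SSR loci have already been published together with the corresponding primer sequences. This case is simple, since you do not need to design primers yourself, but prefer loci that are widely reported in the literature and highly polymorphic. For example, soybean SSR loci and their primer sequences are available; check which loci are commonly published and well studied, and choose those with good polymorphism.
2. If nothing has been published for your species, things are harder: you must develop SSR primers yourself. First, screen the genome sequence of the species for STR loci. There are many ways to do this, for example enriched libraries or the SSR-Hunter software, and plenty of methods and material on SSR primer development exist. When choosing from the genome sequence, prefer unlinked loci. After finding the repeat loci, test their polymorphism. The final loci should be unlinked, highly polymorphic, and easy to amplify.
3. On the ABI 3730, the instrument detects fluorescence signals from primers labeled with a fluorescent dye at the 5' end. Throughput and speed are high, but so is the cost, so run samples on the instrument only for the later batch experiments, once the primers have been screened. For the initial primer screening, use ordinary (unlabeled) primers on PAGE gels with about 20 samples to get a rough view of the amplified fragments and their polymorphism.
First you need sequences for your species. Submit them to the online tool at http://www.genomics.ceh.ac.uk/cgi-bin/msatfinder/msatfinder.cgi to locate the microsatellites, then design primers in the regions flanking each microsatellite.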
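The last step (locate the microsatellites, then take the flanking sequence for primer design) can be sketched with a regular expression. This is a minimal illustration, not how Msatfinder or MISA work internally: the function name `find_microsatellites` and its defaults (at least 5 copies, motifs of 1-6 bp, 20 bp flanks) are assumptions chosen for the example, and only perfect repeats over A/C/G/T are reported.

```python
import re


def find_microsatellites(seq, min_repeats=5, max_motif=6, flank=20):
    """Find perfect tandem repeats with a motif of 1..max_motif bases.

    Returns one dict per repeat with its motif, copy number, 0-based
    coordinates and the flanking sequence on each side.
    """
    seq = seq.upper()
    # A motif (shortest possible, hence the lazy quantifier) followed by
    # at least min_repeats - 1 further copies of itself.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        hits.append({
            "motif": motif,
            "repeats": (m.end() - m.start()) // len(motif),
            "start": m.start(),  # 0-based, inclusive
            "end": m.end(),      # 0-based, exclusive
            "left_flank": seq[max(0, m.start() - flank):m.start()],
            "right_flank": seq[m.end():m.end() + flank],
        })
    return hits
```

The two flank strings are the candidate regions to pass to a primer design tool such as primer3; real flanks for primer design are usually longer than the 20 bp default used here.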
- Pfam_Scan(http://pfam.xfam.org/):protein domain prediction
- PfamScan tool(ftp://ftp.ebi.ac.uk/pub/databases/Pfam/Tools/)
- InterProScan website: http://www.ebi.ac.uk/interpro/ http://www.ebi.ac.uk/interpro/interproscan.html
- AntiFam:ftp://ftp.ebi.ac.uk/pub/databases/Pfam/AntiFam/ identifies spurious ORFs. AntiFam paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308159/ http://xfam.org/ How to run AntiFam: hmmsearch --domtblout test_vs_antifam.out --tblout test_vs_antifam.out2 --domE 1e-10 --cpu 12 ../AntiFam.hmm test.faa
- wKinMut-2:http://kinmut2.bioinfo.cnio.es/KinMut2 wKinMut-2 is an integrated framework for the analysis and interpretation of the consequences of variants in the human kinome.
- GOTaxExplorer:http://gotax.bioinf.mpi-inf.mpg.de/ GOTaxExplorer presents a new approach to comparative genomics that integrates functional information and families with the taxonomic classification.
- PathSeq:cross-species contamination detection with PathSeq https://software.broadinstitute.org/gatk/blog?id=23205 ftp://ftp.broadinstitute.org/bundle/pathseq/
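For the AntiFam entry in the list above, the `--tblout` file written by hmmsearch can be reduced to a set of protein IDs that look like spurious ORFs. The sketch below is only an illustration: the function name `read_tblout_targets` and the E-value cutoff are assumptions, and it relies on the HMMER per-target table layout (comment lines start with `#`, the first column is the target sequence name, the fifth column is the full-sequence E-value).

```python
def read_tblout_targets(lines, max_evalue=1e-10):
    """Return the names of target sequences that hit any profile.

    `lines` is any iterable of text lines from an hmmsearch --tblout file.
    A target is kept when its full-sequence E-value is <= max_evalue.
    """
    flagged = set()
    for line in lines:
        # Skip blank lines and the '#' header/footer comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target_name = fields[0]
        full_seq_evalue = float(fields[4])
        if full_seq_evalue <= max_evalue:
            flagged.add(target_name)
    return flagged
```

With the command shown above, this would be called on the opened `test_vs_antifam.out2` file, and the returned IDs are the entries of `test.faa` to review or drop from the annotation.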
Drosophila database: http://flybase.org/
Yeast database: https://www.yeastgenome.org/
Download yeast data: https://www.yeastgenome.org/download-data
Genome assembly software suited to NGS data
- ALLPATHS-LG
- Velvet
- SOAPdenovo
- Bambus2
- CABOG
- MSR-CA
- SGA
- VCAKE
- SHARCGS
- SSAKE
- Euler
Genome assembly software suited to Sanger data
- Newbler
- Celera
- CABOG
- Edena
- Shorty
Assembly algorithms:
A)Overlap/Layout/Consensus (OLC) methods (rely on an overlap graph)
Software: CABOG, Newbler, Shorty, Edena
B)De Bruijn Graph (DBG) methods (use some form of k-mer graph)
Software: SOAPdenovo, Euler, Velvet
C)Greedy graph algorithms (use OLC or DBG)
Software: SSAKE, SHARCGS, VCAKE
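To make the k-mer graph idea behind the DBG methods concrete, here is a minimal sketch of de Bruijn graph construction. It is a toy illustration under simplifying assumptions (error-free reads, one strand only, no coverage counts), not the data structure used by any of the assemblers listed; the function name `de_bruijn_graph` is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from reads.

    Nodes are (k-1)-mers; every distinct k-mer in the reads adds one edge
    from its prefix (first k-1 bases) to its suffix (last k-1 bases).
    """
    # Collect the distinct k-mers so repeated reads do not duplicate edges.
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)
```

Overlapping reads end up sharing nodes without any pairwise read comparison, which is the main contrast with the overlap graph used by OLC methods; an assembler then looks for paths through this graph to spell out contigs.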
(1)Library Genesis
(2)Sci-hub