Bioinformatics

pipeline

Purity/ploidy

  • sequenza - Estimates cancer cellularity, ploidy, and the genome-wide copy number profile, and makes inferences about mutated alleles.
  • ABSOLUTE - Estimates tumor purity, ploidy, and absolute copy number
  • PureCN - Copy number calling and SNV classification using targeted short-read sequencing
  • PURPLE - Purity and ploidy estimation
  • facets - MSKCC's method for estimating purity and ploidy

Immuno infiltration

  • MCPcounter - Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression
  • xCell - cell type enrichment analysis from gene expression data for 64 immune and stroma cell types
  • GSVA - Gene set variation analysis for microarray and RNA-seq data
  • CIBERSORT - Estimates the abundances of member cell types in a mixed cell population from gene expression data.
  • TIDE - Estimates tumor immune evasion (escape) potential from RNA-seq data
  • TIMER - Immune infiltration estimation
  • IOBR - A package that integrates multiple methods for characterizing the immune microenvironment
  • ESTIMATE - A tool for predicting tumor purity and the presence of infiltrating stromal/immune cells in tumor tissues from gene expression data.
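Several of the tools above score a cell population from the expression of its marker genes. As a minimal sketch of that idea, loosely in the spirit of MCPcounter, the function below averages log2(expression + 1) over a marker set. The gene values and the marker list are hypothetical, and this is not the implementation of MCPcounter or of any other package listed here.

```python
import math
from typing import Dict, List


def marker_score(expr: Dict[str, float], markers: List[str]) -> float:
    """Mean of log2(expression + 1) over the marker genes present in `expr`.

    Simplified illustration only; published tools use curated marker sets
    and their own normalization.
    """
    values = [math.log2(expr[gene] + 1.0) for gene in markers if gene in expr]
    if not values:
        raise ValueError("none of the marker genes were found in the sample")
    return sum(values) / len(values)


# Hypothetical expression values (e.g. TPM) for one sample
sample = {"CD3D": 15.0, "CD3E": 7.0, "CD2": 3.0, "ACTB": 1000.0}
# Hypothetical T-cell marker set; CD3G is missing from the sample and is skipped
t_cell_markers = ["CD3D", "CD3E", "CD2", "CD3G"]
t_cell_score = marker_score(sample, t_cell_markers)
print(t_cell_score)  # 3.0
```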

Genome-related web

Database

Cancer database

Protein

Annotation

QC

  • trimmomatic - Adapter trimming tool
  • trim_galore - QC and adapter trimming; a wrapper around cutadapt and FastQC that can detect adapters automatically
  • cutadapt - Adapter trimming tool
  • fastqc - QC tool
  • fastp - QC tool
  • AfterQC - QC tool
  • MultiQC - Aggregates QC results into a single report
  • Conpair - Checks tumor vs. normal sample concordance (pairing) and contamination
  • contatester - Compute the Allelic Balance of a sample from a VCF file, check if a cross human contamination is present and estimate the degree of contamination.
  • preseq - Software for predicting library complexity and genome coverage in high-throughput sequencing.
  • rasusa - Randomly subsample sequencing reads to a specified coverage
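Subsampling to a target coverage, as rasusa does, comes down to converting coverage into a base budget (coverage × genome size) and drawing random reads until the budget is met. The sketch below shows only that idea; the function name and parameters are hypothetical, and it is not rasusa's actual algorithm or command-line interface.

```python
import random
from typing import List, Sequence


def subsample_to_coverage(read_lengths: Sequence[int], genome_size: int,
                          coverage: float, seed: int = 0) -> List[int]:
    """Return sorted indices of randomly chosen reads whose total length reaches
    coverage * genome_size (or all reads if there are not enough bases)."""
    target_bases = coverage * genome_size
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)  # fixed seed for reproducibility
    chosen: List[int] = []
    total = 0
    for i in order:
        if total >= target_bases:
            break
        chosen.append(i)
        total += read_lengths[i]
    return sorted(chosen)


lengths = [150] * 1000  # hypothetical: 1000 reads of 150 bp = 150,000 bases
picked = subsample_to_coverage(lengths, genome_size=10_000, coverage=5)
print(len(picked))  # 334 (50,000 target bases / 150 bp, rounded up)
```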

UMI

  • ConsensusCruncher - ConsensusCruncher is a tool that suppresses errors in next-generation sequencing data by using unique molecular identifiers (UMIs) to amalgamate reads derived from the same DNA template into a consensus sequence.
  • fgbio
  • UMI-tools
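UMI-based error suppression groups reads that carry the same UMI (and therefore likely derive from the same DNA template) and collapses each family into a consensus. Below is a minimal majority-vote sketch, assuming the reads in a family are already aligned and of equal length; the strict-majority rule is an illustrative choice, and real tools such as fgbio, UMI-tools and ConsensusCruncher also handle UMI sequencing errors, base qualities and duplex families.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def umi_consensus(reads: List[Tuple[str, str]]) -> Dict[str, str]:
    """Collapse (umi, sequence) pairs into one consensus sequence per UMI.

    Assumes every read in a family is aligned and has the same length.
    Positions without a strict majority base are reported as 'N'.
    """
    families: Dict[str, List[str]] = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus: Dict[str, str] = {}
    for umi, seqs in families.items():
        bases = []
        for column in zip(*seqs):  # one tuple of bases per position
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if 2 * count > len(column) else "N")
        consensus[umi] = "".join(bases)
    return consensus


reads = [
    ("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"),  # one error at position 3
    ("GGC", "TTTT"), ("GGC", "TTAT"),                   # 1-vs-1 tie at position 3
]
result = umi_consensus(reads)
print(result)  # {'AAT': 'ACGT', 'GGC': 'TTNT'}
```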

SNV

CNV

  • ascat - Allele-specific copy number analysis of tumors
  • GATK
  • bedtools - a powerful toolset for genome arithmetic
  • HMMcopy
  • facets - CNV for WES
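  • cnv_facets - A wrapper around FACETS for computing CNV, purity, and ploidy
  • CNVkit - CNV analysis using both on-target and off-target reads
  • CopywriteR - CNV analysis using off-target reads
  • DNAcopy - CNV analysis
  • CONTRA - CNV analysis of targeted regions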
  • copyCat - somatic copy number aberrations
  • ExomeCNV - CNV (Copy-Number Variants) and LOH (Loss of Heterozygosity) from exome sequencing data
  • ichorCNA - estimating the fraction of tumor in cell-free DNA from ultra-low-pass whole genome sequencing (ULP-WGS, 0.1x coverage)
  • PureCN - This package estimates tumor purity, copy number, and loss of heterozygosity (LOH), and classifies single nucleotide variants (SNVs) by somatic status and clonality.
  • sequenza - Copy Number Estimation from Tumor Genome Sequencing Data
  • ascets - Arm-level Somatic Copy-number Events in Targeted Sequencing
  • CNTools - Convert segment data into a region by sample matrix to allow for other high level computational analyses.
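Most read-depth CNV callers in this list start from the same quantity: a per-bin log2 ratio of tumor depth to a normal (or pooled-normal) reference after normalizing each sample for total depth. A minimal sketch, assuming binned depths are already computed and non-zero; tools such as CNVkit add bias corrections (e.g. GC content) and then segment the ratios, for example with the circular binary segmentation implemented in DNAcopy.

```python
import math
import statistics
from typing import List


def log2_copy_ratios(tumor: List[float], normal: List[float]) -> List[float]:
    """Per-bin log2(tumor / normal) after median-normalizing each sample.

    Assumes both lists cover the same bins in the same order and contain no zeros.
    """
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal must have the same number of bins")
    t_med = statistics.median(tumor)
    n_med = statistics.median(normal)
    return [math.log2((t / t_med) / (n / n_med)) for t, n in zip(tumor, normal)]


tumor_depth = [100, 100, 200, 50, 100]  # hypothetical binned read depths
normal_depth = [50, 50, 50, 50, 50]
ratios = log2_copy_ratios(tumor_depth, normal_depth)
print(ratios)  # [0.0, 0.0, 1.0, -1.0, 0.0]  (one gained bin, one lost bin)
```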

SV

  • pindel - Detects breakpoints of large deletions, medium-sized insertions, inversions, and tandem duplications
  • breakdancer - genome-wide detection of structural variants from next generation paired-end sequencing reads
  • lumpy - A probabilistic framework for structural variant discovery
  • manta - Structural variant and indel caller
  • SVision - Detecting genome structural variants with deep learning in single molecule sequencing

Gene fusion

MSI

STR

HRD

  • scarHRD - scarHRD is an R package that determines the levels of homologous recombination deficiency (telomeric allelic imbalance, loss of heterozygosity, number of large-scale transitions) based on NGS (WES, WGS) data.

RNA

  • rna-seq-strand - Strandedness determination for RNA-seq analysis
  • Hisat2 - graph-based alignment of next generation sequencing reads to a population of genomes
  • stringtie - Transcript assembly and quantification for RNA-Seq
  • gffread - Extracts transcript FASTA sequences using a GFF file
  • Xfam - RNA databases
  • featureCounts - Counts reads mapped to genes
  • htseq - Counts reads mapped to genes (htseq-count)
  • miRBase - miRNA database
  • STAR - RNA-seq mapping tool
  • STAR-Fusion - Gene fusion detection tool
  • arriba - Arriba is a command-line tool for the detection of gene fusions from RNA-Seq data.
  • Trinity_CTAT - Trinity Cancer Transcriptome Analysis Toolkit
  • Trinity - RNA-Seq De novo Assembly
  • Cufflinks - Transcriptome assembly and differential expression analysis for RNA-Seq
  • GimmeMotifs - motif analysis
  • gkmsvm - Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features
  • meme - Motif-based sequence analysis tools
  • rnacocktail - RNA analysis pipeline. DOI: 10.1038/s41467-017-00050-4
  • MSigDB - a collection of annotated gene sets for use with GSEA software
  • rbsurv - This package selects genes associated with survival.
  • WGCNA - Weighted gene co-expression network analysis. It finds modules of co-expressed genes, explores associations between gene modules and phenotypes of interest, and identifies hub genes in the network. http://www.stat.wisc.edu/~yandell/statgen/ucla/WGCNA/wgcna.html
  • CIBERSORT - Provides an estimation of the abundances of member cell types in a mixed cell population, using gene expression data.
  • RSEM - RNA-seq quantification tool
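featureCounts and htseq-count report raw read counts per gene, which are usually normalized before samples or genes are compared. One common normalization is TPM: divide each count by gene length in kilobases, then rescale so that the values in a sample sum to one million. A minimal sketch with hypothetical genes; it uses plain gene lengths, whereas quantifiers such as RSEM and StringTie report TPM themselves using their own length estimates.

```python
from typing import Dict


def counts_to_tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts to transcripts per million (TPM)."""
    # Reads per kilobase: corrects for longer genes collecting more reads
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    # Per-million scaling factor so that the sample's TPM values sum to 1e6
    scale = sum(rpk.values()) / 1e6
    return {gene: value / scale for gene, value in rpk.items()}


counts = {"geneA": 100, "geneB": 200, "geneC": 300}      # hypothetical counts
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 1000}  # hypothetical lengths (bp)
tpm = counts_to_tpm(counts, lengths)
print({gene: round(value, 1) for gene, value in tpm.items()})
# {'geneA': 200000.0, 'geneB': 200000.0, 'geneC': 600000.0}
```

Note that geneB has twice the reads of geneA but the same TPM, because it is twice as long.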

Genome assembly

Mapping

  • BWA - BWA is a software package for mapping low-divergent sequences against a large reference genome
  • BBmap
  • GSNAP - Genomic Short-read Nucleotide Alignment Program
  • DRAGMAP - Illumina Dragmap is the Dragen mapper/aligner Open Source Software.
  • alignment_tools - A summary of alignment tools

Predict genes

  • AUGUSTUS - AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences
  • TransDecoder - CDS prediction

TCR

  • MIGEC - Molecular Identifier Guided Error Correction pipeline
  • igblast - A tool developed by NCBI for aligning TCR sequences
  • vdjtools - TCR analysis suite
  • vdjmatch - Matching T-cell repertoire against a database of TCR antigen specificities
  • gliph - GLIPH clusters TCRs that are predicted to bind the same MHC-restricted peptide antigen.
  • iSMART - immuno-Similarity Measurement by Aligning Receptors of T cells
  • TRUST4 - TRUST is a computational tool to analyze TCR and BCR sequences using unselected RNA sequencing data
  • VDJdb - CDR3-antigen database
  • immunarch - Fast and Seamless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires in R
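A common summary of a TCR repertoire is clonality, defined here as 1 minus the Shannon entropy of clonotype frequencies divided by its maximum, ln(number of clonotypes): it is 0 for a perfectly even repertoire and approaches 1 when one clonotype dominates. A minimal sketch with hypothetical CDR3 counts; repertoire packages such as vdjtools and immunarch ship their own diversity estimators, and this is not their code.

```python
import math
from typing import Dict


def clonality(clone_counts: Dict[str, int]) -> float:
    """1 - H / ln(N), where H is the Shannon entropy of clonotype frequencies
    and N is the number of clonotypes with a non-zero count."""
    counts = [c for c in clone_counts.values() if c > 0]
    if not counts:
        raise ValueError("no clonotypes with non-zero counts")
    if len(counts) == 1:
        return 1.0  # a single clonotype is treated as fully clonal
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1.0 - entropy / math.log(len(counts))


# Hypothetical CDR3 amino-acid sequences and their read counts
even = {"CASSLGQ": 25, "CASSPDR": 25, "CASRGTE": 25, "CASSYSG": 25}
skewed = {"CASSLGQ": 97, "CASSPDR": 1, "CASRGTE": 1, "CASSYSG": 1}
print(clonality(even), clonality(skewed))
```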

Evolution

  • phangorn - Phylogenetic algorithms for tumor evolution
  • CITUP - Clonality Inference in Multiple Tumor Samples using Phylogeny
  • deconstructSigs - Mutational signature analysis in R
  • LICHeE - Fast and scalable inference of multi-sample cancer lineages
  • ExpressBetaDiversity - Computes tree-based beta diversity
  • clonevol - Inferring and visualizing clonal evolution in multi-sample cancer sequencing
  • dndscv - Computes dN/dS for tumor samples
  • btctools - Computes dS/dN
  • treeomics - Builds phylogenetic trees
  • MACHINA - Builds phylogenetic trees
  • MOBSTER - Clonal evolution (2020)
  • sciclone - Clonal evolution analysis
  • pyclone - Clonal evolution analysis
  • TimeScape - TimeScape is a visualization tool for temporal clonal evolution.
  • SCHISM - SCHISM is a computational tool designed to infer subclonal hierarchy and the tumor evolution from somatic mutations.
  • EstimateClonality
  • ECLIPSE - R package for clonal deconvolution of tumour-informed ctDNA data using clonality and copy number information from tumour tissue.
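Many of the clonality tools above reason about the cancer cell fraction (CCF) of each mutation rather than its raw VAF. A widely used point estimate corrects VAF for tumor purity p, local tumor copy number CN_t, normal copy number CN_n and the number of mutated copies m (multiplicity): CCF = VAF * (p * CN_t + (1 - p) * CN_n) / (p * m). A minimal sketch under those assumptions; tools such as PyClone and sciclone fit probabilistic models instead of relying on this single point estimate.

```python
def cancer_cell_fraction(vaf: float, purity: float, tumor_cn: int,
                         multiplicity: int = 1, normal_cn: int = 2) -> float:
    """Point estimate of the cancer cell fraction of a somatic SNV.

    CCF = VAF * (purity * tumor_cn + (1 - purity) * normal_cn) / (purity * multiplicity)
    Values slightly above 1 are expected from sampling noise and are not clipped here.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    total_copies = purity * tumor_cn + (1.0 - purity) * normal_cn
    return vaf * total_copies / (purity * multiplicity)


# Hypothetical mutations in a 50%-pure tumor, diploid region
clonal = cancer_cell_fraction(vaf=0.25, purity=0.5, tumor_cn=2)
subclonal = cancer_cell_fraction(vaf=0.10, purity=0.5, tumor_cn=2)
print(clonal, round(subclonal, 2))  # 1.0 0.4
```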

Neoantigen

Tools

  • STRING - Protein-protein interactions
  • SetRank - Enrichment analysis
  • GSVA - Enrichment analysis; supports ssGSEA
  • bam-readcount - Per-position depth statistics
  • somaticfreq - Knowledge-based genotyping of targeted somatic variants from the tumor BAM file
  • Wgsim - Data simulation
  • DWGSIM - Data simulation
  • bamgineer - Data simulation
  • vcf2bed
  • slicer - Slice a text file (like FASTQ) into smaller files by lines, with gzip support
  • Vt - a variant tool set that discovers short variants from Next Generation Sequencing data.
  • hgvs - Parses HGVS variant nomenclature
  • deeptools - tools for exploring deep sequencing data
  • sambamba - BAM file processing tool, similar to samtools but faster
  • BamUtil - bamUtil is a repository that contains several programs that perform operations on SAM/BAM files.
  • bamtools - A small, but powerful suite of command-line utility programs for manipulating and querying BAM files.
  • ngs-bits - Short-read sequencing tools (SampleGender)
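Splitting a large text file by line count, as slicer does, is simple to sketch with the standard library. The function below is a hypothetical illustration, not slicer's interface: it reads plain or gzipped input and writes plain-text chunks. For FASTQ, `lines_per_chunk` must be a multiple of 4 so that no record is split across files.

```python
import gzip
import os
import tempfile
from typing import List


def split_by_lines(path: str, lines_per_chunk: int, out_prefix: str) -> List[str]:
    """Split a plain or .gz text file into files of at most `lines_per_chunk` lines."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be positive")
    opener = gzip.open if path.endswith(".gz") else open
    outputs: List[str] = []
    out = None
    try:
        with opener(path, "rt") as src:
            for i, line in enumerate(src):
                if i % lines_per_chunk == 0:  # start a new chunk
                    if out is not None:
                        out.close()
                    name = f"{out_prefix}.{len(outputs)}.txt"
                    out = open(name, "w")
                    outputs.append(name)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return outputs


def count_lines(path: str) -> int:
    with open(path) as fh:
        return sum(1 for _ in fh)


# Demo: three 4-line FASTQ records in a gzipped file, one record per chunk
with tempfile.TemporaryDirectory() as tmp:
    fastq = os.path.join(tmp, "reads.fastq.gz")
    with gzip.open(fastq, "wt") as fh:
        for n in range(3):
            fh.write(f"@read{n}\nACGT\n+\nIIII\n")
    chunks = split_by_lines(fastq, lines_per_chunk=4, out_prefix=os.path.join(tmp, "chunk"))
    chunk_line_counts = [count_lines(c) for c in chunks]
print(chunk_line_counts)  # [4, 4, 4]
```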

Web

Microbe/Virus

  • kraken - Kraken taxonomic sequence classification system
  • microbiology - A collection of analysis software
  • Parsnp - SNP calling for bacteria/viruses
  • fastv - Virus detection

Methylation

Driver gene

plot

miRNA

single-cell

  • scrna-tools - https://www.scrna-tools.org/
  • single-cell-tutorial - Scripts for "Current best-practices in single-cell RNA-seq: a tutorial"
  • tracer - TCR analysis from single-cell RNA-seq data
  • awesome-single-cell - List of software packages (and the people developing these methods) for single-cell data analysis
  • SingleCell - Computes a CNV score that can be used to distinguish malignant cells
  • infercnv - Inferring copy number alterations from tumor single cell RNA-Seq data
  • infercnvpy - Scanpy plugin to infer copy number variation (CNV) from single-cell transcriptomics data
  • copykat - Inference of genomic copy number and subclonal structure of human tumors from high-throughput single cell RNAseq data
  • batchbench - BatchBench is a Nextflow workflow for running several scRNA-Seq batch effect correction methods
  • dyno - Trajectory inference with 60 methods
  • DoubletFinder - Filters out doublets
  • DoubletDecon - Removes doublets using a deconvolution-based approach
  • DropletUtils - Utilities for handling droplet-based single-cell RNA-seq data
  • AUCell - Gene set analysis for single cells
  • GENIE3 - Infer gene regulatory network (GRNs) based on co-expression patterns
  • RcisTarget - Transcription factor binding motif enrichment
  • pySCENIC - pySCENIC is a lightning-fast python implementation of the SCENIC pipeline which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
  • scvelo - RNA velocity generalized through dynamical modeling
  • Velocyto - Velocyto is a library for the analysis of RNA velocity.
  • monocle3 - An analysis toolkit for single-cell RNA-seq.
  • harmony - Mainly used for data integration
  • cellphonedb - Cell-cell communication
  • cellchat - Cell-cell communication
  • stlearn - a downstream analysis toolkit for Spatial Transcriptomics data
  • scFusion - scFusion is a computational pipeline for detecting gene fusions at single-cell resolution. scFusion works on Linux/Mac OS
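Expression-based CNV inference (inferCNV, infercnvpy, copykat) rests on one observation: a copy number change shifts the expression of many neighboring genes together. Below is a heavily simplified sketch of that idea, assuming log-scale expression for genes already ordered by genomic position and a hypothetical reference of normal cells: subtract the reference mean per gene, then average over a sliding window of neighboring genes. The real tools add gene filtering, denoising and segmentation steps that are omitted here.

```python
from typing import List


def smoothed_relative_expression(cell: List[float], reference_mean: List[float],
                                 window: int = 3) -> List[float]:
    """Per-gene (cell - reference) expression, averaged over a centered window of genes.

    Assumes both lists are log-scale and ordered by genomic position; windows are
    truncated at the ends of the list.
    """
    if len(cell) != len(reference_mean):
        raise ValueError("cell and reference must cover the same genes")
    relative = [c - r for c, r in zip(cell, reference_mean)]
    half = window // 2
    smoothed = []
    for i in range(len(relative)):
        lo, hi = max(0, i - half), min(len(relative), i + half + 1)
        smoothed.append(sum(relative[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical cell with a gained segment over the last three genes
cell = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
reference = [1.0] * 6
profile = smoothed_relative_expression(cell, reference)
print([round(x, 2) for x in profile])  # [0.0, 0.0, 0.67, 1.33, 2.0, 2.0]
```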

signature

Chip-seq

  • samr - Microarray data analysis using permutation tests; less robust than limma
  • limma
  • HOMER - HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and ChIP-Seq analysis.

Cluster

NIPT

  • wisecondor - Detect fetal trisomies and smaller CNVs in a maternal plasma sample using whole-genome data.
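The classic NIPT screening statistic compares the fraction of reads mapping to a chromosome (for example chr21) in a test sample with the same fraction in a reference set of euploid pregnancies, expressed as a z-score; values above about 3 are commonly flagged. WISECONDOR takes a different, within-sample approach (comparing each bin with reference bins from the same sample), so the sketch below illustrates only the between-sample z-score, with hypothetical fractions.

```python
import statistics
from typing import Sequence


def chromosome_z_score(test_fraction: float, reference_fractions: Sequence[float]) -> float:
    """z = (test fraction - reference mean) / reference sample standard deviation."""
    if len(reference_fractions) < 2:
        raise ValueError("need at least two reference samples")
    mean = statistics.mean(reference_fractions)
    sd = statistics.stdev(reference_fractions)
    return (test_fraction - mean) / sd


# Hypothetical chr21 read fractions (chr21 reads / all mapped reads)
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0130]
z = chromosome_z_score(0.0136, reference)
print(round(z, 2))  # 8.49
```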