- Author: Chao Tong, Miao Li, Yongtao Tang, Kai Zhao
- Date: Apri-23, 2021
- Project description: Comparative genomics study to understand how Tibetan highland fish adapt to extremely alkaline environment
- Publication: Genomic signature of shifts in selection and alkaline adaptation in highland fish. Genome Biology and Evolution. 2021 link
- introduction to Trinity link
- install Trinity via conda
conda install -c bioconda trinity
start de novo assembly
Trinity \
--trimmomatic \
--seqType fq \
--max_memory 20G \
--left left.fq \
--right right.fq \
--CPU 20 \
--output trinity_output \
--no_bowtie \
--quality_trimming_params "SLIDINGWINDOW:4:20 LEADING:10 TRAILING:10 MINLEN:70" \
--normalize_reads \
--normalize_max_read_cov 100
- output files: GPRZ.fa (fasta format)
- introduction to CD-HIT link
- install CD-HIT via conda
conda install -c bioconda cd-hit
run cd-hit
- input files: GPRZ.fa (nucleotide sequence)
- output file: GPRZ_0.9.fa
cd-hit -i transcripts.fa -o transcripts_0.9.fa -c 0.9 -n 5 -M 16000 –d 0 -T 8
- introduction to TransDecoder link
- install TransDecoder via conda
conda install -c bioconda transdecoder
TransDecoder.LongOrfs -t transcripts_0.9.fa
TransDecoder.Predict -t transcripts_0.9.fa
- input files: GPRZ_0.9.fa
- output files:
- GPRZ.cds (nucleotide sequence)
- GPRZ.pep (protein sequence)
- introduction to R package, phangorn link
define the function pruneTreeFromAln
- input files:
- codon/AA alignment file (fasta/phylip format)
- phylogenetic tree file (nwk format, add label)
Rscript pruned_tree.R
- introduction to OMA link
- install OMA via conda
conda install -c hcc oma
update the <parameters.drw> file
- input files:
- genome data: protein sequences
- parameters.drw
- phylogenetic tree: define a outgroup
OMA -n 40
- prepare amino acid sequence alignment
- introduction to MUSCLE link
- install MUSCLE via conda
conda install -c bioconda muscle
- input file: {gene}.fas (fasta format)
- output file: {gene}.aln (fasta format)
snakemake --cores=1 -s snakefile_muscle
- prepare codon alignment
- introduction to pal2nal link
- install pal2nal via conda
conda install -c bioconda pal2nal
- input files: {gene}.fa (nucleotide sequence) and {gene}.aln (amino acid sequence)
- output files: {gene}.pml or {gene}.fas
perl pal2nal.pl $id.fa -output fasta -nogap > $id.fas
- estimate the rate of molecular evolution (dN/dS) for alkaline tolerant and alkaline intolerant fish species
- introduction to HyPHY link
- install HyPHY via conda
conda install -c bioconda hyphy
define foreground branches: species {test}
- input files:
- codon alignment file (phylip format)
- phylogenetic tree file (nwk format, add label)
run script with snakemake link
snakemake --cores=1 -s snakefile_relax
- Likelihood Ratio Test
- lnLH1: two discrete ratios of dN/dS
- lnLH0: one ratio of dN/dS
ΔlnL = 2(lnLH1-lnLH0)
discard the gene with reported LRT P value > 0.05
- extract rapidly evolving gene
compare the two ratios of dN/dS, LRT P value < 0.05
- ω(alkaline tolerant species) > ω(alkaline intolerant species): rapidly evolving genes in alkaline tolerant species
- detect the signal of positive selection at at least one site on at least one branch of a prori defined branches (e.g. alkaline tolerant fish)
- input files:
- codon alignment file (phylip format)
- phylogenetic tree file (nwk format, add label)
snakemake --cores=1 -s snakefile_busted
- extract positively selected genes BUSTED model automatically discard the gene with reported LRT P value > 0.5 further check the reported output files:
- gene with LRT P value < 0.05: positively selected gene
- prepare the background GO dataset for shared orthologs
- introduction to Trinoate pipeline link
- prepare input geneset (e.g. rapidly evolving genes)
-
introduction to R package, topGO link
-
algorithm = "classic"
-
statistic = "fisher"
-
input files: gene.csv and fish_go_annotation.csv
-
output file: gene_classic_fisher_enriched_GO.csv
Rscript topGO_run.R
- further execute similarity filter
- introduction to REVIGO link
- input files: enriched_GO.txt (paste to blank on the web page)
- output files: download, or make a Rscript