Salmo_salar_RNA_seq

These scripts comprise an analysis pipeline for differential expression, weighted gene co-expression network, and gene ontology overrepresentation analyses of RNA-seq data.

Corresponding article: Harder, AM, Willoughby, JR, Ardren, WR, Christie, MR. Among‐family variation in survival and gene expression uncovers adaptive genetic variation in a threatened fish. Mol Ecol. 2020; 29: 1035–1049. https://doi.org/10.1111/mec.15334

Run order for RNA-seq data:

  1. hisat2.sh: Builds HISAT2 index for reference genome and maps reads to reference genome. Outputs SAM file per sample.

  2. samtools_example.sh: Sorts SAM files and writes sorted BAM files.

  3. stringtie_example.sh: Assembles transcripts for each sample, merges all transcripts from all samples and the reference annotation, and compares assembled sample transcripts to known transcripts. Writes merged GTF file.

  4. feature_counts.sh: Generates text file with count matrix for all samples using sorted BAM files and merged GTF file.

  5. deseq2.R: Conducts differential gene expression analyses using DESeq2 in R. Requires count matrix generated with featureCounts and CSV file of sample information (i.e., family, treatment, sample name).

  6. wgcna.R: Constructs weighted gene co-expression network analysis using normalized counts written from DESeq2.

  7. goterms.R: Tests for gene ontology (GO) term enrichment in lists of module genes produced by WGCNA.

  8. metacoder_files: Code and example files for running metacoder tree building (Fig. 3 in manuscript) [added March 2021]

  9. permutations.R: Iterates over DEGs identified in deseq2.R to calculate regressions (1000X) for each gene and treatment combination, randomly including one individual per family and treatment combination per regression.

  10. perm_means_quantiles.R: Calculates means and quantiles for regression slopes and coefficent for each gene and treatment combination using output generated by permutations.R

A script is also included for survival analyses using mortality data.

survival_analysis.R: Constructs Kaplan-Meier survival distributions and fits Cox proportional hazards regression model to get hazard ratio values for genetic families.