This is a pipeline for RNA-seq differential expression analysis with DESeq2, using data from the publication "A circadian clock regulates efflux by the blood-brain barrier in mice and human cells".
The pipeline includes:
- QC test: FastQC + MultiQC
- Alignment: STAR
- Generate read summarization: featureCounts
- Differential expression (in RStudio): DESeq2
Software versions used:
- STAR v2.7.8a
- Python v3.6.13
- Samtools v1.12
- Snakemake v5.7.0
- Subread v2.0.1
For this exercise, download the paired-end FASTQ files for the control ZT2 and ZT6 samples from GEO: three pairs per time point, 12 FASTQ files in total.
This gives two conditions with three replicates each:
- CTRL ZT2 replicate1 (SRR9973379) --> ZT02(R1)
- CTRL ZT2 replicate2 (SRR9973380) --> ZT02(R2)
- CTRL ZT2 replicate3 (SRR9973381) --> ZT02(R3)
- CTRL ZT6 replicate1 (SRR9973385) --> ZT06(R1)
- CTRL ZT6 replicate2 (SRR9973386) --> ZT06(R2)
- CTRL ZT6 replicate3 (SRR9973387) --> ZT06(R3)
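The accession-to-sample mapping above can be kept in one place so that later steps use consistent names. A minimal sketch, assuming a hypothetical label spelling such as ZT02_R1; the function name sample_name is illustrative and not part of the pipeline:

```shell
#!/usr/bin/env bash
# Map each SRA run accession to its sample label.
# Labels follow the list above; the "ZT02_R1" spelling is an assumption.
sample_name() {
  case "$1" in
    SRR9973379) echo "ZT02_R1" ;;
    SRR9973380) echo "ZT02_R2" ;;
    SRR9973381) echo "ZT02_R3" ;;
    SRR9973385) echo "ZT06_R1" ;;
    SRR9973386) echo "ZT06_R2" ;;
    SRR9973387) echo "ZT06_R3" ;;
    *) echo "unknown accession: $1" >&2; return 1 ;;
  esac
}

# Print a two-column table: accession, sample label.
for acc in SRR9973379 SRR9973380 SRR9973381 SRR9973385 SRR9973386 SRR9973387; do
  printf '%s\t%s\n' "$acc" "$(sample_name "$acc")"
done
```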
Use sra-tools to download these files with the download.sh script.
First, download and unpack sra-tools:
$ wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.11.0/sratoolkit.2.11.0-ubuntu64.tar.gz
$ tar -zxvf sratoolkit.2.11.0-ubuntu64.tar.gz
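With the toolkit unpacked, the downloads can be scripted. The contents of download.sh are not shown here, so the following is only a sketch of what such a script might do. It assumes the toolkit was extracted into the current directory and that each paired-end run should be written as two FASTQ files. The fastq output directory and the DRY_RUN switch are illustrative names; with DRY_RUN=1 (the default in this sketch) the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch of a download script; not the pipeline's actual download.sh.
# Assumption: the toolkit tarball was unpacked in the current directory.
SRA_BIN="${SRA_BIN:-$PWD/sratoolkit.2.11.0-ubuntu64/bin}"
OUT_DIR="${OUT_DIR:-fastq}"   # hypothetical output directory
DRY_RUN="${DRY_RUN:-1}"       # 1 = print commands only, 0 = run them

ACCESSIONS="SRR9973379 SRR9973380 SRR9973381 SRR9973385 SRR9973386 SRR9973387"

download_all() {
  for acc in $ACCESSIONS; do
    # --split-files writes mate 1 and mate 2 to separate FASTQ files.
    cmd="$SRA_BIN/fasterq-dump --split-files --outdir $OUT_DIR $acc"
    if [ "$DRY_RUN" = "1" ]; then
      echo "$cmd"
    else
      mkdir -p "$OUT_DIR" && $cmd
    fi
  done
}

download_all
```

Six runs with two mates each should yield the 12 FASTQ files mentioned above. Depending on the setup, the toolkit may ask for a one-time configuration with vdb-config before the first download.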
The reference genome and annotation are Ensembl release 102 (GRCm38):
$ wget -c http://ftp.ensembl.org/pub/release-102/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
$ wget -c http://ftp.ensembl.org/pub/release-102/gtf/mus_musculus/Mus_musculus.GRCm38.102.gtf.gz
$ gunzip Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
$ gunzip Mus_musculus.GRCm38.102.gtf.gz
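Before alignment, STAR needs a genome index built from these two files. Whether the Snakefile builds the index itself is not stated here, so this is a sketch under assumptions: star_index is a hypothetical output directory, 20 threads mirrors the -j 20 used when running Snakemake, and --sjdbOverhang 100 is a commonly used value that ideally equals read length minus 1. The function only prints the command so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: build a STAR genome index from the Ensembl files downloaded above.
GENOME_FA="Mus_musculus.GRCm38.dna.primary_assembly.fa"
GTF="Mus_musculus.GRCm38.102.gtf"
INDEX_DIR="${INDEX_DIR:-star_index}"   # hypothetical directory name
THREADS="${THREADS:-20}"               # assumption: matches -j 20

star_index_cmd() {
  echo "STAR --runMode genomeGenerate" \
       "--runThreadN $THREADS" \
       "--genomeDir $INDEX_DIR" \
       "--genomeFastaFiles $GENOME_FA" \
       "--sjdbGTFfile $GTF" \
       "--sjdbOverhang 100"
}

# Print the command for review.
star_index_cmd
```

To execute it once STAR is on the PATH and both input files exist, the printed command can be run with eval "$(star_index_cmd)".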
Run the Snakefile with:
$ snakemake -p -j 20
On your desktop, open a local terminal and run the following command to copy counts.txt from the server:
$ scp LinuxUserName@avisIP:counts.txt/file/path ~/Desktop/
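Once counts.txt is on the local machine, it usually needs light trimming before it is read into DESeq2. featureCounts writes a comment line first, then a header whose first six columns are Geneid, Chr, Start, End, Strand and Length, followed by one count column per BAM file. A minimal sketch that keeps only the gene ID and the count columns; the function name counts_matrix and the output file counts_matrix.tsv are illustrative:

```shell
#!/usr/bin/env bash
# Reduce featureCounts output to: Geneid + one count column per sample.
counts_matrix() {
  # grep -v '^#' drops the leading "# Program:featureCounts ..." comment line;
  # cut keeps column 1 (Geneid) and columns 7 onward (the counts).
  grep -v '^#' "$1" | cut -f1,7-
}

# Usage (only runs if the copied file is present):
if [ -f "$HOME/Desktop/counts.txt" ]; then
  counts_matrix "$HOME/Desktop/counts.txt" > counts_matrix.tsv
fi
```

The resulting tab-separated table has one row per gene and one column per sample, which is the shape a DESeq2 count matrix expects.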