BENG183

Homework notes:

  1. Working environment setup
  2. Raw Data QC and Cleaning
  3. Mapping and quantification
  4. Differential analysis

Raw Sample files:

1) Sample for raw data cleaning illustration

We will use a dataset from a previous EBI training course. The data come from sequencing mRNA from zebrafish embryos at two different developmental stages. Sequencing was performed on the Illumina platform and generated 76bp paired-end reads from poly-(A)+ selected RNA.

Download data from ftp://ftp.ebi.ac.uk/pub/training/Train_online/RNA-seq_exercise/

  • 2cells_1.fastq
  • 2cells_2.fastq
  • 6h_post_fertilisation_R1.fastq
  • 6h_post_fertilisation_R2.fastq
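
A minimal Python sketch of the download step, assuming the four file names listed above sit directly under the EBI directory. The helper names (fastq_urls, download_all) and the raw_data output folder are made up for illustration and are not part of the course material.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base directory and file names copied from the notes above.
BASE_URL = "ftp://ftp.ebi.ac.uk/pub/training/Train_online/RNA-seq_exercise/"
FASTQ_FILES = [
    "2cells_1.fastq",
    "2cells_2.fastq",
    "6h_post_fertilisation_R1.fastq",
    "6h_post_fertilisation_R2.fastq",
]


def fastq_urls(base_url=BASE_URL, names=FASTQ_FILES):
    """Return one full URL per FASTQ file name."""
    return [base_url.rstrip("/") + "/" + name for name in names]


def download_all(dest_dir="raw_data"):
    """Download every FASTQ file into dest_dir, skipping files already present.

    dest_dir is a hypothetical folder name; change it to match your setup.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in fastq_urls():
        target = dest / url.rsplit("/", 1)[-1]
        if not target.exists():
            urlretrieve(url, str(target))
    return sorted(p.name for p in dest.iterdir())
```

Calling download_all() fetches the files over the network. Calling fastq_urls() only builds the URLs, which is handy for checking them before a long download.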

2) Sample for the homework

We will use RNAseq data from the FlyAtlas2 database, which collects hundreds of RNAseq datasets from Drosophila melanogaster. You can search it by gene, category, or tissue. Here we downloaded 4 samples (female_head x 2, female_midgut x 2).

Clean Data

Reference genome.fa / transcriptome.fa / gtf

We usually download the reference data from Ensembl. Search for "drosophila", choose DNA / cDNA / gtf, then use wget to download the files.

  • Drosophila genome: ftp://ftp.ensembl.org/pub/release-97/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP6.22.dna.toplevel.fa.gz

  • Drosophila transcriptome: ftp://ftp.ensembl.org/pub/release-97/fasta/drosophila_melanogaster/cdna/Drosophila_melanogaster.BDGP6.22.cdna.all.fa.gz

  • Drosophila gtf: ftp://ftp.ensembl.org/pub/release-97/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.22.97.chr.gtf.gz
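
The three reference URLs above follow a common pattern (release number, species directory, assembly name). The sketch below rebuilds them from those parts and can fetch them as an alternative to wget. The path layout is inferred only from the three URLs listed here, so treat it as an assumption for other releases or species. The function names and the reference output folder are hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve

ENSEMBL_FTP = "ftp://ftp.ensembl.org/pub"


def ensembl_reference_urls(release=97,
                           species="drosophila_melanogaster",
                           assembly="BDGP6.22"):
    """Build genome / transcriptome / gtf URLs.

    The layout is an assumption inferred from the three URLs in these notes.
    """
    prefix = species.capitalize() + "." + assembly
    fasta_dir = f"{ENSEMBL_FTP}/release-{release}/fasta/{species}"
    gtf_dir = f"{ENSEMBL_FTP}/release-{release}/gtf/{species}"
    return {
        "genome": f"{fasta_dir}/dna/{prefix}.dna.toplevel.fa.gz",
        "transcriptome": f"{fasta_dir}/cdna/{prefix}.cdna.all.fa.gz",
        "gtf": f"{gtf_dir}/{prefix}.{release}.chr.gtf.gz",
    }


def fetch_references(dest_dir="reference"):
    """Download the three files into dest_dir (hypothetical folder name)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = {}
    for kind, url in ensembl_reference_urls().items():
        target = dest / url.rsplit("/", 1)[-1]
        if not target.exists():  # skip files that were already downloaded
            urlretrieve(url, str(target))
        paths[kind] = target
    return paths
```

With the default arguments, ensembl_reference_urls() returns exactly the three URLs listed above. fetch_references() performs the actual download and leaves the files gzip-compressed.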

Index files

Pre-computed index files: download here

Mapping .bam file

FeatureCounts Count table

Resources:

1. Tutorials:

[1] Weill Cornell Medical College: http://chagall.med.cornell.edu/RNASEQcourse/

2. Software manuals: