BENG183
Homework notes:
Raw Sample files:
1) Sample for raw data cleaning illustration
We will use a dataset from a previous EBI training course. The data come from sequencing of mRNA from zebrafish embryos at two different developmental stages. Sequencing was performed on the Illumina platform and generated 76 bp paired-end reads from poly(A)+-selected RNA.
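Because the reads are paired-end, the two files of each sample (for example 2cells_1.fastq and 2cells_2.fastq) should list the mates of every fragment in the same order. The Python sketch below is a minimal consistency check under that assumption: it reads the standard four-line FASTQ layout, compares read names after stripping any comment and an optional /1 or /2 suffix, and tallies read lengths so you can see whether the raw reads are all 76 bp. The function names are hypothetical helpers, not part of any tool used in the course.

```python
from collections import Counter
from itertools import islice, zip_longest
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    header: str  # line 1, starts with '@'
    seq: str     # line 2, the bases
    qual: str    # line 4, one quality character per base


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from a file handle or any iterable of lines."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk or chunk == [""]:
            return  # end of input (tolerates one trailing blank line)
        if len(chunk) < 4 or not chunk[0].startswith("@") or not chunk[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record: {chunk!r}")
        if len(chunk[1]) != len(chunk[3]):
            raise ValueError(f"sequence/quality length mismatch in {chunk[0]}")
        yield FastqRecord(chunk[0], chunk[1], chunk[3])


def read_id(header: str) -> str:
    """Read name without '@', any comment after a space, and a /1 or /2 suffix."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def check_pairs(r1: Iterable[str], r2: Iterable[str]):
    """Walk R1 and R2 in lockstep; return (number of pairs, Counter of read lengths)."""
    n_pairs, lengths = 0, Counter()
    for a, b in zip_longest(read_fastq(r1), read_fastq(r2)):
        if a is None or b is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if read_id(a.header) != read_id(b.header):
            raise ValueError(f"mates out of sync: {a.header} vs {b.header}")
        lengths[len(a.seq)] += 1
        lengths[len(b.seq)] += 1
        n_pairs += 1
    return n_pairs, lengths
```

Open file handles can be passed directly (open both files in a with block and call check_pairs on the two handles); a mismatch in read count or read names raises ValueError.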
Download data from ftp://ftp.ebi.ac.uk/pub/training/Train_online/RNA-seq_exercise/
- 2cells_1.fastq
- 2cells_2.fastq
- 6h_post_fertilisation_R1.fastq
- 6h_post_fertilisation_R2.fastq
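The download can also be scripted. This is a minimal Python sketch, assuming the four files sit directly in the EBI directory above under exactly these names (uncompressed); the output folder name raw_data and the helper names are hypothetical. Each file is written to a temporary .part name first and renamed only after the transfer completes, so an interrupted download is not mistaken for a finished one on the next run.

```python
import os
import urllib.request

BASE_URL = "ftp://ftp.ebi.ac.uk/pub/training/Train_online/RNA-seq_exercise/"
FASTQ_FILES = [
    "2cells_1.fastq",
    "2cells_2.fastq",
    "6h_post_fertilisation_R1.fastq",
    "6h_post_fertilisation_R2.fastq",
]


def file_urls(base_url, names):
    """Join a directory URL and file names, adding the '/' separator if needed."""
    if not base_url.endswith("/"):
        base_url += "/"
    return [base_url + name for name in names]


def download_all(base_url, names, out_dir="raw_data"):
    """Fetch each file into out_dir, skipping files that already exist."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for url, name in zip(file_urls(base_url, names), names):
        dest = os.path.join(out_dir, name)
        if not os.path.exists(dest):
            tmp = dest + ".part"
            urllib.request.urlretrieve(url, tmp)  # urllib handles ftp:// URLs
            os.replace(tmp, dest)  # rename only after a complete download
        paths.append(dest)
    return paths
```

Calling download_all(BASE_URL, FASTQ_FILES) would place the files under raw_data/. The _1/_2 and R1/R2 suffixes mark the two mates of each paired-end sample.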
2) Sample for the homework
We will use RNA-seq data from the FlyAtlas2 database, which collects hundreds of RNA-seq datasets from Drosophila melanogaster. You can search by gene, category, or tissue. Here we downloaded four samples: female_head (x2) and female_midgut (x2).
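Later steps (counting and comparing head versus midgut) need to know which sample belongs to which tissue. A minimal sketch of a sample sheet for this two-tissue, two-replicate design follows. The sample IDs are hypothetical, since the notes do not give the downloaded file names, and the sample/condition column names are an assumed convention rather than a FlyAtlas2 format.

```python
from collections import Counter

# Hypothetical sample IDs: the notes only say two female_head and two
# female_midgut samples were downloaded, not what the files are called.
SAMPLES = [
    ("female_head_1", "female_head"),
    ("female_head_2", "female_head"),
    ("female_midgut_1", "female_midgut"),
    ("female_midgut_2", "female_midgut"),
]


def replicates_per_condition(samples):
    """Count how many samples belong to each condition."""
    return Counter(condition for _, condition in samples)


def sample_sheet_tsv(samples):
    """Render a tab-separated sample sheet with a header row."""
    rows = ["sample\tcondition"] + [f"{name}\t{cond}" for name, cond in samples]
    return "\n".join(rows) + "\n"
```

Keeping this table next to the data makes it easy to confirm that each tissue really has two replicates before running any comparison.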
Clean Data
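Cleaning removes low-quality sequence before mapping. FASTQ quality characters encode Phred scores, Q = -10 log10(p_error), so Q20 corresponds to a 1% chance that a base call is wrong. The sketch below is a simplified illustration of 3' quality trimming plus a length filter, the kind of step tools such as fastp perform; it is not fastp's algorithm. It assumes Phred+33 encoding, and the thresholds (Q20, 36 bp) are arbitrary example values.

```python
PHRED_OFFSET = 33  # Sanger / Illumina 1.8+ encoding (assumed for these files)


def phred_scores(qual, offset=PHRED_OFFSET):
    """Convert a FASTQ quality string to integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def trim_3prime(seq, qual, min_q=20):
    """Remove bases from the 3' end while their Phred score is below min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def clean_pair(r1, r2, min_q=20, min_len=36):
    """Trim both mates (each a (seq, qual) tuple); drop the pair if either gets too short."""
    s1, q1 = trim_3prime(*r1, min_q=min_q)
    s2, q2 = trim_3prime(*r2, min_q=min_q)
    if len(s1) < min_len or len(s2) < min_len:
        return None  # discarding both mates keeps R1 and R2 in sync
    return (s1, q1), (s2, q2)
```

The pair is kept or dropped as a unit because paired-end aligners expect the two files to stay in the same order.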
Reference genome.fa / transcriptome.fa / gtf
We usually download the reference data from Ensembl: search for "drosophila", choose DNA / cDNA / GTF, then download the files with wget.
- Drosophila genome: ftp://ftp.ensembl.org/pub/release-97/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP6.22.dna.toplevel.fa.gz
- Drosophila transcriptome: ftp://ftp.ensembl.org/pub/release-97/fasta/drosophila_melanogaster/cdna/Drosophila_melanogaster.BDGP6.22.cdna.all.fa.gz
- Drosophila gtf: ftp://ftp.ensembl.org/pub/release-97/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.22.97.chr.gtf.gz
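All three files come from Ensembl release 97 (assembly BDGP6.22), and that matters: the mapper and the counting step both require the chromosome names in the GTF to match those in the genome FASTA. Mixing sources is a common cause of empty count tables, for example "chr2L" in one file and "2L" in another. This is a minimal Python check, assuming the files were saved under their original .gz names; the helper names are hypothetical.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def fasta_names(lines):
    """Sequence names from FASTA headers ('>2L dna:chromosome ...' -> '2L')."""
    return {line[1:].split()[0] for line in lines if line.startswith(">")}


def gtf_seqnames(lines):
    """Values of the first GTF column, skipping '#' comment and blank lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def seqnames_missing_from_genome(genome_path, gtf_path):
    """GTF sequence names with no matching sequence in the genome FASTA."""
    with open_text(genome_path) as fa, open_text(gtf_path) as gtf:
        return gtf_seqnames(gtf) - fasta_names(fa)
```

An empty set from seqnames_missing_from_genome means every annotated chromosome has a matching sequence in the genome file.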
Index files
Pre-computed index files: download here
Mapping .bam file
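A BAM file is the compressed binary form of SAM, and `samtools view` prints it as SAM text. Column 2 of each alignment line is the FLAG, a bit field that records pairing and mapping status. This is a small Python sketch that decodes the FLAG bits defined in the SAM specification and gives a rough tally of mapped, unmapped, and properly paired primary alignments. It assumes the input is SAM text lines; the helper names are hypothetical.

```python
from collections import Counter

# Bit values from the SAM format specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """List the names of the bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_primary_mapped(flag):
    """True for a mapped read that is neither secondary nor supplementary."""
    return not flag & (0x4 | 0x100 | 0x800)


def flag_summary(sam_lines):
    """Tally alignment records from SAM text (header lines start with '@')."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        flag = int(line.split("\t")[1])  # column 2 is FLAG
        counts["records"] += 1
        if flag & 0x4:
            counts["unmapped"] += 1
        elif is_primary_mapped(flag):
            counts["primary_mapped"] += 1
            if flag & 0x2:
                counts["primary_proper_pair"] += 1
    return counts
```

For example, FLAG 99 decodes to paired, proper_pair, mate_reverse, first_in_pair: the forward-strand first mate of a properly aligned pair.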
featureCounts count table
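featureCounts writes a tab-separated table. Comment lines start with "#", the header is Geneid, Chr, Start, End, Strand, Length, and then comes one count column per input BAM. The minimal Python sketch below loads that table into a gene-by-sample matrix and converts counts to counts per million (CPM = count x 10^6 / library size). Library size here is the column sum of assigned reads only; featureCounts reports unassigned reads separately in its .summary file. Helper names are hypothetical.

```python
def read_featurecounts(lines):
    """Parse featureCounts output into (sample_columns, {gene_id: [counts]})."""
    rows = (line.rstrip("\n") for line in lines if not line.startswith("#"))
    header = next(rows).split("\t")
    if header[:6] != ["Geneid", "Chr", "Start", "End", "Strand", "Length"]:
        raise ValueError(f"unexpected header: {header[:6]}")
    samples = header[6:]  # one column per input BAM
    counts = {}
    for row in rows:
        if not row:
            continue
        fields = row.split("\t")
        counts[fields[0]] = [int(x) for x in fields[6:]]
    return samples, counts


def library_sizes(samples, counts):
    """Total assigned reads per sample (column sums)."""
    return {s: sum(vals[i] for vals in counts.values()) for i, s in enumerate(samples)}


def cpm(samples, counts):
    """Counts per million: count * 1e6 / library size, per sample."""
    sizes = library_sizes(samples, counts)
    return {
        gene: [v * 1e6 / sizes[s] if sizes[s] else 0.0 for v, s in zip(vals, samples)]
        for gene, vals in counts.items()
    }
```

CPM only corrects for sequencing depth. It is useful for quick comparisons between the head and midgut samples, but it is not a substitute for a proper differential-expression analysis.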
Resources:
1. Tutorials:
[1] Weill Cornell Medical College: http://chagall.med.cornell.edu/RNASEQcourse/
2. Software manuals:
- Bioconda starting from 3:30.
- FastQC manual
- fastp manual