
Supporting material for book chapter 'Identification of parent-of-origin-dependent QTLs using bulk-segregant sequencing (Bulk-Seq)'

Primary language: R. License: MIT.

BulkSeq

This set of scripts accompanies the chapter 'Identification of parent-of-origin-dependent QTLs using bulk-segregant sequencing (Bulk-Seq)', from the book Plant Chromatin Dynamics (Springer, 2018).

GenomeSNPmask.py: remove or replace known SNP positions in a genome sequence file (FASTA)
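The masking idea can be sketched in a few lines of plain Python. This is not the code in GenomeSNPmask.py: the function name `mask_snps`, the 1-based position convention and the default replacement character `N` are all assumptions made for illustration, and the sketch works on a single sequence string rather than a FASTA file.

```python
def mask_snps(sequence, positions, replacement="N"):
    """Return a copy of `sequence` with the given positions masked.

    Hypothetical helper: positions are assumed to be 1-based.
    An empty `replacement` removes the base instead of replacing it.
    """
    bases = list(sequence)
    for pos in positions:
        if not 1 <= pos <= len(bases):
            raise ValueError("position {} is outside the sequence".format(pos))
        bases[pos - 1] = replacement
    return "".join(bases)


# Replace the bases at positions 2 and 5 with N.
masked = mask_snps("ACGTACGT", [2, 5])
print(masked)  # ANGTNCGT

# Remove the same positions instead of replacing them.
removed = mask_snps("ACGTACGT", [2, 5], replacement="")
print(removed)  # AGTCGT
```

Masking known SNP positions before mapping is typically meant to keep reads carrying either parental allele from being aligned with a bias toward the reference allele.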

mapping.sh: minimal set of commands to filter and map reads from a fastq file, call SNPs and output an out.vcf file with allele frequencies

snpFile.R: retrieve publicly available SNP data for the Cvi-0 and Ler-1 accessions of Arabidopsis thaliana, merge them and output a reformatted SNP matrix (snpm.txt)

cleanCounts.R: merge information from the SNP matrix (snpm.txt) with the measured allele frequencies (out.vcf), filter the result and output a counts.csv file with allele counts
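As a rough illustration of this merge-and-filter step, the Python sketch below joins a table of known parental alleles with observed per-base read counts. It does not reproduce cleanCounts.R: the in-memory dictionaries stand in for snpm.txt and out.vcf, whose actual layouts are not described here, and the function name `clean_counts` and the `min_depth` threshold are hypothetical.

```python
def clean_counts(known_snps, observed, min_depth=10):
    """Assign observed read counts to the two parental alleles.

    known_snps: {(chrom, pos): (cvi_allele, ler_allele)}  (assumed layout)
    observed:   {(chrom, pos): {base: read_count}}        (assumed layout)
    Returns a list of (chrom, pos, cvi_count, ler_count) rows.
    """
    rows = []
    for site in sorted(known_snps):
        cvi, ler = known_snps[site]
        counts = observed.get(site)
        # Skip sites without data or where the parents do not differ.
        if counts is None or cvi == ler:
            continue
        cvi_n = counts.get(cvi, 0)
        ler_n = counts.get(ler, 0)
        # Skip sites with too few informative reads (assumed filter).
        if cvi_n + ler_n < min_depth:
            continue
        rows.append((site[0], site[1], cvi_n, ler_n))
    return rows


known = {
    ("Chr1", 100): ("A", "G"),
    ("Chr1", 250): ("C", "T"),
    ("Chr2", 40): ("G", "G"),
}
observed = {
    ("Chr1", 100): {"A": 12, "G": 8},
    ("Chr1", 250): {"C": 2, "T": 3},
    ("Chr2", 40): {"G": 30},
}
rows = clean_counts(known, observed)
print(rows)  # [('Chr1', 100, 12, 8)]
```

Only the first site survives: the second has too few reads for the assumed depth threshold, and the third is not informative because both parents carry the same allele.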

pool.R: combine allele frequencies from two samples (obtained with cleanCounts.R) and calculate relative frequencies along chromosomes
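One way to picture the comparison of two pools is sketched below in Python. It assumes that the comparison is the per-SNP difference in Cvi allele frequency between the two samples, smoothed with a rolling mean over consecutive SNPs. Whether pool.R uses a difference, a ratio or another statistic is not stated here, so treat the function names and the window size as hypothetical.

```python
def allele_frequency(cvi_count, ler_count):
    """Fraction of informative reads that carry the Cvi allele."""
    return cvi_count / (cvi_count + ler_count)


def rolling_mean(values, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def frequency_difference(sample_a, sample_b):
    """Per-SNP difference in Cvi allele frequency (a minus b).

    Both inputs are lists of (cvi_count, ler_count) tuples covering
    the same SNPs in the same chromosomal order (assumed layout).
    """
    return [
        allele_frequency(*a) - allele_frequency(*b)
        for a, b in zip(sample_a, sample_b)
    ]


# Toy counts for four consecutive SNPs on one chromosome.
pool_a = [(10, 10), (15, 5), (18, 2), (12, 8)]
pool_b = [(10, 10), (10, 10), (9, 11), (11, 9)]

diff = frequency_difference(pool_a, pool_b)
smoothed = rolling_mean(diff, window=2)
print([round(x, 3) for x in smoothed])
```

Smoothing over neighbouring SNPs reduces the sampling noise of individual sites, so that a sustained shift in allele frequency along a chromosome stands out more clearly.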

Requires:

FastQC 0.11.3; cutadapt 1.8.3; Samtools 1.2 (using htslib 1.2.1); Bowtie 2 2.2.9; R 3.3.1 (packages: scales 0.4.0; ggplot2 2.1.0; zoo 1.7-13); Python 3.4.0 (package: Biopython 1.65, imported as Bio)


Example fastq datasets that can be used in this analysis are available in the ArrayExpress database (www.ebi.ac.uk/arrayexpress) under accession number E-MTAB-5196:

WT_pool_1 (1.56 GB)

mea_pool_1 (2 GB)