This set of scripts accompanies the chapter 'Identification of parent-of-origin-dependent QTLs using bulk-segregant sequencing (Bulk-Seq)', from the book Plant Chromatin Dynamics (Springer 2018).
GenomeSNPmask.py: remove or replace the bases at known SNP positions in a genome sequence file (FASTA)
mapping.sh: minimal set of commands to filter and map reads from a FASTQ file, call SNPs and output an out.vcf file with allele frequencies
snpFile.R: retrieve publicly available SNP data for the Cvi-0 and Ler-1 accessions of Arabidopsis thaliana, merge them and output a reformatted SNP matrix (snpm.txt)
cleanCounts.R: merge information from the SNP matrix (snpm.txt) with the measured allele frequencies (out.vcf), filter the result and output a counts.csv file with allele counts
pool.R: combine allele frequencies from two samples (obtained with cleanCounts.R) and calculate relative frequencies along chromosomes
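The masking step described for GenomeSNPmask.py can be illustrated with a minimal sketch. This is not the script itself: the function names (mask_snps, mask_genome), the use of "N" as the replacement character, the 1-based position convention and the toy sequences are all assumptions made for illustration, and the sketch works on in-memory strings rather than reading a FASTA file.

```python
def mask_snps(sequence, positions, mask_char="N"):
    """Return a copy of `sequence` with the given positions replaced.

    Assumption: positions are 1-based, as in VCF files.
    Positions outside the sequence are silently ignored.
    """
    bases = list(sequence)
    for pos in positions:
        if 1 <= pos <= len(bases):
            bases[pos - 1] = mask_char
    return "".join(bases)


def mask_genome(records, snps_by_chrom, mask_char="N"):
    """Mask every chromosome in `records` (name -> sequence).

    `snps_by_chrom` maps a chromosome name to its SNP positions;
    chromosomes without known SNPs are returned unchanged.
    """
    return {
        name: mask_snps(seq, snps_by_chrom.get(name, ()), mask_char)
        for name, seq in records.items()
    }


# Toy data (hypothetical, not taken from the real genome or SNP matrix)
genome = {"Chr1": "ACGTACGTAC", "Chr2": "TTTTGGGG"}
snps = {"Chr1": [2, 5], "Chr2": [8]}

masked = mask_genome(genome, snps)
for name, seq in masked.items():
    print(name, seq)
```

Replacing the base with "N" rather than deleting it keeps the chromosome coordinates unchanged, so positions reported after mapping still match those in the SNP matrix.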
Requires:
FastQC 0.11.3; cutadapt 1.8.3; Samtools 1.2 (using htslib 1.2.1); Bowtie 2 2.2.9; R 3.3.1 (packages: scales 0.4.0; ggplot2 2.1.0; zoo 1.7-13); Python 3.4.0 (package: Biopython 1.65, imported as Bio)
Example fastq datasets that can be used in this analysis are available in the ArrayExpress database (www.ebi.ac.uk/arrayexpress) under accession number E-MTAB-5196:
WT_pool_1 (1.56GB)
mea_pool_1 (2GB)