GC content and length correlation plots

This script calculates summary statistics for read length and GC content from a fasta or fastq file and generates a statistics table and plot. Written for the dental calculus versus dentin project: Mann AE, Sabin S, Ziesemer KA, Vågene Å, Schroeder H, Ozga A, Sankaranarayanan K, Hofman CA, Fellows-Yates J, Salazar Garcia D, Frohlich B, Aldenderfer M, Hoogland M, Read C, Krause J, Hofman C, Bos K, Warinner C. (2018) Differential preservation of endogenous human and microbial DNA in dental calculus and dentin. Scientific Reports 8:9822.
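As a rough illustration of the statistic being summarized, the sketch below computes the GC fraction of a few reads and averages it per length bin. It is not the code in gcLenCorPlots.py: the function names, the bin width of 10 bases, and the example reads are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def mean_gc_by_length_bin(seqs, bin_width=10):
    """Group sequences into length bins and return the mean GC fraction per bin."""
    bins = defaultdict(list)
    for seq in seqs:
        # The bin label is the lower edge of the bin, e.g. lengths 10-19 -> 10.
        bins[(len(seq) // bin_width) * bin_width].append(gc_fraction(seq))
    return {start: mean(values) for start, values in sorted(bins.items())}


# Hypothetical reads, not real data.
reads = ["ATGC", "GGCC", "ATATATATATAT", "GCGCGCGCGCGCGC"]
print(mean_gc_by_length_bin(reads))  # {0: 0.75, 10: 0.5}
```

Swapping `mean` for `statistics.median` would give the per-bin median instead.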

GC content and length correlation plots is written in Python 3 and relies on the following packages:

Examples

Generate a plot showing the mean GC content for each length bin

gcLenCorPlots.py -i mytaxa.fa

Generate a plot showing the median GC content for each length bin, normalizing the input data to 2,000 reads

gcLenCorPlots.py -i mytaxa.fa -m median -s 2000
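One common way to normalize input to a fixed number of reads is random subsampling without replacement. Whether gcLenCorPlots.py does exactly this is an assumption; the sketch below only illustrates the idea. The helper names, the seed, and the randomly generated reads are hypothetical.

```python
import random
from statistics import median


def subsample(reads, n, seed=None):
    """Return n reads drawn without replacement, or all reads if there are n or fewer."""
    if len(reads) <= n:
        return list(reads)
    return random.Random(seed).sample(reads, n)


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Hypothetical input: 5,000 made-up reads between 30 and 150 bases long.
rng = random.Random(0)
reads = [
    "".join(rng.choice("ACGT") for _ in range(rng.randint(30, 150)))
    for _ in range(5000)
]

subset = subsample(reads, 2000, seed=42)
print(len(subset))  # 2000
print(median([gc_fraction(r) for r in subset]))
```

Fixing the seed makes the subsample reproducible between runs, which helps when comparing plots across samples.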

Generate a plot showing the median GC content for each length bin, normalizing the input data to 100 reads and considering only reads shorter than 200 bases

gcLenCorPlots.py -i mytaxa.fa -m median -s 100 -t 200

Generate a plot showing the mean GC content for each length bin, setting the range of the color scale to span the minimum and maximum read counts and the error bar color to blue

gcLenCorPlots.py -i mytaxa.fa -r max -ec blue

Help and additional parameter descriptions

gcLenCorPlots.py -h
