/BWASPR

R functions and scripts to process output from the BWASP workflow

Primary LanguageR

BWASPR

R functions and scripts to process output from the BWASP workflow

This repository contains R functions and scripts we use to analyze the *.mcalls output files from the BWASP workflow.

Installation

Please find detailed installation instructions and options in the INSTALL document.

Reference

Claire Morandin and Volker P. Brendel (2021) Tools and applications for integrative analysis of DNA methylation in social insects. Molecular Ecology Resources, 00, 1-19. https://doi.org/10.1111/1755-0998.13566.

Original pre-print: at BioRxiv.

Contact

Please direct all comments and suggestions to Volker Brendel at Indiana University.

What BWASPR does

Required input to the BWASPR workflow consists of the *.mcalls files (tab delimited data for the named columns)

SeqID.Pos SequenceID  Position  Strand Coverage  Prcnt_Meth  Prcnt_Unmeth

and two files specifying the data labels and *.mcalls file locations and certain parameters, respectively. Let's look at the example files in inst/extdata:

AmHE.dat
================================================================================
# Samples from Herb et al. (2012) Nature Neuroscience:
#
Am      HE      forager 0       CpGhsm  ../inst/extdata/Amel-forager.CpGhsm.mcalls
Am      HE      forager 0       CpGscd  ../inst/extdata/Amel-forager.CpGscd.mcalls
Am      HE      nurse   0       CpGhsm  ../inst/extdata/Amel-nurse.CpGhsm.mcalls
Am      HE      nurse   0       CpGscd  ../inst/extdata/Amel-nurse.CpGscd.mcalls
AmHE.par
================================================================================
SPECIESNAME     Apis mellifera
ASSEMBLYVERSION Amel_4.5
GENOMESIZE      250270657
TOTALNBRPMSITES 20307353
SPECIESGFF3DIR  ../inst/extdata/AmGFF3DIR
GENELISTGFF3    Amel.gene.gff3
EXONLISTGFF3    Amel.exon.gff3
PCGEXNLISTGFF3  Amel.pcg-exon.gff3
PROMOTRLISTGFF3 Amel.promoter.gff3
CDSLISTGFF3     Amel.pcg-CDS.gff3
UTRFLAGSET      1
5UTRLISTGFF3    Amel.pcg-5pUTR.gff3
3UTRLISTGFF3    Amel.pcg-3pUTR.gff3

The first file has columns for species (here Am); study (here HE); sample (here forager and nurse"); replicate number (here 0, indicating single samples or, as in the case of this study, aggregates over replicates); and file locations (here for the CpGhsm and CpGscd *.mcalls files); note that the file locations in this example are relative links, assuming you will run the example discussed in the demo directory. The second file specifies the species name, genome assembly version, genome size (in base pairs), total number of potential methylation sites (CpGs), and file names for GFF3 annotation of various genomic features (UTRFLAGSET is set to 1 to use UTR annotation in the GFF3 file).

A typical BWASPR workflow will read the specified *.mcalls files and generate various output tables and plots, labeled in various ways with species_ study_ sample_ replicate labels. The demo/Rscript.BWASPR file shows a template workflow. Initial customization is done at the top of the file and mostly from inclusion of a configuration file such as demo/sample.conf. The following table summarizes the successive workflow steps. You may want to open the demo/Rscript.BWASPR and demo/sample.conf in separate windows as a reference while viewing the table. Details on running the workflow with the demo data are given in demo/README.

RUNflag to expected output correspondence

RUNflag input (select) parameters function theme output files
RUNcms studymk covlist, locount, hicount cmStats() sample coverage and methylation statistics cms-*.txt
cms-*.pdf
RUNpwc studymk
studymc
- cmpSites() pairwise sample comparisons pwc-*.vs.*.txt
RUNcrl studymk destrand cmpSamples() correlations between aggregate samples crl-*.txt
crl-*.pdf
RUNrepcms replicate *.mcalls repcovlist,
replocount, rephicount
cmStats() replicate coverage and methylation statistics repcms-*.txt
repcms-*.pdf
RUNrepcrl replicate *.mcalls destrand cmpSamples() correlations between replicates repcrl-*.txt
repcrl-*.pdf
RUNmmp studymk - map_methylome() methylation to annotation maps mmp-*.txt
RUNacs studymk destrand annotate_methylome() annotation of common sites acs-*.txt
RUNrnk studymk genome_ann$region rank_rbm() ranked genes and promoters ranked-*.txt
sites-in-*.txt
rnk-sig-*.pdf
sip-*.txt
rnk-sip-*.txt
rnk-sip-*.pdf
RUNmrpr studymk ddset
nr2d
doplots
det_mrpr() methylation-rich and -poor regions dst-*.txt
*ds-*.pdf
mdr-*.tab
mdr-*.bed
mpr-*.txt
mrr-*.txt
rmp-*.txt
gwr-*.txt
RUNdmt studymc wsize, stepsize det_dmt() differentially methylated tiles and genes dmt-*.txt
dmg-*.txt
RUNdmsg sample *.mcalls
highcoverage
destrand
det_dmsg() differentially methylated sites and genes dms-*.txt
dmg-*.txt
RUNdmgdtls studyhc destrand show_dmsg() details for differentially methylated genes dmg-*.vs.*_details.txt
dmg-*.vs.*_heatmaps.pdf
RUNogl studyhc - explore_dmsg()
rank_dmg()
ranked lists of differentially methylated genes ogl-*.txt
rnk-dmg-*.vs.*.txt
rnk-dmg-*.vs.*.pdf
wrt-*.txt
RUNsave workflow output - save.image() save image of workflow output *.RData