MPRAmodel.R: A series of functions to assist in analysis of MPRA count tables
source("MPRAmodel.R")
mpra_out <- MPRAmodel(<countsTable>, <attributesData>, <conditionData>, <filePrefix>, <negCtrlName>, <posCtrlName>, <projectName>)
countsTable
: Table of counts with header. Should be either the output of the MPRAcount pipeline, or in the same format.attributesData
: Attributes for each oligo including oligo name, SNP, chromosome, position, reference allele, alt allele, Allele (ref/alt), window, strand, project, haplotype. For Oligos named in the form chr:pos:ref:alt:allele:window(:haplotype) the scripts here and here can be used to generate the attributes table.conditionData
: 2 columns w/no header column 1 is replicates as found in the count table, column 2 indicates cell type. This can be done in the program of your choice, or taken from the output of MPRAcount.
exclList
: ARRAY; List of cell types, in string format, that should be excluded from the analysis. This array is empty by default.filePrefix
: STRING; All written files will have this string included in the file name, if running the pipeline multiple times with different settings this is a good place to differentiate themplotSave
: LOGICAL; DefaultTRUE
, indicator of whether or not to save non-normalization QC plotsaltRef
: LOGICAL; DefaultTRUE
, indicator of how to sort alleles forcellSpecificTtest
method
: STRING; Default "ss", indicator of method to be used to normalize the data'ss'
: Summit Shift - shifts the l2fc density peak to line up with 0'ssn'
: Summit Shift (Negative Controls Only) - shifts the peak of negative controls to 0'ro'
: Remove outliers - remove oligos that don't have a p-value or have a p-value > 0.001 *'nc'
: Negative Controls - normalize only the negative controls
negCtrlName
: STRING; Default "negCtrl", how negative controls are indicated in the project column of theattributesData
fileposCtrlName
: STRING; Default "expCtrl", how positive controls are indicated in the project column of theattributesData
fileprojectName
: STRING; Default "MPRA_PROJ", a generalized name for the overall projecttTest
: LOGICAL; DefaultTRUE
, perform cell type specific tTest to determine emVARsDEase
: LOGICAL; DefaultTRUE
, use Differential Expression methods to detect Allele Specific Expression (this method can be found here)correction
: STRING; Default "BH", indicator of whether to use Benjamini Hochberg ("BH") or Bonferroni ("BF") for p-value correctioncutoff
: INT; Default 0.01, significance cutoff for including alleles for skew calculation (tTest only)upDisp
: LOGICAL; DefaultTRUE
, update dispersions with cell type specific calculationsprior
: LOGICAL; DefaultFALSE
, usebetaPrior=F
when calculating the cell type specific dispersions
addHaplo
- Add haplotype column to attribute data and resolve oligos with multiple projects, if oligos with multiple projects are listed separately, they will be collapsed into a single row, if oligos are identical but the SNP is different for them, an error will be thrown
use :addHaplo(attributesData, negCtrlName, posCtrlName, projectName)
cellSpecificTtest
- Function to perform TTest on individual cell types, determine the allelic skew, and identify emVARs
use :cellSpecificTtest(attributesData, counts_norm, dups_output, ctrl_mean, exp_mean, ctrl_cols, exp_cols, altRef, correction, cutoff)
conditionStandard
- Standardize condition data
use :conditionStandard(conditionData)
dataOut
- Retrieve output data for future functions - if only looking for results and not the plots this is the only function that needs to be called. Most functions should only need an output from here
use :dataOut(countsData, attributesData, conditionData, exclList, altRef, file_prefix, method, negCtrlName, tTest, Dease, correction, cutoff, upDisp, prior)
DESkew
- Function to determine the allelic skew of oligos and identify emVARs using the DESeq method to determine allelic skew
use :DESkew(conditionData, counts_norm, attributesData, celltype, dups_output, prior)
expandDups
- Expand IDs that denote duplicate oligos in count/DESeq results, or any dataframe with rownames that are separated by a ";"
use :expandDups(output)
fileDate
- Generate the date in YYYYMMDD format for separation of runs in the project
use :fileDate()
MPRAmodel
- Overall function which will run everything once called
use :MPRAmodel(countsData, attributesData, conditionData, exclList, filePrefix, plotSave, altRef, method, negCtrlName, posCtrlName, projectName, tTest, Dease, correction, cutoff, upDisp, prior, …)
mpraScatter
- Function to produce scatter plot of counts data for visualization of correlation between two samples
use :mpraScatter(conditionData, countsOut, sampleX, sampleY, xmax, ymax, plotSave)
oligoIsolate
- Remove Error, CIGAR, MD and position columns if necessary; aggregate count data with relation to the oligo, writes a copy of the collapsed raw counts
use :oligoIsolate(countsData, file_prefix)
panel.cor
- Internal function to produce correlation scatter plots
use :panel.cor(x, y, digits, prefix, cex.cor, …)
panel.lm
- Internal function to produce correlation scatter plots
use :panel.lm(x, y, pch, col.lm, …)
panel.nlm
- Internal function to produce correlation scatter plots
use :panel.nlm(x, y, pch, col.lm, …)
plor_logFC
- Function to produce plots showing the expression fold change vs. normalized tag counts
use :plot_logFC(full_output, sample, negCtrlName, posCtrlName)
processAnalysis
- Initial processing of files via DESeq analysis
use :processAnalysis(countsData, conditionData, exclList) <br>
tagNorm
- Normalize DESeq results and plot normalized densities for each cell type
use :tagNorm(countsData, conditionData, attributesData, exclList, method, negCtrlName, upDisp, prior) <br>
tagSig
- Replace dispersions of normalized dds with cell type specific dispersions, called within tagNorm does not need to be called by itself
use :tagSig(dds_results, dds_rna, cond_data, exclList, prior)