This R script uses the bagged clustering algorithm implemented in the R function 'classIntervals' (package 'classInt') to fit two normal distributions to two overlapping distributions of Ct values (as arise, for example, from an allele-specific qPCR in which the specific (mutant) target is amplified earlier than the non-mutant target). It classifies each sample as belonging to one of the two distributions and flags samples that cannot be uniquely assigned to either of them (i.e. samples located in the overlap of the two distributions). The script was used to analyse the allele-specific qPCR assay in our recent publication "Investigation of an LPA KIV-2 nonsense mutation in 11,000 individuals: the importance of linkage disequilibrium structure in LPA genetics." bioRxiv 2019 [LINK]. For a detailed description, please refer to this publication.
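As a rough illustration of the approach described above, the following sketch simulates two Ct populations, splits them with 'classIntervals' using style = "bclust", and estimates a normal distribution for each group. The simulated values and all variable names are assumptions made for illustration. This is not the code of CToutliers2step, which also takes the argument freqCarrier. The "bclust" style requires the package 'e1071' to be installed.

```r
library(classInt)  # style = "bclust" additionally needs the package 'e1071'

set.seed(1)
# Simulated Ct values (assumption): 50 carriers amplifying early, 450 non-carriers
ct <- c(rnorm(50, mean = 26, sd = 1), rnorm(450, mean = 33, sd = 1))

# Split the values into two classes by bagged clustering
ci <- classIntervals(ct, n = 2, style = "bclust")
cutoff <- ci$brks[2]  # brks holds: minimum, class break, maximum

# Provisional assignment: lower Ct values are treated as carriers
carrier    <- ct[ct <  cutoff]
nonCarrier <- ct[ct >= cutoff]

# Mean and standard deviation of a normal distribution for each group
fit <- data.frame(
  group = c("carrier", "nonCarrier"),
  mean  = c(mean(carrier), mean(nonCarrier)),
  sd    = c(sd(carrier), sd(nonCarrier))
)
print(fit)
```

The estimated means and standard deviations are the quantities on which the quantile-based outlier flags are then based.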
- Run the function definition included in the script
- Use the function as follows (assuming a dataset 'data')
results <- CToutliers2step(CTvalues = data$ct, sampleID = data$id, freqCarrier = 0.1, ['optional_arguments'])
- CTvalues: measured Ct values [MANDATORY]
- sampleID: unique key-variable with the IDs of the samples [MANDATORY]
- freqCarrier: estimated carrier frequency of the variant [MANDATORY]
- minplot: minimal value to be plotted on the x-axis (default=20)
- maxplot: maximal value to be plotted on the x-axis (default=40)
- prob1: values above the (1-prob1)-quantile of the carrier distribution and below the prob1-quantile of the non-carrier distribution are identified and marked as outliers, i.e. they cannot be assigned unambiguously to either the carrier or the non-carrier distribution (default=0.01)
- prob2: as prob1, but more conservative (default=0.025)
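The flagging rule described for prob1 and prob2 can be sketched with base R as follows. The means and standard deviations of the two fitted distributions and the Ct values are made-up numbers, and the variable names are hypothetical. The sketch only reflects one reading of the description above, not the script's actual code.

```r
# Hypothetical parameters of the two fitted normal distributions (assumptions)
meanCarrier    <- 26;  sdCarrier    <- 1
meanNonCarrier <- 33;  sdNonCarrier <- 1.2

ct <- c(25.0, 28.0, 29.1, 30.5, 34.0)  # example Ct values (assumption)

flagOutliers <- function(ct, prob) {
  # Upper tail of the carrier distribution
  lower <- qnorm(1 - prob, mean = meanCarrier, sd = sdCarrier)
  # Lower tail of the non-carrier distribution
  upper <- qnorm(prob, mean = meanNonCarrier, sd = sdNonCarrier)
  # 1 if the value lies between the two limits, 0 otherwise
  as.integer(ct > lower & ct < upper)
}

out1 <- flagOutliers(ct, prob = 0.01)   # 0 0 1 0 0
out2 <- flagOutliers(ct, prob = 0.025)  # 0 1 1 1 0
```

With these numbers, the prob1 limits are about 28.33 and 30.21, and the prob2 limits are about 27.96 and 30.65. The larger value of prob widens the ambiguous region, so more samples are flagged.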
- Output: a data frame containing all input samples, with the following variables:
- sampleID
- CTvalues
- quantCarrier: distribution function of the estimated normal curve for carriers
- quantNonCarrier: distribution function of the estimated normal curve for non-carriers
- out1: 1 if the sample has been identified as an outlier using prob1
- out2: 1 if the sample has been identified as an outlier using prob2
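A short sketch of how the returned data frame might be used afterwards. The data frame below is a hand-made stand-in with invented values that contains only the columns needed here. In practice, 'results' would come from the call to CToutliers2step.

```r
# Stand-in for the output of CToutliers2step (values are assumptions)
results <- data.frame(
  sampleID = c("S1", "S2", "S3", "S4", "S5"),
  CTvalues = c(25.0, 28.0, 29.1, 30.5, 34.0),
  out1     = c(0, 0, 1, 0, 0),
  out2     = c(0, 1, 1, 1, 0)
)

# Cross-tabulate the two flags to see how many samples each criterion marks
print(table(prob1 = results$out1, prob2 = results$out2))

# Samples flagged by the more conservative criterion, e.g. for repeat measurement
ambiguous <- subset(results, out2 == 1)

# Samples assigned unambiguously under both criteria
unambiguous <- subset(results, out2 == 0)
```

Here 'ambiguous' contains S2, S3 and S4, and 'unambiguous' contains S1 and S5.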