Look into strand flip issues in summary data
Closed this issue · 2 comments
In fact, the association test R script took care of the both the different reference issue and the strand flip issues. The function they used is quoted below.
Basically, it flip the sign of the Z-score and the SNP alleles of any SNP whose ref base does not matched that of the bim files. For the SNPs who may have the strand flip issues, i.e the one with AT or CG as ref and alt, as well as any snp that are not ACTG, the sum stat are removed from analysis.
In the future analysis, when we need to do association test with other molecular phenotype, we can reuse the following codes.
allele.qc = function(a1,a2,ref1,ref2) {
a1 = toupper(a1)
a2 = toupper(a2)
ref1 = toupper(ref1)
ref2 = toupper(ref2)ref = ref1 flip = ref flip[ref == "A"] = "T" flip[ref == "T"] = "A" flip[ref == "G"] = "C" flip[ref == "C"] = "G" flip1 = flip ref = ref2 flip = ref flip[ref == "A"] = "T" flip[ref == "T"] = "A" flip[ref == "G"] = "C" flip[ref == "C"] = "G" flip2 = flip; snp = list() snp[["keep"]] = !((a1=="A" & a2=="T") | (a1=="T" & a2=="A") | (a1=="C" & a2=="G") | (a1=="G" & a2=="C")) snp[["keep"]][ a1 != "A" & a1 != "T" & a1 != "G" & a1 != "C" ] = F snp[["keep"]][ a2 != "A" & a2 != "T" & a2 != "G" & a2 != "C" ] = F snp[["flip"]] = (a1 == ref2 & a2 == ref1) | (a1 == flip2 & a2 == flip1) return(snp)
}
Flip Z-scores for mismatching alleles
sumstat$Z[ qc$flip ] = -1 * sumstat$Z[ qc$flip ]
sumstat$A1[ qc$flip ] = genos$bim[qc$flip,5]
sumstat$A2[ qc$flip ] = genos$bim[qc$flip,6]Remove strand ambiguous SNPs (if any)
if ( sum(!qc$keep) > 0 ) {
genos$bim = genos$bim[qc$keep,]
genos$bed = genos$bed[,qc$keep]
sumstat = sumstat[qc$keep,]
}