Preparing data for 'cc' and 'quant' traits: tried but getting errors
Closed this issue · 10 comments
Hi,
I want to run a colocalization test on my traits, and this tool looks ideal for my research.
I have tried preparing my data in the following ways but always get an error. 'SCZ' is the 'cc' trait and 'BASOPHIL' is the 'quant' trait.
Here is my code (a little messy, sorry) and the example data. Any suggestions?
scz1=fread('/home/ubuntu/immunesystem_md/sumstats/PGC_SCZ_0518_EUR.sumstats')
scz1=scz1[order(CHR,BP),]
scz1$type='cc'
check_dataset(scz1)
scz2=scz1
head(scz1)
baso1=fread('/home/ubuntu/immunesystem_md/from_nadine/sumstats/BCX2_BASOPHIL_UKB_zscore.csv')
head(baso1)
scz1$type='cc'
check_dataset(scz1)
names(scz1)
scz2=scz1
names(scz2)
names(scz2)=c('SNP','CHR','BP','pvalues','A1','A2','N','NCASE','NCONTROL','Z','OR','SE','INFO','VARIANT_ID','type')
check_dataset(scz2)
head(scz2)
str(scz2)
names(scz2)
names(scz2)[10]='beta'
names(scz2)[12]='varbeta'
check_dataset(scz2)
plot_dataset(scz2)
scz3=D1[c('beta','varbeta','SNP','CHR','BP','type')]
str(scz2)
scz3=as.list(scz2)
plot_dataset(scz3)
?plot_dataset
check_dataset(scz3,warn.minp = 1e-10)
plot_dataset(scz3)
scz3=data(scz2)
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso1,p12 = 1e-6)
str(scz3)
scz3=as.list(scz2)
str(scz3)
baso2=as.list(baso1)
str(baso2)
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso2,p12 = 1e-6)
baso2=baso1[complete.cases(baso1),]
baso2$type='quant'
baso3=as.list(baso2)
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso3,p12 = 1e-6)
head(baso3)
head(scz3)
str(baso3)
head(baso1)
head(baso2)
str(baso1)
names(baso1)
baso2=baso1[,c(1:6,8:10)]
head
head(baso2)
baso3=baso2[complete.cases(baso2),]
head(baso3)
dim(baso3)
baso3=as.list(baso3)
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso3,p12 = 1e-6)
names(scz3)
names(baso3)
names(baso3)[4]=pvalues
names(baso3)[4]='pvalues'
names(baso3)[7]='beta'
names(baso3)[8]='varbeta'
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso3,p12 = 1e-6)
baso3$type='quant'
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso3,p12 = 1e-6)
baso3$sdY=1
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso3,p12 = 1e-6)
head(baso2)
baso2=baso2[complete.cases(baso2),]
head(baso2)
head(baso2,100)
write.table(head(baso2,200),file = '/home/ubuntu/immunesystem_md/from_nadine/baso_for_coloc.csv',row.names = F,sep = '\t',quote = F)
write.table(head(scz1,200),file = '/home/ubuntu/immunesystem_md/from_nadine/scz_for_coloc.csv',row.names = F,sep = '\t',quote = F)
The last error:
Error in process.dataset(d = dataset1, suffix = "df1") :
dataset df1: please give s, proportion of samples who are cases, if using p values
In addition: Warning messages:
1: In if (!(d$type %in% c("quant", "cc"))) stop("dataset ", suffix, :
the condition has length > 1 and only the first element will be used
2: In if (!(d$type %in% c("quant", "cc"))) stop("dataset ", suffix, :
the condition has length > 1 and only the first element will be used
3: In if (d$type == "cc" & "pvalues" %in% nd) { :
the condition has length > 1 and only the first element will be used
Thank you.
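The "condition has length > 1" warnings indicate that `d$type` reached coloc as a vector rather than a single string. This typically happens when `type` is added as a column of a data frame and the whole table is then passed through `as.list()`. A minimal sketch of building the input list explicitly is shown below. The toy values, the sample size `N` and the case fraction `s` are made up for illustration, and the element names (`snp`, `position`, `pvalues`, `MAF`, `N`, `s`, `type`) follow the coloc vignette as I understand it, so check them against your installed version.

```r
# Toy summary statistics standing in for a real GWAS table (all values are made up).
sumstats <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  BP  = c(1000, 2000, 3000),
  P   = c(0.01, 0.5, 1e-4),
  MAF = c(0.10, 0.25, 0.40),
  stringsAsFactors = FALSE
)

# Adding `type` as a column recycles it to one value per SNP ...
as_column <- sumstats
as_column$type <- "cc"
length(as.list(as_column)$type)  # one entry per row, not a single string

# ... so build the list by hand: per-SNP vectors plus scalar metadata.
dataset_cc <- list(
  snp      = sumstats$SNP,
  position = sumstats$BP,
  pvalues  = sumstats$P,
  MAF      = sumstats$MAF,
  N        = 1000,  # hypothetical total sample size
  s        = 0.4,   # hypothetical fraction of samples that are cases
  type     = "cc"   # a single string, not a column
)
str(dataset_cc)

# With coloc installed, the list can then be validated, e.g.:
# coloc::check_dataset(dataset_cc)
```

Keeping `type`, `N` and `s` as scalars means they apply to the whole dataset, while the remaining elements stay as equal-length per-SNP vectors.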
The reading was helpful, but I am still a little confused about how to prepare my dataset (attached are the first few lines of the original sumstats). I added a column "s" with the approximate proportion of cases in the dataset. Now I am getting the following error:
Error in process.dataset(d = dataset1, suffix = "df1") :
dataset df1: please give MAF if using p values
In addition: Warning messages:
1: In if (!(d$type %in% c("quant", "cc"))) stop("dataset ", suffix, :
the condition has length > 1 and only the first element will be used
2: In if (!(d$type %in% c("quant", "cc"))) stop("dataset ", suffix, :
the condition has length > 1 and only the first element will be used
3: In if (d$type == "cc" & "pvalues" %in% nd) { :
the condition has length > 1 and only the first element will be used
I do not have "MAF" in my datasets. That is why I sent you the original sumstats.
Do I need to rename the columns as in the vignette, for example "PVAL" to "pvalues", "SNP" to "snp", or "BP" to "position"?
Could you please just take a look at my dataset and give me some advice about how to prepare them for the coloc.abf input?
Hi,
Thank you for the reply. Unfortunately for me, it did not help.
It was asking for "MAF", which I do not have in my dataset. I have two datasets, and one of them has neither "beta" nor "MAF".
What should I do?
Here is the quote from the vignette:
But if you don’t have them, coloc can estimate them, given p values, MAF, sample size and, if case-control data, the fraction of samples that are cases:
Sorry for bothering you too much. Could you please just take a look at my dataset's column names and give me some advice?
Thanks a lot.
Yes, for the "baso2" I have beta. But for the "scz1" there is no beta.
Should I use "Z" as beta?
Thanks a lot. It works now. I created a beta column: beta=Z*se
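The conversion described above can be sketched as follows. This assumes that `SE` is the standard error of the effect estimate (the log odds ratio for a case-control trait), so that Z = beta / SE. Note that `varbeta` is a variance, so it should be `SE^2` rather than the `SE` column renamed, and `Z` itself should not be renamed to `beta`. The toy values are made up.

```r
# Toy case-control rows (made-up values); Z and SE mirror the column names in this thread.
scz <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  BP  = c(1000, 2000, 3000),
  Z   = c(2.0, -1.5, 0.3),
  SE  = c(0.02, 0.04, 0.01),
  stringsAsFactors = FALSE
)

# Recover the effect size and its variance from Z and SE.
scz$beta    <- scz$Z * scz$SE  # Z = beta / SE, so beta = Z * SE
scz$varbeta <- scz$SE^2        # variance of beta, not the standard error

# Assemble the list with per-SNP vectors and a scalar `type`.
dataset_scz <- list(
  snp      = scz$SNP,
  position = scz$BP,
  beta     = scz$beta,
  varbeta  = scz$varbeta,
  type     = "cc"
)

# Sanity check: beta / sqrt(varbeta) should give back the original Z scores.
stopifnot(isTRUE(all.equal(dataset_scz$beta / sqrt(dataset_scz$varbeta), scz$Z)))
```

Supplying `beta` and `varbeta` directly avoids the route that estimates them from p values, which is the route that asks for MAF.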
Also I am getting this error:
> coloc.detail()
Error in coloc.detail() : could not find function "coloc.detail"
I have restarted RStudio and reinstalled coloc, but the error persists. Any help?
coloc.detail has been removed from the most recent version of coloc.