chr1swallace/coloc

Preparing data for 'CC' and 'quant' traits, tried but getting error

Closed this issue · 10 comments

Hi,

I want to run a colocalization test for my traits, and this tool looks like a good fit for my research.
I have tried the steps below to prepare my data, but I keep getting errors. 'SCZ' is the 'cc' trait and 'BASOPHIL' is the 'quant' trait.
Here is my code (a bit messy, sorry) and the example data. Any suggestions?

scz1=fread('/home/ubuntu/immunesystem_md/sumstats/PGC_SCZ_0518_EUR.sumstats')
scz1=scz1[order(CHR,BP),]
scz1$type='cc'
check_dataset(scz1)
scz2=scz1
head(scz1)
baso1=fread('/home/ubuntu/immunesystem_md/from_nadine/sumstats/BCX2_BASOPHIL_UKB_zscore.csv')
head(baso1)
scz1$type='cc'
check_dataset(scz1)
names(scz1)
scz2=scz1
names(scz2)
names(scz2)=c('SNP','CHR','BP','pvalues','A1','A2','N','NCASE','NCONTROL','Z','OR','SE','INFO','VARIANT_ID','type')
check_dataset(scz2)
head(scz2)
str(scz2)
names(scz2)
names(scz2)[10]='beta'
names(scz2)[12]='varbeta'
check_dataset(scz2)
plot_dataset(scz2)
scz3=D1[c('beta','varbeta','SNP','CHR','BP','type')]
str(scz2)
scz3=as.list(scz2)
plot_dataset(scz3)
?plot_dataset
check_dataset(scz3,warn.minp = 1e-10)
plot_dataset(scz3)
scz3=data(scz2)
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso1,p12 = 1e-6)
str(scz3)
scz3=as.list(scz2)
str(scz3)
baso2=as.list(baso1)
str(baso2)
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso2,p12 = 1e-6)
baso2=baso1[complete.cases(baso1),]
baso2$type='quant'
baso3=as.list(baso2)
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso3,p12 = 1e-6)
head(baso3)
head(scz3)
str(baso3)
head(baso1)
head(baso2)
str(baso1)
names(baso1)
baso2=baso1[,c(1:6,8:10)]
head
head(baso2)
baso3=baso2[complete.cases(baso2),]
head(baso3)
dim(baso3)
baso3=as.list(baso3)
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso3,p12 = 1e-6)
names(scz3)
names(baso3)
names(baso3)[4]=pvalues
names(baso3)[4]='pvalues'
names(baso3)[7]='beta'
names(baso3)[8]='varbeta'
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso3,p12 = 1e-6)
baso3$type='quant'
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso3,p12 = 1e-6)
baso3$sdY=1
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso3,p12 = 1e-6)
head(baso2)
baso2=baso2[complete.cases(baso2),]
head(baso2)
head(baso2,100)
write.table(head(baso2,200),file = '/home/ubuntu/immunesystem_md/from_nadine/baso_for_coloc.csv',row.names = F,sep = '\t',quote = F)
write.table(head(scz1,200),file = '/home/ubuntu/immunesystem_md/from_nadine/scz_for_coloc.csv',row.names = F,sep = '\t',quote = F)

The last error:

Error in process.dataset(d = dataset1, suffix = "df1") : 
  dataset df1: please give s, proportion of samples who are cases, if using p values
In addition: Warning messages:
1: In if (!(d$type %in% c("quant", "cc"))) stop("dataset ", suffix,  :
  the condition has length > 1 and only the first element will be used
2: In if (!(d$type %in% c("quant", "cc"))) stop("dataset ", suffix,  :
  the condition has length > 1 and only the first element will be used
3: In if (d$type == "cc" & "pvalues" %in% nd) { :
  the condition has length > 1 and only the first element will be used

baso_for_coloc.csv
scz_for_coloc.csv

Thank you.
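
The "condition has length > 1" warnings above suggest that `type` reached coloc as a whole column (one value per SNP) rather than a single string, which is what `as.list()` produces when the table carries a `type` column. A minimal sketch of keeping per-SNP vectors and per-dataset scalars separate is shown here; the toy values, the placeholder `s = 0.4`, and the helper name `as_coloc_list` are all assumptions made for illustration, not part of coloc.

```r
# Toy summary statistics; all values are invented for illustration.
sumstats <- data.frame(
  SNP     = c("rs1", "rs2", "rs3"),
  BP      = c(1000L, 2000L, 3000L),
  beta    = c(0.02, -0.01, 0.05),
  varbeta = c(0.0004, 0.0004, 0.0009),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per-SNP columns stay vectors, while `type` (plus `s`
# for case-control data or `sdY` for quantitative data) are length-1 values.
as_coloc_list <- function(df, type, s = NULL, sdY = NULL) {
  stopifnot(length(type) == 1, type %in% c("quant", "cc"))
  out <- list(
    snp      = df$SNP,
    position = df$BP,
    beta     = df$beta,
    varbeta  = df$varbeta,
    type     = type
  )
  if (!is.null(s)) out$s <- s
  if (!is.null(sdY)) out$sdY <- sdY
  out
}

# s = 0.4 is a placeholder for the proportion of samples that are cases.
scz_ds <- as_coloc_list(sumstats, type = "cc", s = 0.4)
str(scz_ds)
```

A list built this way can then be passed to `check_dataset()` before calling `coloc.abf()`.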

Reading the data worked. For my own dataset, though, I am a little confused about how to prepare it (attached are the first few lines of the original sumstats). I added a column "s" with the approximate proportion of cases in the dataset. Now I am getting the following error:

Error in process.dataset(d = dataset1, suffix = "df1") : 
  dataset df1: please give MAF if using p values
In addition: Warning messages:
1: In if (!(d$type %in% c("quant", "cc"))) stop("dataset ", suffix,  :
  the condition has length > 1 and only the first element will be used
2: In if (!(d$type %in% c("quant", "cc"))) stop("dataset ", suffix,  :
  the condition has length > 1 and only the first element will be used
3: In if (d$type == "cc" & "pvalues" %in% nd) { :
  the condition has length > 1 and only the first element will be used

I do not have "MAF" in my datasets. That is why I sent you the original sumstats.
Do I need to rename the columns as in the vignette, such as, "PVAL" to "pvalues" or "SNP" to "snp" or "BP" to "position" ?

Could you please just take a look at my dataset and give me some advice about how to prepare them for the coloc.abf input?

baso_for_coloc.csv
scz_for_coloc.csv
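
On the renaming question: coloc looks up dataset elements by name (for example `snp`, `position`, `pvalues`), so columns called `SNP`, `BP` or `PVAL` are not picked up as they are. A small sketch of renaming through a lookup vector follows; the toy table and the `rename_map` vector are assumptions for illustration, and the real files may use other column names.

```r
# Toy table using the column names mentioned in the question; values invented.
baso <- data.frame(
  SNP  = c("rs1", "rs2"),
  BP   = c(1000L, 2000L),
  PVAL = c(0.03, 1e-6),
  stringsAsFactors = FALSE
)

# Assumed mapping from the file's column names to the names coloc looks up.
rename_map <- c(SNP = "snp", BP = "position", PVAL = "pvalues")

# Rename only the columns that appear in the mapping; leave the rest alone.
hits <- names(baso) %in% names(rename_map)
names(baso)[hits] <- unname(rename_map[names(baso)[hits]])
names(baso)
```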

Hi,

Thank you for the reply. Unfortunately, it did not help.
It is asking for "MAF", which I do not have in my dataset. I have two datasets, and one of them has neither "beta" nor "MAF".
What should I do?

Here is the quote from vignette:

But if you don’t have them, coloc can estimate them, given p values, MAF, sample size and, if case-control data, the fraction of samples that are cases:

Sorry to bother you so much. Could you please take a look at my datasets' column names and give me some advice?
two_datasets_for_coloc

Thanks a lot.
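
The vignette passage quoted above describes the fallback input: p values, MAF, sample size and, for case-control data, the fraction of cases. When the summary statistics carry no frequency column, MAF has to come from another source. The sketch below assumes a separate allele-frequency table (`ref`, with a hypothetical `FRQ` column) matched on SNP ID; the `N` and `s` values are placeholders.

```r
# Toy case-control p values; values invented for illustration.
scz <- data.frame(
  SNP     = c("rs1", "rs2", "rs3"),
  pvalues = c(0.2, 1e-5, 0.04),
  stringsAsFactors = FALSE
)

# Hypothetical external allele-frequency table for the same SNPs.
ref <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  FRQ = c(0.10, 0.65, 0.30),
  stringsAsFactors = FALSE
)

merged <- merge(scz, ref, by = "SNP")

# Fold allele frequency to minor-allele frequency, which is at most 0.5.
merged$MAF <- pmin(merged$FRQ, 1 - merged$FRQ)

scz_ds <- list(
  snp     = merged$SNP,
  pvalues = merged$pvalues,
  MAF     = merged$MAF,
  N       = 100000,  # placeholder total sample size
  s       = 0.4,     # placeholder proportion of cases
  type    = "cc"
)
str(scz_ds)
```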

Yes, for the "baso2" I have beta. But for the "scz1" there is no beta.

Should I use "Z" as beta?

Thanks a lot. It works now. I created a beta column: beta = Z * SE
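
Since Z = beta / SE, the effect size can be recovered as Z * SE. Note that `varbeta` is the variance of beta, so it should be SE squared rather than the SE column renamed, as was done in the first attempt. A minimal sketch with invented values:

```r
# Toy Z scores and standard errors; values invented for illustration.
scz <- data.frame(
  Z  = c(2.0, -1.5, 0.5),
  SE = c(0.02, 0.04, 0.01)
)

scz$beta    <- scz$Z * scz$SE  # effect size recovered from Z and SE
scz$varbeta <- scz$SE^2        # variance of beta, not the SE itself

# Sanity check: beta / sqrt(varbeta) should give back the original Z scores.
stopifnot(isTRUE(all.equal(scz$beta / sqrt(scz$varbeta), scz$Z)))
scz
```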

Also I am getting this error:


> coloc.detail()
Error in coloc.detail() : could not find function "coloc.detail"

I have restarted the rstudio, reinstalled coloc but it persists. Any help?

coloc.detail has been removed from the most recent version.
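
One way to confirm this locally is to ask whether the installed package still exports the function. The helper `has_export` below is a hypothetical convenience wrapper around base R calls, not something coloc provides.

```r
# Hypothetical helper: TRUE only if the package is installed and exports `fn`.
has_export <- function(pkg, fn) {
  requireNamespace(pkg, quietly = TRUE) && fn %in% getNamespaceExports(pkg)
}

has_export("coloc", "coloc.detail")
has_export("stats", "median")  # sanity check against a package that ships with R

# Print the installed coloc version, if coloc is available.
if (requireNamespace("coloc", quietly = TRUE)) {
  print(utils::packageVersion("coloc"))
}
```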