chr1swallace/coloc

LD matrix

Hello,

I'm using the coloc SuSiE functions to study colocalisation of two disorders. I have calculated LD matrices from 1000 Genomes data with PLINK for both disorders. The number of SNPs in each dataset matches the number of SNPs in its LD matrix. I've added these matrices to my datasets using:

minimum_data1 <- as.list(dataset[c("beta","varbeta","p","SNP","POS")])
minimum_data1$type <- "cc"
minimum_data1$LD <- as.matrix(LD_data)

The data structure of my datasets looks just like the structure of the testdata.
Testdata:

str(D1$LD)
num [1:50, 1:50] 1 0.732 0.805 0.763 0.541 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:50] "s1" "s2" "s3" "s4" ...
  ..$ : chr [1:50] "s1" "s2" "s3" "s4" ...

My data:

str(minimum_data1$LD)
num [1:3397, 1:3397] 1 1 1 0.0895 0.0804 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:3397] "1:115581654" "1:115581717" "1:115581758" "1:115581922" ...
  ..$ : chr [1:3397] "1:115581654" "1:115581717" "1:115581758" "1:115581922" ...

When I use the command:

check_dataset(minimum_data1, suffix = "", req = c("SNP", "LD"), warn.minp = 1e-06)
It returns:
NULL

However, when I try to run runsusie, I get the error:

Error in (function (z, R, z_ld_weight = 0, L = 10, prior_variance = 50, : The dimension of correlation matrix (0 by 0) does not agree with expected (3397 by 3397).

But the matrix is in fact 3397 x 3397. What am I doing wrong?

Thank you for your time,
Sophie

Yes, the LD column and row names match the SNPs from the dataset exactly, and both are in the same order:

str(minimum_data1)
List of 7
$ beta : num [1:3397] -0.0067 -0.0066 0.0067 0.002 0.0093 -0.0216 -0.003 0.0182 -0.0005 -0.0213 ...
$ varbeta: num [1:3397] 0.001884 0.001884 0.001884 0.000313 0.000335 ...
$ p : num [1:3397] 0.877 0.879 0.877 0.91 0.611 ...
$ SNP : chr [1:3397] "1:115581654" "1:115581717" "1:115581758" "1:115581922" ...
...
str(minimum_data1$LD)
num [1:3397, 1:3397] 1 1 1 0.0895 0.0804 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:3397] "1:115581654" "1:115581717" "1:115581758" "1:115581922" ...
  ..$ : chr [1:3397] "1:115581654" "1:115581717" "1:115581758" "1:115581922" ...

I can send them to you. What's your e-mail address?

I suspect the issue is due to your input list naming the element "SNP" and not "snp", which causes the LD matrix to be subset to 0 rows and columns:

coloc/R/susie.R

Line 460 in f0c4a6a

LD=d$LD[d$snp,d$snp] # just in case
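
To see why the wrong capitalisation produces a 0 by 0 matrix, here is a minimal sketch with a made-up three-SNP dataset (the SNP names and values are illustrative, not taken from the real data). It assumes only base R: `$` is case-sensitive, so `d$snp` is NULL when the element is called "SNP", and indexing a matrix with NULL selects nothing.

```r
# Toy 3-SNP dataset mirroring the structure in this thread (values are made up).
snps <- c("1:100", "1:200", "1:300")
LD <- diag(3)
dimnames(LD) <- list(snps, snps)

d <- list(beta    = c(0.1, -0.2, 0.05),
          varbeta = c(0.01, 0.01, 0.01),
          SNP     = snps,   # upper-case name, as in the original post
          LD      = LD)

# `$` is case-sensitive, so d$snp is NULL and the subset is empty.
is.null(d$snp)
bad <- d$LD[d$snp, d$snp]
dim(bad)

# Renaming the element to lower case restores the expected subset.
names(d)[names(d) == "SNP"] <- "snp"
good <- d$LD[d$snp, d$snp]
dim(good)
```

With the element renamed, the same subsetting line returns the full matrix, which is what the lower-case "snp" in the dataset list achieves.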

Does that fix your issue, @riesmeijersa?

Yes it did

I ran into a similar issue. My PR #62 enables coloc to detect these data formatting issues early and prevents passing an empty LD matrix to susie_rss().
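
As a rough illustration of what such an early check could look like, here is a sketch of a standalone helper. The name `check_ld_names` is hypothetical, and this is not the code from the PR or from coloc itself; it only assumes the dataset is a list with a `snp` element and a named `LD` matrix, as in this thread.

```r
# Hypothetical helper (not part of coloc): fail early if the LD matrix
# cannot be indexed by the dataset's SNP identifiers.
check_ld_names <- function(d) {
  if (is.null(d$snp))
    stop("dataset has no element named 'snp' (names are case-sensitive)")
  if (!is.matrix(d$LD))
    stop("dataset has no LD matrix")
  if (is.null(rownames(d$LD)) || is.null(colnames(d$LD)))
    stop("LD matrix needs row and column names")
  # SNPs that are absent from the rows or the columns of LD
  absent <- setdiff(d$snp, intersect(rownames(d$LD), colnames(d$LD)))
  if (length(absent) > 0)
    stop(length(absent), " SNP(s) not found in the LD dimnames, e.g. ", absent[1])
  invisible(TRUE)
}

# Usage on a made-up two-SNP example
snps <- c("1:100", "1:200")
LD <- matrix(c(1, 0.3, 0.3, 1), 2, 2, dimnames = list(snps, snps))

res_bad <- tryCatch(check_ld_names(list(SNP = snps, LD = LD)),
                    error = function(e) conditionMessage(e))
res_ok  <- check_ld_names(list(snp = snps, LD = LD))
```

Running a check like this before calling runsusie turns the confusing dimension error into a message that points at the actual naming problem.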

Two other, somewhat related questions:

@riesmeijersa: which command did you use in plink for this?
@chr1swallace: do we need to have r or r2?

For susie you should use --r for plink:

stephenslab/susieR#135 (comment)

Thanks for replying, @mhaiyue. Yes, we need the correlation matrix, so r. Values should range between -1 and +1.
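
A quick way to sanity-check that a matrix holds signed r rather than r2 is sketched below. The helper name `check_signed_ld` is hypothetical and not part of coloc or susieR, and the "has_negative" test is only a heuristic: a real r matrix over many SNPs will almost always contain some negative entries, whereas r2 never does.

```r
# Hypothetical helper: basic sanity checks for a signed correlation (r) matrix.
check_signed_ld <- function(LD, tol = 1e-8) {
  stopifnot(is.matrix(LD), nrow(LD) == ncol(LD))
  c(
    in_range     = all(abs(LD) <= 1 + tol, na.rm = TRUE),      # values within [-1, 1]
    symmetric    = isSymmetric(unname(LD), tol = tol),         # r[i, j] equals r[j, i]
    unit_diag    = all(abs(diag(LD) - 1) <= tol, na.rm = TRUE), # each SNP vs itself is 1
    has_negative = any(LD < 0, na.rm = TRUE)                   # heuristic: r2 is never negative
  )
}

# Made-up example: a signed matrix and its squared counterpart
r  <- matrix(c(1, -0.4, -0.4, 1), 2, 2)
r2 <- r^2

res_r  <- check_signed_ld(r)
res_r2 <- check_signed_ld(r2)
```

If `has_negative` is FALSE for a large region, it is worth re-checking whether the matrix was generated with r2 instead of r.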