mikelove/mrlocus

flipAllelesAndGather


My eqtl and gwas data frames have the following columns:
eqtl: "snpid","ref_eqtl", "effect_eqtl", "beta_eqtl", "se_eqtl"
gwas: "snpid","ref_gwas", "effect_gwas", "beta_gwas", "se_gwas"

collapseHighCorSNPs ran without errors, but flipAllelesAndGather failed with the error "Column 2 ['ref_gwas'] of item 2 is missing in item 1."

Can you help with this? Thank you.
See below.


data2 <- flipAllelesAndGather(data1$sum_stat, data1$ld_mat,
                              a="eqtl", b="gwas",
                              ref="ref", eff="effect",
                              beta="beta", se="se",
                              snp_id="snpid", sep="_",
                              ab_last=TRUE)

Error in rbindlist(l, use.names, fill, idcol) :
  Column 2 ['ref_gwas'] of item 2 is missing in item 1. Use fill=TRUE to fill with NA (NULL for list columns), or use.names=FALSE to ignore column names.
>

Can you show:

head(data1$sum_stat[[1]])

> head(data1$sum_stat[[1]])
         snpid ref_eqtl effect_eqtl  beta_eqtl   se_eqtl
1:  rs57623299        C           A -0.0353702 0.0290649
2:  rs67488203        C           T -0.0264509 0.0279242
3: rs111339828        G           A -0.0251661 0.0813172
4:  rs74696018        A           G -0.0842433 0.0713725
5:   rs6446927        G           A -0.0406331 0.0302949
6:   rs7670052        C           T -0.0486308 0.0647667
                                                                               collapsed
1:                                                                 rs57623299,rs67122812
2:                                                                            rs67488203
3:                                                               rs111339828,rs143392427
4:                                                                 rs74696018,rs75606860
5: rs6446927,rs66600903,rs7670092,rs7678298,rs12651581,rs113712605,rs67784283,rs56247162
6:                                                                  rs7670052,rs72863600

So there is no ref_gwas or effect_gwas in this table, but it is expected for the next function. Did you provide these columns in the earlier input?
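A quick way to check this before calling the next function is to compare each table's column names against the names the call would look for. The sketch below is base R only. `expected_cols` and `missing_cols` are hypothetical helpers, not part of mrlocus, and the naming rule (stem, then `sep`, then the study label) is an assumption inferred from the call and the error message in this thread.

```r
# Hypothetical helpers (not part of mrlocus): list the column names that the
# call in this thread appears to expect, assuming ab_last=TRUE and sep="_".
expected_cols <- function(a = "eqtl", b = "gwas", ref = "ref", eff = "effect",
                          beta = "beta", se = "se", snp_id = "snpid",
                          sep = "_") {
  stems <- c(ref, eff, beta, se)
  c(snp_id, paste(stems, a, sep = sep), paste(stems, b, sep = sep))
}

# For each data.frame in the list, return the expected columns it lacks.
missing_cols <- function(sum_stat, expected) {
  lapply(sum_stat, function(df) setdiff(expected, names(df)))
}

# Toy table with eQTL columns only, using the first row shown above.
eqtl_only <- data.frame(snpid = "rs57623299", ref_eqtl = "C",
                        effect_eqtl = "A", beta_eqtl = -0.0353702,
                        se_eqtl = 0.0290649, stringsAsFactors = FALSE)

miss <- missing_cols(list(eqtl_only), expected_cols())
print(miss[[1]])
```

An empty result for every element would mean all the assumed columns are present; here the four `_gwas` columns are reported as missing.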

I have two data frames and the LD matrix below as input:
> head(eqtl.df4MR)
         snpid ref_eqtl effect_eqtl  beta_eqtl   se_eqtl
1:  rs57623299        C           A -0.0353702 0.0290649
2:  rs67488203        C           T -0.0264509 0.0279242
3:  rs67122812        A           G -0.0353702 0.0290649
4: rs111339828        G           A -0.0251661 0.0813172
5:  rs74696018        A           G -0.0842433 0.0713725
6:   rs6446927        G           A -0.0406331 0.0302949
> head(outc.df4MR)
         snpid ref_gwas effect_gwas    beta_gwas     se_gwas
1:  rs57623299        C           A -0.004114950 0.003307838
2:  rs67488203        C           T -0.002196056 0.003182689
3:  rs67122812        G           A  0.004028121 0.003307160
4: rs111339828        G           A  0.004223786 0.009845656
5:  rs74696018        G           A  0.001041202 0.008604978
6:   rs6446927        G           A -0.005774544 0.003501846

> ldmat[1:5,1:5]
                rs57623299_A_C rs67488203_T_C rs67122812_G_A rs111339828_A_G
rs57623299_A_C        1.000000       0.939359       1.000000        0.218782
rs67488203_T_C        0.939359       1.000000       0.939359        0.217166
rs67122812_G_A        1.000000       0.939359       1.000000        0.218782
rs111339828_A_G       0.218782       0.217166       0.218782        1.000000
rs74696018_G_A        0.311936       0.299692       0.311936        0.778853
                rs74696018_G_A
rs57623299_A_C        0.311936
rs67488203_T_C        0.299692
rs67122812_G_A        0.311936
rs111339828_A_G       0.778853
rs74696018_G_A        1.000000

I then used the commands below to create data1:
list.sumstat <- list(eqtl.df4MR, outc.df4MR)
list.ldmat <- list(ldmat, ldmat)

length(list.sumstat)
length(list.ldmat)

data1 <- collapseHighCorSNPs(sum_stat=list.sumstat, ld_mat=list.ldmat,
                             threshold = 0.95,
                             score = NULL,
                             plot = FALSE,
                             snp_id = TRUE)

I ended up with the eqtl and gwas data in data1 as two separate data frames:
> head(data1$sum_stat[[1]])
         snpid ref_eqtl effect_eqtl  beta_eqtl   se_eqtl
1:  rs57623299        C           A -0.0353702 0.0290649
2:  rs67488203        C           T -0.0264509 0.0279242
3: rs111339828        G           A -0.0251661 0.0813172
4:  rs74696018        A           G -0.0842433 0.0713725
5:   rs6446927        G           A -0.0406331 0.0302949
6:   rs7670052        C           T -0.0486308 0.0647667
                                                                               collapsed
1:                                                                 rs57623299,rs67122812
2:                                                                            rs67488203
3:                                                               rs111339828,rs143392427
4:                                                                 rs74696018,rs75606860
5: rs6446927,rs66600903,rs7670092,rs7678298,rs12651581,rs113712605,rs67784283,rs56247162
6:                                                                  rs7670052,rs72863600
> head(data1$sum_stat[[2]])
         snpid ref_gwas effect_gwas     beta_gwas     se_gwas
1:  rs57623299        C           A -4.114950e-03 0.003307838
2:  rs67488203        C           T -2.196056e-03 0.003182689
3: rs111339828        G           A  4.223786e-03 0.009845656
4:  rs74696018        G           A  1.041202e-03 0.008604978
5:   rs6446927        G           A -5.774544e-03 0.003501846
6:   rs7670052        C           T  2.476676e-05 0.008255586
                                                                               collapsed
1:                                                                 rs57623299,rs67122812
2:                                                                            rs67488203
3:                                                               rs111339828,rs143392427
4:                                                                 rs74696018,rs75606860
5: rs6446927,rs66600903,rs7670092,rs7678298,rs12651581,rs113712605,rs67784283,rs56247162
6:                                                                  rs7670052,rs72863600

Oh, I see the problem. The input to these functions (e.g. sum_stat) should be a list of data.frames, where each element of the list is a "clump" or conditional signal. Each data.frame should contain both the eQTL and the GWAS information: ref and effect alleles, effect size, SE, etc. See the diagram here:

https://mikelove.github.io/mrlocus/#data-input

Let me know if I can clarify anything.
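For anyone hitting the same error, here is a minimal base R sketch of that input shape: one data.frame per clump, holding the eQTL and GWAS columns side by side. The values are the first rows posted in this thread. Joining on `snpid` with `merge` and treating the SNPs as a single clump are assumptions made for illustration, not steps prescribed by mrlocus.

```r
# Toy eQTL and GWAS tables, copied from the first rows shown in this thread.
eqtl <- data.frame(
  snpid = c("rs57623299", "rs67488203", "rs67122812"),
  ref_eqtl = c("C", "C", "A"),
  effect_eqtl = c("A", "T", "G"),
  beta_eqtl = c(-0.0353702, -0.0264509, -0.0353702),
  se_eqtl = c(0.0290649, 0.0279242, 0.0290649),
  stringsAsFactors = FALSE
)
gwas <- data.frame(
  snpid = c("rs57623299", "rs67488203", "rs67122812"),
  ref_gwas = c("C", "C", "G"),
  effect_gwas = c("A", "T", "A"),
  beta_gwas = c(-0.004114950, -0.002196056, 0.004028121),
  se_gwas = c(0.003307838, 0.003182689, 0.003307160),
  stringsAsFactors = FALSE
)

# Join the two studies so each SNP is one row with both sets of columns.
clump1 <- merge(eqtl, gwas, by = "snpid", sort = FALSE)

# Restore the SNP order of the eQTL table (merge does not guarantee it).
keep <- eqtl$snpid[eqtl$snpid %in% gwas$snpid]
clump1 <- clump1[match(keep, clump1$snpid), ]
rownames(clump1) <- NULL

# sum_stat is a list with one combined data.frame per clump
# (a single clump is assumed here purely for illustration).
sum_stat <- list(clump1)
print(names(sum_stat[[1]]))
```

With the data in this shape, ld_mat would presumably be a list with one LD matrix per clump rather than one per study; the data-input diagram linked above is the reference for the exact layout.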

Thank you for clarifying!

Sure, let me know if you have any other issues or questions.