flipAllelesAndGather
I have the below columns in my eqtl and gwas data frames
eqtl: "snpid","ref_eqtl", "effect_eqtl", "beta_eqtl", "se_eqtl"
gwas: "snpid","ref_gwas", "effect_gwas", "beta_gwas", "se_gwas"
collapseHighCorSNPs ran perfectly.
But flipAllelesAndGather returned an error suggesting "Column 2 ['ref_gwas'] of item 2 is missing in item 1."
Can you help with this? Thank you.
See below.
data2 <- flipAllelesAndGather(data1$sum_stat, data1$ld_mat,
                              a="eqtl", b="gwas",
                              ref="ref", eff="effect",
                              beta="beta", se="se",
                              snp_id="snpid", sep="_",
                              ab_last=TRUE)
Error in rbindlist(l, use.names, fill, idcol) :
Column 2 ['ref_gwas'] of item 2 is missing in item 1. Use fill=TRUE to fill with NA (NULL for list columns), or use.names=FALSE to ignore column names.
Can you show:
head(data1$sum_stat[[1]])
> head(data1$sum_stat[[1]])
snpid ref_eqtl effect_eqtl beta_eqtl se_eqtl
1: rs57623299 C A -0.0353702 0.0290649
2: rs67488203 C T -0.0264509 0.0279242
3: rs111339828 G A -0.0251661 0.0813172
4: rs74696018 A G -0.0842433 0.0713725
5: rs6446927 G A -0.0406331 0.0302949
6: rs7670052 C T -0.0486308 0.0647667
collapsed
1: rs57623299,rs67122812
2: rs67488203
3: rs111339828,rs143392427
4: rs74696018,rs75606860
5: rs6446927,rs66600903,rs7670092,rs7678298,rs12651581,rs113712605,rs67784283,rs56247162
6: rs7670052,rs72863600
So there is no ref_gwas or effect_gwas in this table, but the next function expects them. Did you provide these columns in the earlier input?
I have two data frames and the ldmat below as input:
> head(eqtl.df4MR)
snpid ref_eqtl effect_eqtl beta_eqtl se_eqtl
1: rs57623299 C A -0.0353702 0.0290649
2: rs67488203 C T -0.0264509 0.0279242
3: rs67122812 A G -0.0353702 0.0290649
4: rs111339828 G A -0.0251661 0.0813172
5: rs74696018 A G -0.0842433 0.0713725
6: rs6446927 G A -0.0406331 0.0302949
> head(outc.df4MR)
snpid ref_gwas effect_gwas beta_gwas se_gwas
1: rs57623299 C A -0.004114950 0.003307838
2: rs67488203 C T -0.002196056 0.003182689
3: rs67122812 G A 0.004028121 0.003307160
4: rs111339828 G A 0.004223786 0.009845656
5: rs74696018 G A 0.001041202 0.008604978
6: rs6446927 G A -0.005774544 0.003501846
> ldmat[1:5,1:5]
rs57623299_A_C rs67488203_T_C rs67122812_G_A rs111339828_A_G
rs57623299_A_C 1.000000 0.939359 1.000000 0.218782
rs67488203_T_C 0.939359 1.000000 0.939359 0.217166
rs67122812_G_A 1.000000 0.939359 1.000000 0.218782
rs111339828_A_G 0.218782 0.217166 0.218782 1.000000
rs74696018_G_A 0.311936 0.299692 0.311936 0.778853
rs74696018_G_A
rs57623299_A_C 0.311936
rs67488203_T_C 0.299692
rs67122812_G_A 0.311936
rs111339828_A_G 0.778853
rs74696018_G_A 1.000000
I then used the below command to create data1
list.sumstat=list(eqtl.df4MR,outc.df4MR)
list.ldmat=list(ldmat,ldmat)
length(list.sumstat)
length(list.ldmat)
data1 <- collapseHighCorSNPs(sum_stat=list.sumstat, ld_mat=list.ldmat,
                             threshold = 0.95,
                             score = NULL,
                             plot = FALSE,
                             snp_id = TRUE)
I ended up with the eqtl and gwas in data1 as two separate data frames:
> head(data1$sum_stat[[1]])
snpid ref_eqtl effect_eqtl beta_eqtl se_eqtl
1: rs57623299 C A -0.0353702 0.0290649
2: rs67488203 C T -0.0264509 0.0279242
3: rs111339828 G A -0.0251661 0.0813172
4: rs74696018 A G -0.0842433 0.0713725
5: rs6446927 G A -0.0406331 0.0302949
6: rs7670052 C T -0.0486308 0.0647667
collapsed
1: rs57623299,rs67122812
2: rs67488203
3: rs111339828,rs143392427
4: rs74696018,rs75606860
5: rs6446927,rs66600903,rs7670092,rs7678298,rs12651581,rs113712605,rs67784283,rs56247162
6: rs7670052,rs72863600
> head(data1$sum_stat[[2]])
snpid ref_gwas effect_gwas beta_gwas se_gwas
1: rs57623299 C A -4.114950e-03 0.003307838
2: rs67488203 C T -2.196056e-03 0.003182689
3: rs111339828 G A 4.223786e-03 0.009845656
4: rs74696018 G A 1.041202e-03 0.008604978
5: rs6446927 G A -5.774544e-03 0.003501846
6: rs7670052 C T 2.476676e-05 0.008255586
collapsed
1: rs57623299,rs67122812
2: rs67488203
3: rs111339828,rs143392427
4: rs74696018,rs75606860
5: rs6446927,rs66600903,rs7670092,rs7678298,rs12651581,rs113712605,rs67784283,rs56247162
6: rs7670052,rs72863600
Oh, I see the problem. The input to these functions, e.g. sum_stat, should be a list of data.frames, where each element of the list is a "clump" or conditional signal. Each data.frame should have both eQTL and GWAS information, including ref and effect alleles, effect size, SE, etc. See the diagram here:
https://mikelove.github.io/mrlocus/#data-input
Let me know if I can clarify though.
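For anyone hitting the same error, here is a minimal sketch of what that input shape could look like, assuming all SNPs belong to a single clump. The toy rows and the 3x3 LD block are copied from the tables earlier in this thread; the name clump1 and the choice of base R merge() are illustrative assumptions, not something the package requires.

```r
# Toy rows copied from eqtl.df4MR and outc.df4MR as posted in this thread
eqtl.df4MR <- data.frame(
  snpid       = c("rs57623299", "rs67488203", "rs67122812"),
  ref_eqtl    = c("C", "C", "A"),
  effect_eqtl = c("A", "T", "G"),
  beta_eqtl   = c(-0.0353702, -0.0264509, -0.0353702),
  se_eqtl     = c(0.0290649, 0.0279242, 0.0290649),
  stringsAsFactors = FALSE
)
outc.df4MR <- data.frame(
  snpid       = c("rs57623299", "rs67488203", "rs67122812"),
  ref_gwas    = c("C", "C", "G"),
  effect_gwas = c("A", "T", "A"),
  beta_gwas   = c(-0.004114950, -0.002196056, 0.004028121),
  se_gwas     = c(0.003307838, 0.003182689, 0.003307160),
  stringsAsFactors = FALSE
)

# LD block for the same three SNPs, values taken from ldmat above
ld_names <- c("rs57623299_A_C", "rs67488203_T_C", "rs67122812_G_A")
ldmat <- matrix(c(1.000000, 0.939359, 1.000000,
                  0.939359, 1.000000, 0.939359,
                  1.000000, 0.939359, 1.000000),
                nrow = 3, dimnames = list(ld_names, ld_names))

# One data.frame per clump, holding both the eQTL and the GWAS columns
clump1 <- merge(eqtl.df4MR, outc.df4MR, by = "snpid")

# merge() sorts by the key, so restore the original SNP order
# (assumption: rows should line up with the rows/columns of ldmat)
clump1 <- clump1[order(match(clump1$snpid, eqtl.df4MR$snpid)), ]
rownames(clump1) <- NULL

# One list element per clump (here a single clump), not one per study
list.sumstat <- list(clump1)
list.ldmat   <- list(ldmat)

# These lists would then replace the per-study lists in the calls above, e.g.
# data1 <- collapseHighCorSNPs(sum_stat=list.sumstat, ld_mat=list.ldmat, ...)
```

With several clumps, the same merge would be repeated per clump, giving a list with one merged data.frame and one LD matrix per clump.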
Thank you for clarifying!
Sure, let me know if you have any other issues or questions.