bulik/ldsc

Sample size with repeated observations


I want to compute LDSC heritability and genetic correlation for a genome-wide meta-analysis that includes repeated observations, and I was wondering what value I should use for the sample size.

For example, in some studies the same individuals were included at age 20 and at age 25. Here is an example for one SNP to illustrate:

Study     Phenotype   Age   N
Study_1   Phenotype   20    1000
Study_1   Phenotype   25    900
Study_2   Phenotype   24    2000
Study_3   Phenotype   24    3000
Study_4   Phenotype   25    2000
Study_5   Phenotype   20    2500
Study_5   Phenotype   23    2500

What sample size should I use?

  • Should I use the N for the independent individuals? 1000+2000+3000+2000+2500 = 10500
  • Or the N of observations? sum(N) = 13900

Note that in this example the difference in N is not so big (3400), but in my real dataset the difference is substantial, e.g. Nind = 60,000 vs Nobs = 400,000.
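To make the two candidate values concrete, here is a minimal Python sketch that reproduces them from the example table. It assumes that the number of independent individuals in a study equals its largest per-age N (which is how the 1000 for Study_1 was chosen above) and that the studies do not share individuals. The function names are only illustrative.

```python
# (study, age, n) rows copied from the example table above.
rows = [
    ("Study_1", 20, 1000),
    ("Study_1", 25, 900),
    ("Study_2", 24, 2000),
    ("Study_3", 24, 3000),
    ("Study_4", 25, 2000),
    ("Study_5", 20, 2500),
    ("Study_5", 23, 2500),
]


def n_observations(rows):
    """Total number of observations: every row counts in full."""
    return sum(n for _, _, n in rows)


def n_independent(rows):
    """Number of independent individuals.

    Assumption: within a study, the largest per-age N is the number of
    distinct people, and different studies do not overlap.
    """
    largest = {}
    for study, _, n in rows:
        largest[study] = max(largest.get(study, 0), n)
    return sum(largest.values())


print(n_observations(rows))  # 13900
print(n_independent(rows))   # 10500
```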

Thanks in advance,
L

@ldehoyos I think the only correct way to perform the analysis is to run a separate GWAS of the phenotype at each age, which avoids "repeated observations". Then you should compute heritability/genetic correlation for each phenotype/age combination.
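As a rough sketch of what the per-age split would mean for sample sizes, the Python snippet below groups the rows from the example in the question by exact age and sums N within each group. It assumes that the studies do not share individuals, so no person is counted twice within an age. The function name is hypothetical, and whether neighbouring ages could be pooled is not something this thread settles.

```python
from collections import defaultdict

# (study, age, n) rows copied from the example in the question.
rows = [
    ("Study_1", 20, 1000),
    ("Study_1", 25, 900),
    ("Study_2", 24, 2000),
    ("Study_3", 24, 3000),
    ("Study_4", 25, 2000),
    ("Study_5", 20, 2500),
    ("Study_5", 23, 2500),
]


def n_per_age(rows):
    """Sum N over studies separately for each age.

    Assumption: studies are non-overlapping, so within one age every
    individual contributes exactly one observation.
    """
    totals = defaultdict(int)
    for _, age, n in rows:
        totals[age] += n
    return dict(sorted(totals.items()))


print(n_per_age(rows))  # {20: 3500, 23: 2500, 24: 5000, 25: 2900}
```

Each per-age total would then be the N for that age's GWAS summary statistics.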