rgcgithub/regenie

Does REGENIE step 1 require independent SNPs?

cs16436 opened this issue · 2 comments

Hi,

Please may I check if REGENIE step 1 requires independent SNPs (does the genotype data need to be pruned during cleaning prior to running step 1)?

Thank you in advance!

In theory, since REGENIE relies on ridge regression, it can handle variants in LD. However, keeping correlated variants adds computation time without adding much information, so it's recommended to LD-prune your variants before feeding them to REGENIE step 1.
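To illustrate the ridge point, here is a minimal sketch with two toy "SNPs" in perfect LD. It is not REGENIE's implementation; the function names and data are made up for illustration. With a penalty of zero (ordinary least squares) the normal equations are singular, while any positive penalty gives a unique solution that splits the effect equally between the two copies.

```python
def solve_2x2(a, b, c, d, r1, r2):
    """Solve [[a, b], [c, d]] @ [b1, b2] = [r1, r2] by Cramer's rule."""
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ZeroDivisionError("singular system")
    return ((r1 * d - b * r2) / det, (a * r2 - r1 * c) / det)


def ridge_two_predictors(x1, x2, y, lam):
    """Ridge estimate for two predictors: solve (X'X + lam*I) beta = X'y."""
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    r1 = sum(u * t for u, t in zip(x1, y))
    r2 = sum(v * t for v, t in zip(x2, y))
    return solve_2x2(s11 + lam, s12, s12, s22 + lam, r1, r2)


# Toy genotype dosages (illustrative only): snp_b is an exact copy of snp_a.
snp_a = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0]
snp_b = list(snp_a)
pheno = [0.1, 1.2, 1.9, 0.8, -0.2, 2.1]

try:
    # lam = 0 is ordinary least squares; X'X is singular for duplicated columns.
    ridge_two_predictors(snp_a, snp_b, pheno, lam=0.0)
    ols_ok = True
except ZeroDivisionError:
    ols_ok = False

# With a positive penalty the system is solvable and the two betas are equal.
beta_a, beta_b = ridge_two_predictors(snp_a, snp_b, pheno, lam=1.0)
print(ols_ok, beta_a, beta_b)
```

The second SNP adds a predictor, and therefore cost, but no new information, which is the motivation for pruning.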

From the REGENIE paper:

a minor allele frequency of ≥1%, a Hardy–Weinberg equilibrium test not exceeding P = 1 × 10⁻¹⁵, a genotyping rate above 99%, not present in low-complexity regions, not involved in inter-chromosomal LD and LD pruning using a R² threshold of 0.9 with a window size of 1,000 markers and a step size of 100 markers. This resulted in up to 471,762 genotyped SNPs that were kept in the analyses.
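As a rough sketch, the numeric thresholds in that passage could be expressed as a PLINK 2 call along the following lines. The file prefixes `ukb_genotyped` and `step1_qc` and the helper name are hypothetical, and this is an assumption about how one might reproduce the filters, not the authors' actual pipeline. It does not cover the low-complexity-region or inter-chromosomal LD exclusions, which need separate variant lists.

```python
import shlex


def plink2_step1_qc_command(bfile, out_prefix):
    """Build a PLINK 2 argument list mirroring the quoted thresholds."""
    return [
        "plink2",
        "--bfile", bfile,            # input .bed/.bim/.fam prefix (hypothetical)
        "--maf", "0.01",             # minor allele frequency >= 1%
        "--hwe", "1e-15",            # drop variants with HWE p-value below 1e-15
        "--geno", "0.01",            # per-variant missingness <= 1% (call rate >= 99%)
        "--indep-pairwise", "1000", "100", "0.9",  # window 1000, step 100, r^2 0.9
        "--out", out_prefix,         # output prefix (hypothetical)
    ]


cmd = plink2_step1_qc_command("ukb_genotyped", "step1_qc")
print(shlex.join(cmd))
```

PLINK writes the retained variant IDs to `<out>.prune.in`; that list could then be passed to REGENIE step 1 through its `--extract` option.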

Also here:

How many variants to use in step 1?
We recommend to use a smaller set of about 500K directly genotyped SNPs in step 1, which should be sufficient to capture genome-wide polygenic effects. Note that using too many SNPs in Step 1 (e.g. >1M) can lead to a high computational burden due to the resulting higher number of predictors in the level 1 models.
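A quick sanity check on the size of a step 1 variant list might look like the sketch below. The helper names are hypothetical, and the 1M cutoff is simply taken from the guidance quoted above, not from any limit enforced by REGENIE.

```python
def count_variant_ids(lines):
    """Count non-empty entries in a one-ID-per-line variant list.

    `lines` can be an open file handle or any iterable of strings.
    """
    return sum(1 for line in lines if line.strip())


def step1_size_note(n_variants, heavy_above=1_000_000):
    """Return a short note comparing the variant count with the quoted guidance."""
    if n_variants > heavy_above:
        return "heavy: consider stricter pruning before step 1"
    return "ok: within the range suggested for step 1"


# Example with an in-memory list; with a real file, pass open(path) instead.
example_ids = ["rs1\n", "rs2\n", "\n", "rs3\n"]
n = count_variant_ids(example_ids)
print(n, step1_size_note(n))
```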

Hope this helps
Oveis

This is very helpful - thank you so much, Oveis.