getian107/PRScs

Parallelization across chromosomes with PRScs-auto


Hi Tian,

First of all, thanks for the great tool!

When running PRScs-auto for each chromosome separately (to enable parallel computation, as recommended in your README), I noticed that the estimated phi values differ between chromosomes. While this is probably the expected behavior, I am wondering how it might affect the performance of the resulting PRS, especially since you had previously stated that choosing phi per chromosome has not been tested.

When using PRScs-auto, would it be better not to parallelize across chromosomes? Or has the effect on the resulting PRS performance been tested since then?

Looking forward to your reply.

Best,
Friederike

Hi Friederike, PRScs(-auto) always fits the data from each chromosome separately, so with the auto version you will get different estimated phi values across chromosomes. This is expected because the genetic architecture may differ between chromosomes (e.g., strong association signals on some chromosomes but not others). For the grid search approach, you could also optimize phi for each chromosome separately, but we haven't benchmarked the improvement relative to using the same phi across chromosomes (which may be small). Optimizing the model over a larger search space also adds computational complexity and may increase the risk of overfitting. So right now the common practice for the grid search is to use a single phi across all chromosomes.
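
For readers landing on this thread, here is a minimal sketch of the two setups discussed above: one PRScs process per chromosome, either without `--phi` (the auto version, which estimates phi per chromosome) or with one fixed phi shared by all chromosomes (as in the grid search). The paths, sample size, and the helper names `build_command` and `run_all` are placeholders for illustration, not part of PRScs. Only the command-line flags come from the PRScs README.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical inputs: replace with your own paths and GWAS sample size.
REF_DIR = "ldblk_1kg_eur"
BIM_PREFIX = "target"
SST_FILE = "sumstats.txt"
N_GWAS = 100000
OUT_DIR = "out/prs"


def build_command(chrom, phi=None):
    """Return the PRScs.py argument list for a single chromosome.

    phi=None omits --phi, so PRScs-auto estimates phi from that
    chromosome's data. A numeric phi is passed through unchanged,
    which gives the same fixed phi on every chromosome.
    """
    cmd = [
        "python", "PRScs.py",
        f"--ref_dir={REF_DIR}",
        f"--bim_prefix={BIM_PREFIX}",
        f"--sst_file={SST_FILE}",
        f"--n_gwas={N_GWAS}",
        f"--out_dir={OUT_DIR}",
        f"--chrom={chrom}",
    ]
    if phi is not None:
        cmd.append(f"--phi={phi}")
    return cmd


def run_all(phi=None, chroms=range(1, 23), max_workers=4):
    """Run one PRScs process per chromosome, max_workers at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(subprocess.run, build_command(c, phi), check=True)
            for c in chroms
        ]
        for future in futures:
            future.result()  # re-raise the error if any chromosome failed
```

Calling `run_all()` would correspond to the parallel PRScs-auto setup from the question, and `run_all(phi=1e-2)` to a single grid point with a common phi. For a grid search, repeat the latter for each candidate phi with a different `OUT_DIR`.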

Perfect, thank you!