pjgreer/ukb-rap-tools

Link PRS to Phenotype

Closed this issue · 4 comments

Hi Phil,

Thank you so much for your amazing work; this tutorial is really helpful! I have a question regarding cal-PRS. After generating the PRS, I noticed that the IID and FID columns are all -1, -2, -3, and so on, because the IDs in the .sample file from the imputed data are all in this pseudo-ID format. Do you have any idea how we can map the IIDs from the PRS to the patient IDs in the phenotype data?

Thank you so much for your insight and help!!

When everything is working correctly, the .sscore file should have each subject's eid encoded as IID and FID.
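
Once the IIDs are real eids, linking the PRS to the phenotype data is a join on that ID. Here is a minimal sketch, not part of the repository scripts. It assumes both files are tab-separated, that the phenotype table has a header row with the eid in its first column, and that the score you want is the last column of the .sscore file. The function name and the file names in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: append a PRS column from a plink2 .sscore file to a phenotype table.
# Assumptions (hypothetical, not from the repository): tab-separated files,
# phenotype eid in column 1, score of interest in the last .sscore column.

link_prs_to_pheno() {
  local sscore="$1" pheno="$2"
  awk -F'\t' -v OFS='\t' '
    # First file (.sscore): locate the IID column in the header, then store scores by IID.
    NR == FNR {
      if (FNR == 1) {
        for (i = 1; i <= NF; i++) if ($i == "IID" || $i == "#IID") iid = i
        next
      }
      score[$iid] = $NF
      next
    }
    # Second file (phenotype): extend the header, then print only rows that have a score.
    FNR == 1 { print $0, "PRS"; next }
    ($1 in score) { print $0, score[$1] }
  ' "$sscore" "$pheno"
}
```

Usage would look like `link_prs_to_pheno prs.sscore pheno.tsv > pheno_with_prs.tsv`. Rows without a matching IID are dropped, so if the IIDs are still -1, -2, etc., the output will contain only the header.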

If you are getting -1, -2, etc. for all IID and FID values, I would recommend checking whether you are pulling in the correct .sample file. You can check that by looking at the top 10 lines of the .sample file with the command below.
dx head /Bulk/Imputation/UKB\ imputation\ from\ genotype/ukb22828_c1_b0_v3.sample
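
If the first 10 lines are not conclusive, a fuller check is to count how many rows carry a negative ID. The sketch below assumes you have a local copy of the file (for example via `dx download`) and that it follows the usual .sample layout: one header line, one type line, then one row per subject with the ID in the first column. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report how many subject rows in a .sample file have a negative ID_1.
# Prints two numbers: <rows with a negative ID> <total subject rows>.
# Assumes the first two lines are the header and the column-type line.

count_pseudo_ids() {
  awk '
    NR > 2 {
      total++
      if ($1 ~ /^-[0-9]+$/) neg++
    }
    END { print neg + 0, total + 0 }
  ' "$1"
}
```

For example, `count_pseudo_ids ukb22828_c1_b0_v3.sample` prints the two counts on one line. If they are equal, every ID in the file is a pseudo ID.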

This file is used in the second PRS script, 02-comb-snps-imp37.sh, at line 65, when it converts the bgen file to plink format. The plink output of that command should have the correct IID and FID in the .psam file. You can check that by looking at the top 10 lines of the .psam file with the command below. (You may have to change the path if you used different names than I did in the README.)

dx head /data/imp37_prsfiles/ukb-select-all.psam
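
Another way to confirm the .psam IDs are usable is to count how many of them also appear in your phenotype table. This is only a sketch under assumptions: a local copy of the .psam file whose header line names an IID column, and a hypothetical phenotype file with a header row and the eid in its first column (no spaces in that column). The function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: count .psam IIDs that also occur as eids in a phenotype table.
# Prints two numbers: <matching IIDs> <total .psam rows>.
# Assumptions (hypothetical): phenotype eid is column 1 with a header row;
# the .psam header contains an "IID" or "#IID" column.

count_psam_overlap() {
  local psam="$1" pheno="$2"
  awk '
    # First file (phenotype): remember every eid, skipping the header row.
    NR == FNR { if (FNR > 1) eid[$1] = 1; next }
    # Second file (.psam): find the IID column from the header line.
    FNR == 1 { for (i = 1; i <= NF; i++) if ($i == "IID" || $i == "#IID") iid = i; next }
    # Remaining .psam rows: count totals and matches.
    { total++; if ($iid in eid) hit++ }
    END { print hit + 0, total + 0 }
  ' "$pheno" "$psam"
}
```

Running `count_psam_overlap ukb-select-all.psam pheno.tsv` with pseudo IDs in the .psam file would report zero matches, which points back at the .sample file as the source of the problem.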

Up until that point, the scripts do not care about the subject IDs. Let me know if this fixes your problem.

Hey Phil,

Thanks for your explanation! When I check ukb22828_c1_b0_v3.sample, it just has -1, -2, and so on for all of the subjects. Is that how it looks on your end? The ukb-select-all.psam file looks the same after I convert from bgen to the plink2 trio (.pgen, .pvar, .psam) using the same command you did. Maybe I should re-dispense the bulk files?

Both the .sample file and the .psam file generated at line 65 have my project's 7-digit eids for the UKB subjects. (The hash symbols are masking my subject identifiers, but you can see what it is supposed to look like.)

ID_1 ID_2 missing sex
0 0 0 D
470#### 470#### 0 1
318#### 318#### 0 2
365#### 365#### 0 2
399#### 399#### 0 2

I would definitely recommend re-dispensing the bulk files, and if that doesn't fix it, opening a ticket with DNAnexus. This worked on my data dispensed on Feb 20 2024 and on my project last updated in Oct 2023.

-Phil

Thank you so much for sharing your .sample and .psam file, I will definitely check it after re-dispensing!! Thanks again for all the help:)