Too many variants with mismatch information
sakuramodokich opened this issue · 4 comments
sakuramodokich commented
Hi,
I found that far too many SNPs in the target genotype files are being filtered out:
1837501 variant(s) not found in previous data
4482871 variant(s) with mismatch information
37522 variant(s) included
My output:
PRSice 2.3.5 (2021-09-20)
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2023-07-06 04:36:09
./PRSice_linux \
--a1 Allele1 \
--a2 Allele2 \
--bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
--base Roselli_2018_AF_HRC_GWAS_EURv11.txt \
--beta \
--binary-target T \
--bp pos \
--chr chr \
--clump-kb 250kb \
--clump-p 1.000000 \
--clump-r2 0.100000 \
--extract UKB_imputed.valid \
--ignore-fid \
--interval 5e-05 \
--keep af_df_sample_ID.txt \
--lower 5e-08 \
--num-auto 22 \
--out UKB_imputed \
--pheno af_df.phe \
--pheno-col af_cc \
--pvalue P-value \
--seed 928429407 \
--snp MarkerName \
--stat Effect \
--target ukb21008_c#_qc_pass \
--thread 36 \
--upper 0.5
Initializing Genotype file: ukb21008_c#_qc_pass (bed)
Start processing Roselli_2018_AF_HRC_GWAS_EURv11
==================================================
SNP extraction/exclusion list contains 5 columns, will
assume first column contains the SNP ID
Base file: Roselli_2018_AF_HRC_GWAS_EURv11.txt
Header of file is:
MarkerName Allele1 Allele2 chr pos Effect StdErr P-value
9362422 variant(s) observed in base file, with:
1424010 variant(s) excluded based on user input
7938412 total variant(s) included from base file
Loading Genotype info from target
==================================================
488315 people (223502 male(s), 264624 female(s)) observed
337053 founder(s) included
1837501 variant(s) not found in previous data
4482871 variant(s) with mismatch information
37522 variant(s) included
Phenotype file: af_df.phe
Column Name of Sample ID: FID
Note: If the phenotype file does not contain a header, the
column name will be displayed as the Sample ID which is
expected.
There are a total of 1 phenotype to process
Start performing clumping
Number of variant(s) after clumping : 3813
Processing the 1 th phenotype
af_cc is a binary phenotype
28063 control(s)
308990 case(s)
There are 1 region(s) with p-value less than 1e-5. Please
note that these results are inflated due to the overfitting
inherent in finding the best-fit PRS (but it's still best
to find the best-fit PRS!).
You can use the --perm option (see manual) to calculate an
empirical P-value.
choishingwan commented
This looks like the base and target files use different genome builds.
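One way to test the build hypothesis is to compare the chromosome and position recorded for the same variant ID in both files. The sketch below is a hypothetical helper (`position_discordance` is not part of PRSice). It assumes each file has already been parsed into a dict mapping variant ID to (chromosome, position). Pure formatting differences such as "chr1" versus "1" would also count as discordant, so normalise those first.

```python
from typing import Dict, Optional, Tuple

# (chromosome, base-pair position) for one variant.
Position = Tuple[str, int]


def position_discordance(base: Dict[str, Position],
                         target: Dict[str, Position]) -> Optional[float]:
    """Fraction of shared variant IDs whose chr:bp differs between files.

    Returns None when the files share no IDs, since no comparison is possible.
    """
    shared = base.keys() & target.keys()
    if not shared:
        return None
    differing = sum(1 for vid in shared if base[vid] != target[vid])
    return differing / len(shared)


if __name__ == "__main__":
    base = {"rs1": ("1", 1000), "rs2": ("1", 2000), "rs3": ("2", 500)}
    target = {"rs1": ("1", 1000), "rs2": ("1", 2500), "rs4": ("3", 10)}
    print(position_discordance(base, target))  # 0.5
```

A value close to 1.0 over many shared IDs points to a build or coordinate-convention difference. A value close to 0.0 suggests the positions agree, so the mismatches probably come from somewhere else, such as the alleles.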
Assigned #328 to @choishingwan.
sakuramodokich commented
Both files are GRCh37-based
choishingwan commented
You can also check the mismatch file generated by PRSice to see which mismatches were reported.
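When positions agree, mismatches usually involve the alleles. Below is a rough sketch of the kind of per-variant comparison involved. This is a generic harmonisation check, not PRSice's actual implementation, and `classify_alleles` and `tally` are hypothetical names. The sketch compares the base Allele1/Allele2 with the target alleles and counts how many variants fall into each category.

```python
from collections import Counter
from typing import Iterable, Tuple

# Watson-Crick complements for single-nucleotide alleles.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify_alleles(base_a1: str, base_a2: str,
                     tgt_a1: str, tgt_a2: str) -> str:
    """Compare one variant's base alleles with its target alleles."""
    b = (base_a1.upper(), base_a2.upper())
    t = (tgt_a1.upper(), tgt_a2.upper())
    if b == t:
        return "match"
    if b == (t[1], t[0]):
        # Same alleles in the opposite order: the effect allele is flipped.
        return "swapped"
    if all(a in COMPLEMENT for a in b):
        # Complement the base alleles to test for an opposite-strand report.
        comp = (COMPLEMENT[b[0]], COMPLEMENT[b[1]])
        if comp in (t, (t[1], t[0])):
            return "strand_flip"
    return "mismatch"


def tally(variants: Iterable[Tuple[str, str, str, str]]) -> Counter:
    """Count categories over (base_a1, base_a2, tgt_a1, tgt_a2) tuples."""
    return Counter(classify_alleles(*v) for v in variants)


if __name__ == "__main__":
    example = [
        ("A", "G", "A", "G"),  # match
        ("A", "G", "G", "A"),  # swapped
        ("A", "G", "T", "C"),  # strand_flip
        ("A", "G", "A", "C"),  # mismatch
    ]
    print(tally(example))
```

Palindromic variants (A/T, C/G) cannot be told apart from a simple swap by alleles alone, so this sketch reports them as "swapped". Multi-character alleles such as indels are only checked for a direct or swapped match.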
sakuramodokich commented
Got it, thank you!