No phenotype left. Perhaps the FID/IID do not match?
hs2097 opened this issue
Hi!
I am new to working with polygenic risk scores (PRS) and lassosum. I am computing PRS using a diabetes summary statistics file as my base data and my own samples as the target (test) data. Every time I run the pipeline I get the same error, and I would really appreciate help with it.
Here's the code that I've used:
library(lassosum)
# Prefer to work with data.table as it speeds up file reading
library(data.table)
library(methods)
library(magrittr)
# For multi-threading, you can use the parallel package and
# invoke cl which is then passed to lassosum.pipeline
library(parallel)
# This starts a cluster with 2 worker processes.
cl <- makeCluster(2)
sum.stat <- "d1new.QC.gz"
bfile <- "report.QC"
# Read in and process the covariates
covariate <- fread("report.cov")
pcs <- fread("report.eigenvec") %>%
setnames(., colnames(.), c("FID", "IID", paste0("PC", 1:6)))
# Need as.data.frame here as lassosum doesn't handle data.table
# covariates very well
cov <- merge(covariate, pcs, by = c("FID", "IID"))
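Merging on `IID` alone duplicates the `FID` column. As I understand the lassosum documentation for `validate()`, `covar` should be a data.frame whose first two columns are headed `FID` and `IID`, so the merged table needs to keep both. A minimal base-R illustration with made-up IDs (the toy tables are hypothetical, not the real `report.cov`/`report.eigenvec`):

```r
# Toy covariate and PC tables with hypothetical IDs; both carry FID and IID
covariate <- data.frame(FID = c("F1", "F2"), IID = c("S1", "S2"), Sex = c(1, 2))
pcs <- data.frame(FID = c("F1", "F2"), IID = c("S1", "S2"), PC1 = c(0.1, -0.2))

# Merging on IID alone keeps both copies of FID, renamed FID.x and FID.y
by_iid <- merge(covariate, pcs, by = "IID")
print(names(by_iid))   # columns: IID, FID.x, Sex, FID.y, PC1

# Merging on both keys keeps a single FID column, with FID and IID first
by_both <- merge(covariate, pcs, by = c("FID", "IID"))
print(names(by_both))  # columns: FID, IID, Sex, PC1
```

`merge()` puts the `by` columns first, so merging on `c("FID", "IID")` also yields the FID/IID-first layout without any reordering.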
# We will need the EUR.hg19 file provided by lassosum
# which defines the LD blocks from Berisa and Pickrell (2015) for the European population on the hg19 genome build.
ld.file <- "EUR.hg19"
# output prefix
prefix <- "report"
# Read in the target phenotype file
target.pheno <- fread("report1.diabetes")[, c("FID", "IID", "Diabetes")]
# Read in the summary statistics
ss <- fread(sum.stat)
#Change names of the columns
names(ss)[names(ss) == "P-val"] <- "P"
names(ss)[names(ss) == "Position_hg19"] <- "BP"
names(ss)[names(ss) == "SNP"] <- "summary"
names(ss)[names(ss) == "rsid"] <- "SNP"
names(ss)[names(ss) == "Effect-allele"] <- "A1"
names(ss)[names(ss) == "Other-allele"] <- "A2"
names(ss)[names(ss) == "Other-allele-frequency"] <- "MAF"
names(ss)[names(ss) == "Other-allele-frequency-cases"] <- "MAF-cases"
names(ss)[names(ss) == "Sample-size-cases"] <- "N-cases"
names(ss)[names(ss) == "Sample-size"] <- "N"
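The sequence of `names(ss)[...] <-` assignments works, but its order matters: `SNP` must become `summary` before `rsid` becomes `SNP`. As an alternative, here is a sketch of a lookup-based rename in base R (the helper `rename_columns` is hypothetical, not part of lassosum or data.table). It applies every rename against the original header at once, so the order of the entries no longer matters.

```r
# Lookup table from original headers to new names (same pairs as the renames above)
rename_map <- c(
  "P-val" = "P", "Position_hg19" = "BP", "SNP" = "summary", "rsid" = "SNP",
  "Effect-allele" = "A1", "Other-allele" = "A2",
  "Other-allele-frequency" = "MAF",
  "Other-allele-frequency-cases" = "MAF-cases",
  "Sample-size-cases" = "N-cases", "Sample-size" = "N"
)

# Rename every column found in the map in one step; other columns are left alone
rename_columns <- function(df, map) {
  old <- names(df)
  hit <- old %in% names(map)
  old[hit] <- unname(map[old[hit]])
  names(df) <- old
  df
}

# Toy table; check.names = FALSE keeps the hyphen in "P-val"
toy <- data.frame(SNP = "1:100", rsid = "rs1", `P-val` = 0.01, OR = 1.1,
                  check.names = FALSE)
res <- rename_columns(toy, rename_map)
print(names(res))  # "summary" "SNP" "P" "OR"
```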
# Remove P-value = 0, which causes problems in the transformation
ss <- ss[!P == 0]
# Keep autosomes only (drop chromosome 23 and above)
df <- subset(ss, Chr<23)
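Two things can quietly go wrong with this filter. First, `Chr < 23` only keeps chromosomes 1-22 if `Chr` is numeric; if `fread` read it as character (for example because the file contains an `X` chromosome), the comparison is done on strings, which does not follow numeric order. Second, R column names are case-sensitive, and `$` returns `NULL` for a missing column instead of raising an error, so `df$Chr` and `df$CHR` are not interchangeable. A small sketch with hypothetical values:

```r
# Hypothetical chromosome column read as character because it contains "X"
chr <- c("1", "3", "22", "X")

# Convert to integer first; "X" becomes NA (the coercion warning is silenced here)
chr_num <- suppressWarnings(as.integer(chr))
keep <- !is.na(chr_num) & chr_num <= 22
print(chr[keep])  # "1" "3" "22"

# Column names are case-sensitive: `$` returns NULL for a missing name
df <- data.frame(Chr = c(1, 22), BP = c(100, 200))
print(is.null(df$CHR))  # TRUE
```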
# Transform the P-values into correlation
cor <- p2cor(p = df$P,
n = df$N,
sign = log(df$OR)
)
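To see why `P == 0` has to be removed, it helps to look at what the conversion does. My understanding (check the lassosum source before relying on it) is that `p2cor()` turns each two-sided p-value into a t statistic with $n - 2$ degrees of freedom, signs it by the effect direction, and converts it with $r = t / \sqrt{n - 2 + t^2}$. The function below is a hypothetical re-implementation for illustration only, not lassosum's code:

```r
# Hypothetical re-implementation for illustration; assumes p2cor() turns a
# two-sided p-value into a signed t statistic and then into a correlation
p_to_cor <- function(p, n, effect_sign = rep(1, length(p))) {
  # t statistic with n - 2 degrees of freedom, signed by the effect direction
  t <- sign(effect_sign) * qt(p / 2, df = n - 2, lower.tail = FALSE)
  # Invert t = r * sqrt(n - 2) / sqrt(1 - r^2) to get the correlation
  t / sqrt(n - 2 + t^2)
}

# log(OR) < 0 gives a negative correlation; p = 0 gives t = Inf and r = NaN
r <- p_to_cor(p = c(1, 0.05, 1e-8, 0), n = 10000,
              effect_sign = log(c(1.2, 0.9, 1.5, 1.3)))
print(r)
```

The last entry is `NaN` because `Inf / Inf` is undefined, which is the problem the `P == 0` filter avoids. Only the sign of `log(OR)` matters here, not its magnitude.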
fam <- fread(paste0(bfile, ".fam"))
fam[,ID:=do.call(paste, c(.SD, sep=":")),.SDcols=c(1:2)]
# Run the lassosum pipeline
# The cluster parameter is used for multi-threading
# You can ignore that if you do not wish to perform multi-threaded processing
out <- lassosum.pipeline(
cor = cor,
chr = df$Chr,
pos = df$BP,
A1 = df$A1,
A2 = df$A2,
ref.bfile = bfile,
test.bfile = bfile,
LDblocks = ld.file,
cluster=cl
)
# Store the R2 results
target.res <- validate(out, pheno = as.data.frame(target.pheno), covar=as.data.frame(cov))
# Get the maximum R2
r2 <- max(target.res$validation.table$value)^2
Here's the error that I'm getting:
0 out of 49 samples kept in pheno.
Error in parse.pheno.covar(pheno = pheno, covar = covar, parsed = parsed.test, :
No phenotype left. Perhaps the FID/IID do not match?
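The message says none of the 49 phenotype rows matched a sample in the target `.fam` file on the FID/IID pair. A quick way to check this before calling `validate()` is to count the matches yourself. The helper `count_id_matches` below is hypothetical, and the toy data only mimics a common cause: the phenotype file's FID column differs from the `.fam` file's (here it repeats the IID).

```r
# Hypothetical helper: how many phenotype rows match the .fam file on FID and IID?
count_id_matches <- function(pheno, fam) {
  # A PLINK .fam file has no header, so fread() names its columns V1..V6;
  # columns 1 and 2 are FID and IID
  fam_ids <- paste(fam[[1]], fam[[2]], sep = ":")
  pheno_ids <- paste(pheno$FID, pheno$IID, sep = ":")
  sum(pheno_ids %in% fam_ids)
}

# Toy data: the .fam uses family IDs F1..F3, but pheno_bad repeats the IID as FID
fam <- data.frame(V1 = c("F1", "F2", "F3"), V2 = c("S1", "S2", "S3"))
pheno_ok  <- data.frame(FID = c("F1", "F2"), IID = c("S1", "S2"), Diabetes = c(1, 0))
pheno_bad <- data.frame(FID = c("S1", "S2"), IID = c("S1", "S2"), Diabetes = c(1, 0))

print(count_id_matches(pheno_ok, fam))   # 2
print(count_id_matches(pheno_bad, fam))  # 0, the situation the error reports
```

With the objects from the script, `count_id_matches(target.pheno, fam)` should return a non-zero count; if it returns 0 but `sum(target.pheno$IID %in% fam$V2)` does not, only the FID column disagrees.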
I based my code on the lassosum tutorial at https://choishingwan.github.io/PRS-Tutorial/lassosum/
Thanks in advance!!
Harshita