No phenotype left. Perhaps the FID/IID do not match?

Question

No phenotype left. Perhaps the FID/IID do not match?

hs2097 opened this issue 3 years ago · 0 comments

hs2097 commented 3 years ago

Hi!

I am new to working with polygenic risk scores (PRS) and Lassosum. I started working on PRS using the diabetes summary statistics file as my base data and my own samples as test data. When I run the pipeline, I constantly get an error. I would really appreciate it if you could help me out with it.

Here's the code that I've used:

library(lassosum)
# Prefer to work with data.table as it speeds up file reading
library(data.table)
library(methods)
library(magrittr)
# For multi-threading, you can use the parallel package and 
# invoke cl which is then passed to lassosum.pipeline
library(parallel)
# This will invoke 2 threads. 
cl <- makeCluster(2)
sum.stat <- “d1new.QC.gz"
bfile <- “report.QC"
# Read in and process the covariates
covariate <- fread(“report.cov")
pcs <- fread(“report.eigenvec") 
    setnames(., colnames(.), c("FID","IID", paste0("PC",1:6)))
# Need as.data.frame here as lassosum doesn't handle data.table 
# covariates very well
cov <- merge(covariate, pcs, by = c(“IID”)
# We will need the EUR.hg19 file provided by lassosum 
# which are LD regions defined in Berisa and Pickrell (2015) for the European population and the hg19 genome.
ld.file <- "EUR.hg19"
# output prefix
prefix <- “report”
# Read in the target phenotype file
target.pheno <- fread(“report1.diabetes)[,c(“FID", "IID", “Diabetes”)]
# Read in the summary statistics
ss <- fread(sum.stat)
#Change names of the columns
names(ss)[names(ss) == "P-val"] <- "P"
names(ss)[names(ss) == "Position_hg19"] <- "BP"
names(ss)[names(ss) == "SNP"] <- "summary"
names(ss)[names(ss) == "rsid"] <- "SNP"
names(ss)[names(ss) == "Effect-allele"] <- "A1"
names(ss)[names(ss) == "Other-allele"] <- "A2"
names(ss)[names(ss) == "Other-allele-frequency"] <- "MAF"
names(ss)[names(ss) == "Other-allele-frequency-cases"] <- "MAF-cases"
names(ss)[names(ss) == "Sample-size-cases"] <- "N-cases"
names(ss)[names(ss) == "Sample-size"] <- "N"
# Remove P-value = 0, which causes problem in the transformation
ss <- ss[!P == 0]
#Remove Chromosomes>23
 df <- subset(ss, Chr<23)
# Transform the P-values into correlation
cor <- p2cor(p = df$P,
        n = df$N,
        sign = log(df$OR)
        )
fam <- fread(paste0(bfile, ".fam"))
fam[,ID:=do.call(paste, c(.SD, sep=":")),.SDcols=c(1:2)]
# Run the lassosum pipeline
# The cluster parameter is used for multi-threading
# You can ignore that if you do not wish to perform multi-threaded processing
out <- lassosum.pipeline(
    cor = cor,
    chr = df$CHR,
    pos = df$BP,
    A1 = df$A1,
    A2 = df$A2,
    ref.bfile = bfile,
    test.bfile = bfile,
    LDblocks = ld.file, 
    cluster=cl
)
# Store the R2 results
target.res <- validate(out, pheno = as.data.frame(target.pheno), covar=as.data.frame(cov))
# Get the maximum R2
r2 <- max(target.res$validation.table$value)^2

Here's the error that I'm getting:

0 out of 49 samples kept in pheno.
Error in parse.pheno.covar(pheno = pheno, covar = covar, parsed = parsed.test,  : 
  No phenotype left. Perhaps the FID/IID do not match?

I have used the code given in [https://choishingwan.github.io/PRS-Tutorial/lassosum/]

Thanks in advance!!

Harshita