Ivis4ml/fssemR

05_DataprocLungCancer.R fails to reproduce SNP imputation with synbreed

Hi, I was trying to reproduce the data from the paper, but I got stuck at the imputation step (the third step of the file "05_DataprocLungCancer.R"). I would appreciate your help fixing the issue I found so that I can test the method and later use it with my own data.

The specific part of the code is the following:

###remove unchanged SNP and all Missing NA
###impute missing NA in SNP matrix
SNPvarmat = t(SNPvarmat)
SNPmap = SNPmap[colnames(SNPvarmat),c(2,3)]
colnames(SNPmap) = c("chr", "pos")
SNPmap[,2] = as.numeric(SNPmap[,2])
##dim(SNPvarmat) ## [1] 122 930002
PData2 = phenoData(gse2$eset) # SNP
SNPPheno = PData2@data[rownames(SNPvarmat), c(10, 11)]
SNPPheno[,1] = as.numeric(SNPPheno[,1])
SNPPheno[,2] = 2 - as.numeric(SNPPheno[,2])
colnames(SNPPheno) = c("Gender", "Status")
SNPData = create.gpData(pheno = SNPPheno, geno = SNPvarmat, map = SNPmap, map.unit = "bp") # <-- PROBLEM HERE
SNPImputed = codeGeno(SNPData, impute=TRUE, impute.type="beagle", cores = 4) # <-- CRASHES HERE
SNPvarmat = t(SNPImputed$geno)

I think the issue is that

SNPPheno[,1] = as.numeric(SNPPheno[,1])
SNPPheno[,2] = 2 - as.numeric(SNPPheno[,2])

are incorrect: the two columns of SNPPheno, the gender (which is only female) and the status (normal/tumor), are strings, so the as.numeric conversion just fills both columns with NAs. I think synbreed expects actual phenotype values, not a data.frame of all NAs, in the call create.gpData(pheno = SNPPheno, ...).
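A small guard placed before the create.gpData call would make this kind of problem fail early with a readable message. The following is only a sketch: check_pheno is a hypothetical helper, not part of synbreed or fssemR, and the toy data frame stands in for the real SNPPheno.

```r
# Hypothetical helper (not from synbreed or fssemR): stop if any
# phenotype column consists entirely of NA values.
check_pheno <- function(pheno) {
  all_na <- vapply(pheno, function(col) all(is.na(col)), logical(1))
  if (any(all_na)) {
    stop("phenotype column(s) entirely NA: ",
         paste(names(pheno)[all_na], collapse = ", "))
  }
  invisible(pheno)
}

# Toy stand-in for SNPPheno after a failed as.numeric() conversion
toy <- data.frame(Gender = as.numeric(c(NA, NA)), Status = c(0, 1))
msg <- tryCatch(check_pheno(toy), error = function(e) conditionMessage(e))
print(msg)
```

Calling check_pheno(SNPPheno) right before create.gpData would then report which columns were lost to coercion.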

The data used to create SNPPheno are taken from the GEO database file "GSE33356-GPL6801_series_matrix.txt.gz", just as already written in the code.
These are the values in SNPPheno after SNPPheno = PData2@data[rownames(SNPvarmat), c(10, 11)]:

head(SNPPheno)
          characteristics_ch1      characteristics_ch1.1
GSM824988      gender: female tissue: normal lung tissue
GSM824989      gender: female tissue: cancer lung tissue
GSM824990      gender: female tissue: normal lung tissue
GSM824991      gender: female tissue: cancer lung tissue
GSM824992      gender: female tissue: normal lung tissue
GSM824993      gender: female tissue: cancer lung tissue

So when I run

SNPPheno[,1] = as.numeric(SNPPheno[,1])
SNPPheno[,2] = 2 - as.numeric(SNPPheno[,2])

I get these warning messages:

SNPPheno[,1] = as.numeric(SNPPheno[,1])
Warning message:
NAs introduced by coercion
SNPPheno[,2] = 2 - as.numeric(SNPPheno[,2])
Warning message:
NAs introduced by coercion

head(SNPPheno)
          characteristics_ch1 characteristics_ch1.1
GSM824988                  NA                    NA
GSM824989                  NA                    NA
GSM824990                  NA                    NA
GSM824991                  NA                    NA
GSM824992                  NA                    NA
GSM824993                  NA                    NA
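One possible explanation, which nothing in this thread confirms, is that the script was written for a setup where these columns were factors (before R 4.0.0, data frames turned strings into factors by default). as.numeric() on a factor returns integer level codes rather than NA. Here is a minimal sketch on toy data that mimics the output above; the toy data frame is an assumption, not the real GEO data:

```r
# Toy stand-in for SNPPheno with character columns, as in the output above
SNPPheno <- data.frame(
  characteristics_ch1   = rep("gender: female", 4),
  characteristics_ch1.1 = rep(c("tissue: normal lung tissue",
                                "tissue: cancer lung tissue"), 2),
  row.names = c("GSM824988", "GSM824989", "GSM824990", "GSM824991"),
  stringsAsFactors = FALSE
)

# Convert to factor first so as.numeric() yields level codes, not NA.
# Levels sort alphabetically: "tissue: cancer ..." = 1, "tissue: normal ..." = 2.
SNPPheno[, 1] <- as.numeric(factor(SNPPheno[, 1]))
SNPPheno[, 2] <- 2 - as.numeric(factor(SNPPheno[, 2]))
colnames(SNPPheno) <- c("Gender", "Status")
print(SNPPheno)
```

With this coding, Gender is 1 for every sample, and Status is 1 for cancer tissue and 0 for normal tissue.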

I hope for a reply, thank you.

Thank you, but it still doesn't work: it crashes during the imputation step
SNPImputed = codeGeno(SNPData, impute=TRUE, impute.type="beagle", cores = 4)

I get this error:

SNPvarmat = t(SNPImputed$geno)
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
ignoring SIGPIPE signal
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
ignoring SIGPIPE signal
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
ignoring SIGPIPE signal

I have an i7 with 16 GB of RAM; I don't know whether this is a RAM problem.
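To put the RAM question in numbers, here is a back-of-the-envelope estimate for a single in-memory copy of the genotype matrix, using the dimensions from the dim(SNPvarmat) comment in the script (122 x 930002). It is only a sketch: it assumes the matrix is stored as doubles (8 bytes per cell) or integers (4 bytes per cell), and it says nothing about how many copies codeGeno or Beagle actually make.

```r
# Dimensions taken from the comment in the script: 122 samples x 930002 SNPs
n_samples <- 122
n_snps    <- 930002
n_cells   <- n_samples * n_snps

# Size of one copy in GiB, assuming double (8 bytes) or integer (4 bytes) storage
gib_double  <- n_cells * 8 / 1024^3
gib_integer <- n_cells * 4 / 1024^3

cat(sprintf("one copy as double:  %.2f GiB\n", gib_double))
cat(sprintf("one copy as integer: %.2f GiB\n", gib_integer))
```

A single copy is under 1 GiB, but the transposes, intermediate copies and four worker processes (cores = 4) could multiply that. Rerunning the same call with cores = 1 is one way to test whether the SIGPIPE errors are tied to the parallel workers.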

If it's not too much trouble, may I contact you at xxz220@miami.edu with a few questions about this program? I'm not posting them here because they are not real issues, more requests for clarification.
Thank you.