GangLiTarheel/CUE

Issue with predict()

Hello,

I am trying to impute from 450K to EPIC, but because we normally perform QC and then normalization, not all of the 248K probes used to train the models are available to me. Since the method implemented here relies on the predict() function, I cannot run the models on my test data. Is there any way of fixing this, or will I be unable to perform imputation without the full set of 248K probes?

Many thanks in advance!

Best wishes,

Marta

Hi Marta,

Thank you for your interest in our package!
Are you imputing whole blood or placenta samples? If not, I would not recommend using CUE.
How many of the 248K probes do you have?
I am not sure your problem is due to the implementation of the predict() function. It seems that you could not run the package because you don't have the required input. Please show me the error you get when you try to run the predict() function, if available.

Best,
Gang

Hi Gang,

Thank you for the fast reply, and yes, I am very interested in the method! I have whole blood data (from 450K), with around 237K of the 248K probes available. Below is the output I get when I try to run the code:

randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
[1] "Completeness checked! The input HM450 data don't have any missing NAs."
Error in CUE_check(X) :
  Error: the input HM450 data does not contain all the required probes.! Please complete the dataset first!
Execution halted

When I run the predict() function, I get the error eval(predvars, data, env) : object '....' not found, which I think happens because I don't have all of the 248K probes available. As far as I can tell, this is inherent to predict(), which expects every predictor the model was trained on.
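
Before calling the models, it can help to see exactly which required probes are absent from the input. The following is a minimal sketch in Python, not part of CUE (which is an R package). The probe IDs and the helper name `missing_probes` are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: not CUE code. Probe IDs and the helper name are hypothetical.

def missing_probes(required, available):
    """Return the required probe IDs absent from `available`, preserving order."""
    available_set = set(available)  # set lookup keeps this fast for ~248K probes
    return [probe for probe in required if probe not in available_set]


# Toy data standing in for the required HM450 probes and the probes left after QC.
required = ["cg0001", "cg0002", "cg0003", "cg0004"]
available = ["cg0001", "cg0003", "cg9999"]

missing = missing_probes(required, available)
print(f"{len(missing)} of {len(required)} required probes missing: {missing}")
```

Listing the missing IDs this way shows which predictors a model would fail to find, which is more informative than the single object name in the error message.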

Hi Marta,
I see. The current package will only run when you have all of the required 248K probes. I am working on adding a function that accepts partial input and imputes only a subset of the HM850-specific probes. HM850 probes whose required HM450 input is missing cannot be imputed with the current CUE. But with 237K probes, I believe we should be able to impute the majority of the 339K. I will keep you posted once the function is complete.
Best,
Gang
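
The partial-input idea described above can be sketched as follows: keep only the target probes whose required inputs are all present, and report the rest as skipped. This is a hedged illustration in Python, not the planned CUE function. It assumes each HM850 target probe has a known list of required HM450 predictor probes; the mapping, the probe IDs, and the name `imputable_targets` are hypothetical.

```python
# Illustrative sketch only: not the CUE implementation.
# Assumption: each target probe comes with a known list of required predictor probes.

def imputable_targets(predictors_by_target, available):
    """Split targets into those that can be imputed and those that cannot.

    Returns (ok, skipped): `ok` lists targets whose predictors are all available;
    `skipped` maps each remaining target to the predictors it is missing.
    """
    available_set = set(available)
    ok, skipped = [], {}
    for target, predictors in predictors_by_target.items():
        absent = [p for p in predictors if p not in available_set]
        if absent:
            skipped[target] = absent
        else:
            ok.append(target)
    return ok, skipped


# Toy mapping from hypothetical HM850 targets to hypothetical HM450 predictors.
predictors_by_target = {
    "epic_A": ["cg0001", "cg0003"],
    "epic_B": ["cg0002", "cg0003"],
    "epic_C": ["cg0003"],
}
available = ["cg0001", "cg0003"]

ok, skipped = imputable_targets(predictors_by_target, available)
print("imputable:", ok)
print("skipped:", skipped)
```

Returning the missing predictors for each skipped target, rather than only a count, lets the user see whether a small number of absent probes accounts for most of the loss.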

Hi Gang,

Thank you very much for your reply and availability. I think this will also make the package more widely applicable, since in practice people will not always have the full set of 248K probes available. Looking forward to the new implementation! Great job.

Best wishes,

Marta