Lack an example for train.feature.file and test.feature.file
Dear @wir963 ,
Now I know the usage of tcsm. If covariate is NULL, it's much easier. If covariate is provided, I also need to provide the feature file and covariate file for the train and test tumors.
run.tcsm <- function(mutation.count.file, feature.file, covariates, K, seed, exposure.output.file, signature.output.file, effect.output.file, sigma.output.file, gamma.output.file)
run.stm <- function(train.mutation.count.file, test.mutation.count.file, train.feature.file, test.feature.file, covariates, K, seed, heldout.performance.file)
Now the question is: I cannot find a train.feature.file or test.feature.file in your demo folder. Can you show me what the feature file looks like? Thanks!
Hey @WuyangFF95 ,
It's been a while since I've looked at the repo, but I think demo/data/TCGA-BRCA_HRd_covariate.tsv is what you're looking for. You can then use sklearn to split it into train and test sets in a stratified way (so that you get an approximately equal number of samples from both HR statuses). Let me know if that makes sense.
Best, Welles
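A minimal sketch of the stratified split suggested above, using pandas and scikit-learn. The table here is a hypothetical stand-in for demo/data/TCGA-BRCA_HRd_covariate.tsv: the `sample` column, the 0/1 coding of `HRd`, and the output file names are assumptions, so adjust them to match the real file.

```python
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the demo covariate file; the real file's
# column names and value coding may differ.
features = pd.DataFrame({
    "sample": [f"S{i:02d}" for i in range(20)],
    "HRd": [1] * 8 + [0] * 12,
})

# Stratify on HRd so both halves keep the same proportion of each HR status.
train, test = train_test_split(
    features,
    test_size=0.5,
    stratify=features["HRd"],
    random_state=0,
)

# Write the two halves as tab-separated files (file names are illustrative).
out_dir = Path(tempfile.mkdtemp())
train_path = out_dir / "train_feature.tsv"
test_path = out_dir / "test_feature.tsv"
train.to_csv(train_path, sep="\t", index=False)
test.to_csv(test_path, sep="\t", index=False)
```

The two resulting paths would then presumably be passed as train.feature.file and test.feature.file; the matching mutation count files need to be split on the same samples.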
I see. I can use sklearn to split the TCGA-BRCA_HRd_covariate.tsv into two files, one for test dataset, one for training dataset.
Another question: there are two parameters in your run.tcsm() function, feature.file and covariates. If TCGA-BRCA_HRd_covariate.tsv is assigned to feature.file, what value should I assign to covariates?
feature.file is the file containing the values for the features. covariates are the covariates that you are interested in from feature.file. For example, you can have lots of features in feature.file, but we would only try to model the features specified by covariates. For the demo, use covariates="HRd".
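To illustrate the relationship described above, here is a small pandas sketch of a feature table with several columns, of which only the one named by covariates is kept for modeling. The file contents are invented for illustration: only the `HRd` column name comes from this thread, and how tcsm itself parses the covariates argument is not shown here.

```python
import io

import pandas as pd

# Hypothetical feature file: one row per tumor, one column per feature.
# Only "HRd" is taken from the thread; the other columns are made up.
feature_tsv = (
    "sample\tHRd\tage\tsubtype\n"
    "S01\t1\t52\tbasal\n"
    "S02\t0\t61\tluminal\n"
    "S03\t1\t47\tbasal\n"
)
features = pd.read_csv(io.StringIO(feature_tsv), sep="\t", index_col="sample")

# covariates names the subset of feature columns that should be modeled.
covariates = ["HRd"]
modeled = features[covariates]

print(modeled)
```

The remaining columns (`age`, `subtype`) stay in the file but are ignored, which matches the description of feature.file holding many features while covariates selects the ones of interest.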
Have you looked at demo/Snakefile? This file should answer many of your questions.
I looked at the Snakefile; however, it is unfriendly to users like me who have no experience with Snakemake.
I would suggest wrapping your code up as an R package using RStudio, so that users can easily install it and access the documentation and vignettes in the package.