lrgr/tcsm

Missing example for train.feature.file and test.feature.file


Dear @wir963 ,

I now understand how to use tcsm. If covariates is NULL, things are much simpler. If covariates are provided, I also need to provide the feature file and covariate file for the training and test tumors.

run.tcsm <- function(mutation.count.file, feature.file, covariates, K, seed, exposure.output.file, signature.output.file, effect.output.file, sigma.output.file, gamma.output.file)

run.stm <- function(train.mutation.count.file, test.mutation.count.file, train.feature.file, test.feature.file, covariates, K, seed, heldout.performance.file)

The problem is that I cannot find a train.feature.file or test.feature.file in your demo folder. Can you show me what a feature file looks like? Thanks!

Hey @WuyangFF95 ,

It's been a while since I've looked at the repo, but I think demo/data/TCGA-BRCA_HRd_covariate.tsv is what you're looking for. You can then use sklearn to split that into test and train sets in a stratified way (so that each set gets approximately the same proportion of samples from both HR statuses). Let me know if that makes sense.

Best, Welles
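For readers following along, here is a minimal sketch of the stratified split described above, using `train_test_split` from sklearn with its `stratify` argument. The toy table stands in for demo/data/TCGA-BRCA_HRd_covariate.tsv; its column names (`sample`, `HRd`), the helper `stratified_split`, and the output file names are assumptions for illustration and may not match the real file.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for demo/data/TCGA-BRCA_HRd_covariate.tsv:
# one row per tumor, a sample ID column, and a binary "HRd" column.
# The real file's columns and layout may differ.
toy_tsv = "sample\tHRd\n" + "".join(
    f"S{i}\t{int(i % 4 == 0)}\n" for i in range(40)
)
features = pd.read_csv(io.StringIO(toy_tsv), sep="\t")


def stratified_split(df, label_col, test_size=0.2, seed=0):
    """Split rows into train/test while preserving the label proportions."""
    train, test = train_test_split(
        df,
        test_size=test_size,
        stratify=df[label_col],  # keep the HRd ratio equal in both sets
        random_state=seed,       # fixed seed for a reproducible split
    )
    return train, test


train_df, test_df = stratified_split(features, "HRd")

# Illustrative output names (not from the repo); uncomment to write the
# two feature files that run.stm expects.
# train_df.to_csv("train_feature.tsv", sep="\t", index=False)
# test_df.to_csv("test_feature.tsv", sep="\t", index=False)
```

The mutation count files would need to be split using the same sample IDs so that the train and test feature files line up with their count files.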

I see. I can use sklearn to split TCGA-BRCA_HRd_covariate.tsv into two files, one for the test dataset and one for the training dataset.

Another question: your run.tcsm() function has two parameters, feature.file and covariates. If TCGA-BRCA_HRd_covariate.tsv is assigned to feature.file, what value should I assign to covariates?

feature.file is the file containing the values for the features. covariates are the features from feature.file that you are interested in. For example, you can have lots of features in feature.file, but we would only try to model the features specified by covariates. For the demo, use covariates="HRd".
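To make the relationship concrete, here is a small sketch of the idea (not tcsm's actual implementation): a feature table may hold many columns, and the covariates argument names the subset to model. The `age` and `stage` columns and the helper `select_covariates` are invented for illustration; only `HRd` comes from the demo.

```python
import io

import pandas as pd

# Hypothetical feature table with several columns; only "HRd" is taken
# from the demo, the other columns are made up for this example.
toy_tsv = (
    "sample\tHRd\tage\tstage\n"
    "S1\t1\t54\tII\n"
    "S2\t0\t61\tI\n"
    "S3\t0\t47\tIII\n"
)
features = pd.read_csv(io.StringIO(toy_tsv), sep="\t", index_col="sample")


def select_covariates(df, covariates):
    """Return only the feature columns named in `covariates`."""
    missing = [c for c in covariates if c not in df.columns]
    if missing:
        raise KeyError(f"covariates not found in feature file: {missing}")
    return df[covariates]


# Analogous to passing covariates="HRd": the other columns are ignored.
design = select_covariates(features, ["HRd"])
```

Here `design` keeps one row per sample and a single `HRd` column, which is the only feature that would be modeled in the demo setting.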

Have you looked at demo/Snakefile? This file should answer many of your questions.

I looked at the Snakefile; however, it is unfriendly for users like me who have no experience with Snakemake.

I would suggest wrapping your code into an R package using RStudio, so that users can easily install the package and access its documentation and vignettes.