lrgr/tcsm

How to extract signatures and reconstruct mutation spectra?


Hi @wir963 ,

I have a question about using tcsm: how do I use it to extract signatures and quantify exposures? I cannot find a user manual or help message that explains how to do these two tasks.

Thanks!

Hey @WuyangFF95 ,

I'm currently working on a demo, which is on #6. I think the demo directory on that branch should answer your question. I'd appreciate any feedback on the demo to make it more user-friendly.

Best, Welles

I cannot run run_stm.R in the Anaconda environment.

(tcsm) [wuyang@monster tcsm]$ Rscript src/run_stm.R -h
stm v1.3.3 (2018-1-26) successfully loaded. See ?stm for help.
Papers, resources, and other materials at structuraltopicmodel.com
Error: object 'snakemake' not found
Execution halted

Another question: can I change the random seed before running? Thanks!

@WuyangFF95 Are you on the demo branch? It seems like you're still on the master branch. On the demo branch you will also need to update the environment to include argparse (conda install -c conda-forge r-argparse).

Yep, you can use whatever seed you want. The seed is there so that the results from our paper can be reproduced.

I managed to open the help message by switching to the demo branch.
(tcsm) [wuyang@monster demo]$ Rscript ../src/run_stm.R -h

usage: ../src/run_stm.R [-h] [-m M] [-c C] [-e EXPOSURES]
                        [--signatures SIGNATURES] [--effect EFFECT]
                        [--sigma SIGMA] [--gamma GAMMA] [-s S]
                        [--covariates COVARIATES] [-k K]

optional arguments:
  -h, --help            show this help message and exit
  -m M                  mutation count input file
  -c C                  covariate input file
  -e EXPOSURES, --exposures EXPOSURES
                        normalized exposure output file
  --signatures SIGNATURES
                        exome signature output file
  --effect EFFECT       effect output file
  --sigma SIGMA         sigma output file
  --gamma GAMMA         gamma output file
  -s S                  random seed
  --covariates COVARIATES
                        covariates (separated by +)
  -k K                  number of signatures to use

The help text says "exome signature output file". Is tcsm only suitable for exome mutation count files, and not for genome mutation count files?

Also, can you tell me what "covariates" means?

@WuyangFF95

Sorry for the delay. Early on in this research project, we experimented with normalizing signatures by nucleotide opportunity, so we differentiated between exome and genome signatures for that reason. We only used TCSM for exome signatures in the paper. However, there's no reason why TCSM wouldn't work for genome signatures as well as exome signatures. I updated the code so the help text now reads "signature output file".

"Covariates" are factors that may influence the prior expected exposure for a signature, for example biallelic inactivation of BRCA1/2 in the case of SBS3. This is the main idea behind the method, so I'd suggest checking out the paper (https://academic.oup.com/bioinformatics/article/35/14/i492/5529117) for a more thorough discussion of covariates.
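
To make the idea of a covariate shifting the prior expected exposure more concrete, here is a minimal Python sketch. It is not TCSM's code. It assumes a structural-topic-model style prior, in which a linear function of the covariates sets the center of the prior and a softmax maps it to exposures; the noise term of the prior is ignored. The function names, the coefficient values in `gamma`, and the covariate coding are all hypothetical and chosen only for illustration.

```python
import math


def softmax(values):
    """Map real-valued scores to non-negative weights that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def prior_center_exposure(covariates, gamma):
    """Center of the exposure prior for one sample (noise term ignored).

    covariates: covariate values for the sample, including an intercept.
    gamma: one coefficient list per signature except the last, which acts
           as the reference signature with its score fixed at 0.
    """
    scores = [sum(x * g for x, g in zip(covariates, coeffs)) for coeffs in gamma]
    scores.append(0.0)  # reference signature
    return softmax(scores)


# Hypothetical setup: 3 signatures, covariates = [intercept, brca_inactivated].
# The made-up coefficient 2.0 raises the first signature when the covariate is 1.
gamma = [[0.0, 2.0], [0.0, 0.0]]

wild_type = prior_center_exposure([1.0, 0.0], gamma)
inactivated = prior_center_exposure([1.0, 1.0], gamma)

print("covariate absent: ", [round(v, 3) for v in wild_type])
print("covariate present:", [round(v, 3) for v in inactivated])
```

With these made-up coefficients, the sample carrying the covariate starts from a prior that favors the first signature, while the sample without it starts from a uniform prior. The mutation counts then update these priors during fitting.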

If a covariate is not available, is it okay to omit it?

@WuyangFF95

Sorry for the delay. It's okay to omit the covariate (that may have been a little complicated with the code base as it was, so I simplified it in the above PR). The key advantage of TCSM compared to other models is the use of covariates, though.
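
As a rough illustration of running with and without covariates, the Python sketch below assembles a run_stm.R command line from the flags listed in the help message above. The helper name, file names, covariate names, and seed are hypothetical. It also assumes that omitting covariates simply means leaving out -c and --covariates, which should be checked against the demo on the branch.

```python
import shlex


def build_run_stm_command(mutations, exposures, signatures, k, seed=None,
                          covariate_file=None, covariates=None,
                          script="src/run_stm.R"):
    """Build an Rscript argument list using flags from the run_stm.R help text."""
    cmd = ["Rscript", script,
           "-m", mutations,             # mutation count input file
           "-e", exposures,             # normalized exposure output file
           "--signatures", signatures,  # signature output file
           "-k", str(k)]                # number of signatures to use
    if seed is not None:
        cmd += ["-s", str(seed)]        # random seed, for reproducible runs
    if covariate_file and covariates:
        # The help text says covariates are separated by "+".
        cmd += ["-c", covariate_file, "--covariates", "+".join(covariates)]
    return cmd


# Hypothetical file names and covariate name, for illustration only.
without_covariates = build_run_stm_command(
    "mutation_counts.tsv", "exposures.tsv", "signatures.tsv", k=5, seed=123)
with_covariates = build_run_stm_command(
    "mutation_counts.tsv", "exposures.tsv", "signatures.tsv", k=5, seed=123,
    covariate_file="covariates.tsv", covariates=["BRCA"])

for cmd in (without_covariates, with_covariates):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed lines can be pasted into a shell inside the tcsm conda environment, or the lists can be passed to `subprocess.run(cmd, check=True)`.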

Great! I'll try to play with it with a specific K first.