Run mutational signature analysis software packages and benchmark their performance.
SynSigRun is a package to:
1. wrap R-based signature analysis packages in functions that are handy for non-expert users, with default argument values and all necessary steps built into the function bodies;
2. reproduce the benchmarking analyses of signature analysis packages in papers by the Rozen Lab.
Typically, a benchmarking analysis that evaluates the accuracy of signature extraction and/or exposure inference involves the three steps below:
- Generate synthetic tumor spectra from signatures and synthetic tumor exposures, using wrapper functions in SynSigGen. Usually,
  - signatures are real signatures downloaded from COSMIC, and
  - synthetic tumor exposures are drawn from a distribution that mimics the exposure distribution of a real tumor type.
- Run computational approaches (each can be an R, Python, Julia, or C++ package) on the generated data sets. This involves two steps:
  - Estimate the number of signatures (K) to be extracted, either
    - heuristically: starting from a user-provided or default number, without being given a range of possible Ks;
    - semi-automatically: selecting from a range of possible Ks; or
    - manually: requiring users to choose the best K based on a diagnostic plot or table generated by the software.
  - Extract a specific number of signatures, AND/OR infer the exposures of the extracted signatures.

  For R-based computational approaches that can do signature extraction with heuristic or semi-automatic selection of K, AND/OR exposure inference (attribution), we wrote wrapper functions in the R/ folder of this package so that non-expert users can run these approaches with a simple function call.
- Evaluate the accuracy of signature extraction AND/OR exposure inference. Many of the evaluation functions are in the package SynSigEval.
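The three steps above can be sketched end to end in base R. This is only a toy illustration under stated assumptions: it does not use the SynSigGen, SynSigRun, or SynSigEval APIs, the signatures are random profiles rather than COSMIC signatures, and the extraction step is a minimal multiplicative-update NMF standing in for a real signature analysis package. All object and function names (`toy_nmf`, `cosine`, `best_match`, and so on) are hypothetical.

```r
# Toy version of the three benchmarking steps, using base R only.
# Nothing here is SynSigGen / SynSigRun / SynSigEval API.
set.seed(1)
n_types  <- 96   # mutation types (e.g. SBS96 channels)
n_sigs   <- 2    # ground-truth number of signatures
n_tumors <- 50

# Step 1: synthetic spectra = signatures %*% exposures, plus Poisson noise.
# Real benchmarks would start from COSMIC signatures; these are random.
true_sigs <- matrix(rgamma(n_types * n_sigs, shape = 0.3), n_types, n_sigs)
true_sigs <- sweep(true_sigs, 2, colSums(true_sigs), "/")  # columns sum to 1
exposures <- matrix(rlnorm(n_sigs * n_tumors, meanlog = 6, sdlog = 1),
                    n_sigs, n_tumors)
spectra <- matrix(rpois(n_types * n_tumors, true_sigs %*% exposures),
                  n_types, n_tumors)

# Step 2: extract K signatures with a minimal multiplicative-update NMF
# (Frobenius-norm updates), a stand-in for a real extraction package.
toy_nmf <- function(V, K, n_iter = 500, eps = 1e-9) {
  W <- matrix(runif(nrow(V) * K), nrow(V), K)
  H <- matrix(runif(K * ncol(V)), K, ncol(V))
  for (i in seq_len(n_iter)) {
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  scale_w <- colSums(W)
  # Rescale so that each extracted signature sums to 1 and W %*% H is kept.
  list(signatures = sweep(W, 2, scale_w, "/"),
       exposures  = sweep(H, 1, scale_w, "*"))
}
fit <- toy_nmf(spectra, K = n_sigs)

# Step 3: evaluate extraction by cosine similarity to the ground truth.
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
sim <- outer(seq_len(n_sigs), seq_len(n_sigs),
             Vectorize(function(i, j) {
               cosine(true_sigs[, i], fit$signatures[, j])
             }))
best_match <- apply(sim, 1, max)  # best extracted match per true signature
print(round(best_match, 3))
```

Here K is supplied directly; a heuristic or semi-automatic approach would instead choose K itself, for example by comparing fits over a range of K values.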
Install the development version of SynSigRun from GitHub from the R command line:
install.packages("devtools")
devtools::install_github("WuyangFF95/SynSigRun", ref = "1.0.0-branch")
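After installation, a quick check such as the one below can confirm that the package loads. It is a minimal sketch that uses only base R functions (`requireNamespace`, `getNamespaceExports`, `utils::packageVersion`); the variable names are illustrative, and the list of exported functions it prints depends on the installed version.

```r
# Check whether SynSigRun is installed before trying to use it.
pkg <- "SynSigRun"
pkg_available <- requireNamespace(pkg, quietly = TRUE)

if (pkg_available) {
  # Show the installed version and the functions the package exports.
  print(utils::packageVersion(pkg))
  print(sort(getNamespaceExports(pkg)))
} else {
  message(pkg, " is not installed; run the install commands above first.")
}
```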
The Nature paper “The repertoire of mutational signatures in human cancer” includes a benchmarking analysis comparing SigProfiler (the ancestor of SigProfilerExtractor) and SignatureAnalyzer. That analysis used some functions and top-level code from this package; some of this code is in data-raw/Alexandrov_2020.
The Scientific Reports paper “Accuracy of mutational signature software on correlated signatures” benchmarks the signature extraction accuracy of 18 methods on 20 synthetic datasets with correlated exposures to the SBS1 and SBS5 signatures.
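To make "correlated exposures" concrete, the sketch below draws two exposure vectors whose underlying scores have a chosen squared correlation. It is an illustration only, not the SynSigGen generation procedure: the target R-squared of 0.3, the log-normal parameters, and the choice to define the correlation on the log scale are all assumptions, and every variable name is hypothetical.

```r
# Illustrative only: two exposure vectors with a chosen correlation.
set.seed(42)
n_tumors <- 500
target_r <- sqrt(0.3)  # hypothetical target: R^2 = 0.3 between the scores

# Standardised score for the first signature.
x <- as.numeric(scale(rnorm(n_tumors)))

# Noise made exactly uncorrelated with x (regression residuals), then
# standardised, so the sample correlation of x and y equals target_r.
z <- rnorm(n_tumors)
z <- as.numeric(scale(resid(lm(z ~ x))))
y <- target_r * x + sqrt(1 - target_r^2) * z

# Map the scores to positive mutation counts (log-normal; the location
# and spread values are arbitrary choices for this sketch).
sbs1_exposure <- round(exp(5 + 0.4 * x))
sbs5_exposure <- round(exp(6 + 0.4 * y))

r_squared <- cor(x, y)^2
print(r_squared)
```

Rounding to whole mutation counts shifts the correlation of the final exposures slightly away from the target, so `cor(log(sbs1_exposure), log(sbs5_exposure))^2` is close to, but not exactly, the chosen value.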
To reproduce this benchmarking, go to data-raw/Wu_2022/1_scripts.for.SBS1SBS5 to generate the main figure and the full data of this analysis. The sub-folders hold scripts for:
- 1_data_generation - calls the SynSigGen generation script to generate the 20 SBS1-SBS5 datasets at data-raw/ or other repositories.
- 2_running_approaches - runs computational approaches directly or through SynSigRun wrapper functions. The results are generated as a 5-level folder structure:
  - Level 1: Datasets (e.g. S.0.1.Rsq.0.1);
  - Level 2: De novo extraction without specifying K = 2 (ExtrAttr), or extraction with the number of ground-truth signatures, K = 2, provided to the computational approaches (ExtrAttrExact);
  - Level 3: Results of computational approaches (e.g. hdp.results);
  - Level 4: Results of runs with seeds (e.g. seed.1, run.1).
- 3_evaluation - evaluates the performance of signature extraction by calling evaluation functions in SynSigEval.
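The result folder layout described above can be enumerated with a few lines of base R, which may help when collecting outputs for evaluation. This sketch builds paths only from the levels and example names listed above (S.0.1.Rsq.0.1, ExtrAttr, ExtrAttrExact, hdp.results, seed.1). The number of seeds is a hypothetical choice, and no function from SynSigRun or SynSigEval is used.

```r
# Enumerate expected result directories from the levels described above.
datasets   <- c("S.0.1.Rsq.0.1")              # Level 1 (example name)
modes      <- c("ExtrAttr", "ExtrAttrExact")  # Level 2
approaches <- c("hdp.results")                # Level 3 (example name)
runs       <- paste0("seed.", 1:3)            # Level 4; 3 seeds is hypothetical

grid <- expand.grid(dataset = datasets, mode = modes,
                    approach = approaches, run = runs,
                    stringsAsFactors = FALSE)

# One relative path per dataset / mode / approach / run combination.
result_dirs <- file.path(grid$dataset, grid$mode, grid$approach, grid$run)
print(result_dirs)
```

Passing these paths to `dir.exists()` under the actual output root is one way to spot runs that did not finish.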
The paper introducing the new computational approach mSigHdp, “mSigHdp: hierarchical Dirichlet processes in mutational signature extraction”, Liu et al. (2022) (manuscript in revision), includes a benchmarking study on real-tumor-based synthetic spectra with SBS or indel mutations. The benchmarking code of this study calls the wrapper functions in SynSigRun to run the computational approaches signeR and SignatureAnalyzer.
https://github.com/WuyangFF95/SynSigRun/blob/master/data-raw/SynSigRun_1.0.0.pdf