MUSE-XAE is a user-friendly tool that uses autoencoder neural networks to extract and visualize the SBS mutational signatures present in a tumor catalog. It consists of a hybrid denoising autoencoder with a nonlinear encoder, which enables learning of nonlinear interactions, and a linear decoder, which ensures interpretability. In our experiments, MUSE-XAE proved to be among the best-performing and most accurate tools for extracting mutational signatures. For more detail on how it works, please read the related paper.
After downloading the repository, we suggest creating a conda environment with Python 3.10 and NumPy 1.24.3, and then installing the required libraries via pip, following these steps:
- Create the environment:
  conda create -n MUSE_env python=3.10 numpy=1.24.3
- Activate the environment:
  conda activate MUSE_env
- Install the other libraries:
  pip install -r requirements.txt
MUSE-XAE is currently available through GitHub and the code runs only on CPUs, but it will soon be available as an installable Python package and will be able to run on GPUs.
MUSE-XAE consists of two main modules: MUSE-XAE De-Novo Extraction and MUSE-XAE Refitting.
Both modules assume that the input tumour catalogue is in .csv or tab-separated .txt format.
The tumour catalogue M should be a 96xN matrix, where N is the number of tumours and 96 is the number of SBS mutational classes.
MUSE-XAE assumes that the 96 mutational classes follow the COSMIC order. If your catalogue uses a different order, please add a Type column with the classes in the order you use.
Finally, put your dataset in the datasets folder. The same folder contains some example files that show the expected input structure.
In the following image we show the first five rows of the example dataset for simplicity.
All the synthetic datasets reported in this repo and used in the paper are taken from "Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor" by Islam et al. [1]. Links are provided in the reproducibility notebook inside the notebook folder.
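The input requirements described above can be checked before running the tool. The following is a minimal sketch, not part of MUSE-XAE: the function name `check_catalogue` is hypothetical, and it assumes a header row with one column per tumour, one row per mutational class, and Type labels written in the common `A[C>A]A` notation. Only the set of labels is checked, not their order.

```python
import csv
import io
from itertools import product

# The 96 SBS classes in "A[C>A]A" notation (assumed label format).
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
SBS_CLASSES = {
    f"{five}[{sub}]{three}"
    for sub, five, three in product(SUBSTITUTIONS, "ACGT", "ACGT")
}


def check_catalogue(handle, delimiter="\t"):
    """Return (n_classes, n_tumours) or raise ValueError for a bad catalogue.

    Assumes a header row followed by one row per mutational class, with an
    optional "Type" column holding the class label of each row.
    """
    rows = list(csv.DictReader(handle, delimiter=delimiter))
    if len(rows) != 96:
        raise ValueError(f"expected 96 rows, found {len(rows)}")
    columns = list(rows[0].keys())
    if "Type" in columns:
        labels = {row["Type"] for row in rows}
        if labels != SBS_CLASSES:
            raise ValueError("Type column is not a permutation of the 96 SBS classes")
        columns.remove("Type")
    # Mutation counts must be numeric and non-negative.
    for row in rows:
        for name in columns:
            if float(row[name]) < 0:
                raise ValueError(f"negative count in column {name}")
    return len(rows), len(columns)


# Small in-memory catalogue with two hypothetical tumours, T1 and T2.
example = "Type\tT1\tT2\n" + "".join(f"{c}\t1\t0\n" for c in sorted(SBS_CLASSES))
print(check_catalogue(io.StringIO(example)))
```

For a .csv file, pass `delimiter=","` and an open file handle instead of the in-memory example.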
The MUSE-XAE De-Novo Extraction module performs de novo extraction of mutational signatures and then uses the MUSE-XAE Refitting module to assign mutations to the extracted signatures.
For a quick test that extracts mutational signatures from the Example dataset, using a reduced signature range and fewer iterations and augmentations than the defaults, run the following:
python ./MUSE-XAE/main.py --dataset Example --min_sig 2 --max_sig 5 --iter 5 --augmentation 10
For standard usage, we suggest using the default parameters, which were chosen based on experiments.
All the arguments are listed below:
- --dataset: (Required) Dataset name.
- --augmentation: Number of times of data augmentation. Default is 100.
- --iter: Number of repetitions for clustering. Default is 100.
- --max_sig: Max signatures to explore. Default is 25.
- --min_sig: Min signatures to explore. Default is 2.
- --batch_size: Batch size. Default is 64.
- --epochs: Number of epochs. Default is 1000.
- --run: Parameter for multiple runs to test robustness.
- --mean_stability: Average stability for accepting a solution. Default is 0.7.
- --min_stability: Minimum stability of a signature to accept a solution. Default is 0.2.
- --directory: Main directory to save results. Default is ./.
- --loss: Loss function to use in the autoencoder. Default is poisson.
- --activation: Activation function. Default is softplus.
- --n_jobs: Number of parallel jobs. Default is 24.
- --cosmic_version: COSMIC version reference. Default is 3.4.
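The two stability thresholds in the list above can be read as a joint acceptance test on a candidate solution. The sketch below illustrates that reading only. The function name `accept_solution` is hypothetical, the comparisons are assumed to be inclusive, and how MUSE-XAE computes the stability of each signature is not specified here.

```python
from statistics import mean


def accept_solution(signature_stabilities, mean_stability=0.7, min_stability=0.2):
    """Return True if a candidate solution passes both stability thresholds.

    `signature_stabilities` holds one stability score per extracted signature.
    """
    if not signature_stabilities:
        return False
    return (
        mean(signature_stabilities) >= mean_stability
        and min(signature_stabilities) >= min_stability
    )


# Hypothetical per-signature stability scores for two candidate solutions.
print(accept_solution([0.95, 0.88, 0.61]))        # both thresholds met
print(accept_solution([0.97, 0.95, 0.98, 0.15]))  # one signature below 0.2
```

Under this reading, a single unstable signature is enough to reject a solution even when the average stability is high.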
The MUSE-XAE Refitting module performs a consensus refitting based on 10 repetitions of the MUSE-XAE refitting algorithm to increase the robustness and reliability of the assignment.
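How the repetitions are combined is not described here. As an illustration of the general idea, the sketch below merges exposure matrices from repeated refits with an element-wise median. Both the median and the function name `consensus_exposures` are assumptions, not the documented MUSE-XAE procedure.

```python
from statistics import median


def consensus_exposures(repetitions):
    """Combine exposure matrices from repeated refits by element-wise median.

    `repetitions` is a list of matrices (lists of rows) with identical shape,
    one per refitting run: rows are signatures, columns are samples.
    """
    n_rows = len(repetitions[0])
    n_cols = len(repetitions[0][0])
    return [
        [median(rep[i][j] for rep in repetitions) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Three hypothetical refitting runs for 2 signatures and 2 samples.
runs = [
    [[10.0, 0.0], [5.0, 20.0]],
    [[12.0, 0.0], [4.0, 22.0]],
    [[11.0, 3.0], [6.0, 21.0]],
]
print(consensus_exposures(runs))
```

A median is less sensitive than a mean to a single run that assigns a spurious exposure, which is one reason it is a common choice for consensus estimates.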
To refit COSMIC signatures to the Example dataset, run the following:
python ./MUSE-XAE/main.py --dataset Example --refit_only True
By default, the reference set is the COSMIC v3.4 SBS signatures. To use your own reference set, append --reference_set Signatures_set to the previous command.
Make sure that your Signatures_set is in the datasets folder.
We suggest using the default parameters, but you can also specify the following:
- --dataset: (Required) Dataset name.
- --refit_regularizer: Refit penalty type. Default is l1.
- --refit_penalty: Refit penalty amount. Default is 0.001.
- --refit_loss: Refit loss function. Default is mae.
- --reference_set: Signature set to refit. Default is COSMIC_SBS_GRCh37_3.4.
- --remove_artefact: Remove known artefacts. Default is True.
- --refit_patience: Patience before stopping the refitting. Default is 200.
- --n_jobs: Number of CPUs to use in parallel. Default is 12.
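To make the default loss, regularizer, and penalty settings above concrete, the sketch below computes one plausible refitting objective: the mean absolute error between the catalogue and its reconstruction, plus an L1 penalty on the exposures. This is an assumption for illustration. The function name `refit_objective` is hypothetical, and the exact form and scaling of the objective used by MUSE-XAE may differ.

```python
def refit_objective(catalogue, signatures, exposures, penalty=0.001):
    """MAE between catalogue and signatures x exposures, plus an L1 penalty.

    catalogue:  classes x samples mutation counts (list of rows)
    signatures: classes x K reference signatures
    exposures:  K x samples non-negative exposures
    """
    n_classes, n_samples = len(catalogue), len(catalogue[0])
    n_sigs = len(exposures)
    abs_error = 0.0
    for c in range(n_classes):
        for s in range(n_samples):
            # Reconstructed count: signatures weighted by their exposures.
            reconstructed = sum(
                signatures[c][k] * exposures[k][s] for k in range(n_sigs)
            )
            abs_error += abs(catalogue[c][s] - reconstructed)
    mae = abs_error / (n_classes * n_samples)
    l1 = sum(abs(value) for row in exposures for value in row)
    return mae + penalty * l1


# Toy example with 2 classes, 2 signatures and 1 sample (not real data).
catalogue = [[10.0], [20.0]]
signatures = [[0.5, 0.0], [0.5, 1.0]]
exposures = [[20.0], [10.0]]
print(refit_objective(catalogue, signatures, exposures))
```

In this toy case the reconstruction is exact, so the value printed comes entirely from the penalty term. A larger penalty pushes small exposures towards zero, which favours sparser signature assignments.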
Running MUSE-XAE will generate an Experiments directory (or a directory specified by the user) with subfolders.
For the MUSE-XAE De-Novo Extraction module, the Plots folder contains the extracted signature profiles.
For both MUSE-XAE De-Novo Extraction and MUSE-XAE Refitting, you will also find the distribution of exposures across all samples and how each signature contributes to the mutations in each sample.
[1] Islam et al. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor.
[2] The Cancer Genome Atlas Research Network, Weinstein, J., Collisson, E. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45, 1113–1120 (2013).
[3] Shooter, S., Czarnecki, J., Nik-Zainal, S. Signal: The home page of mutational signatures.