AlexandrovLab/SigProfilerExtractor

question about the SigProfilerExtractor


Hello,

Thank you for the amazing algorithms for mutation / CNV / SV signature analysis.

I have several questions about the SigProfilerExtractor.

  1. Is there a minimum or recommended number of samples for mutational signature extraction?
    I have 76 samples. I ran SigProfilerExtractor on each sample individually and also on all of them as a group (examples below).
### individual
- one sample VCF as the input for SigProfilerExtractor

### group
- all 76 VCFs (same cancer type) as the input for SigProfilerExtractor

When I compared the two runs, some samples were assigned identical signatures, while others were assigned more or fewer signatures. For example:

###### When run individually...

************************ Stepwise Description of Signature Assignment to Samples ************************
                    ################ Sample 1 #################
############################# Initial Composition ####################################
    SBS1    SBS5   SBS15
0  261.0  4254.0  1105.0
L2 Error %: 0.45
Cosine Similarity: 0.89
############################## Composition After Initial Remove ###############################
          SBS5        SBS15
0  4293.268712  1326.731288
L2 Error %: 0.46
Cosine Similarity: 0.89

############################## Performing Add-Remove Step ##############################


!!!!!!!!!!!!!!!!!!!!!!!!! LAYER: 0 !!!!!!!!!!!!!!!!!!!!!!!!!


###### With other samples (76; Group)


                    ################ Sample 1 #################
############################# Initial Composition ####################################
   SBS2  SBS3    SBS5  SBS13   SBS15  ...  SBS36  SBS40b  SBS50  SBS87  SBS96J
0  86.0  36.0  1955.0   51.0  1067.0  ...  316.0    59.0  243.0  537.0   893.0

[1 rows x 13 columns]
L2 Error %: 0.32
Cosine Similarity: 0.95
############################## Composition After Initial Remove ###############################
          SBS5        SBS15
0  4293.268712  1326.731288
L2 Error %: 0.46
Cosine Similarity: 0.89

############################## Performing Add-Remove Step ##############################


!!!!!!!!!!!!!!!!!!!!!!!!! LAYER: 0 !!!!!!!!!!!!!!!!!!!!!!!!!
Best Signature Composition ['SBS1', 'SBS5', 'SBS15']
L2 Error % 0.45
Cosine Similarity 0.89


!!!!!!!!!!!!!!!!!!!!!!!!! LAYER: 1 !!!!!!!!!!!!!!!!!!!!!!!!!
Best Signature Composition ['SBS1', 'SBS5', 'SBS15']
L2 Error % 0.45
Cosine Similarity 0.89

#################### Final Composition #################################
['SBS1', 'SBS5', 'SBS15']
L2 Error % 0.45
Cosine Similarity 0.89
####################################### Composition After Add-Remove #######################################
    SBS1    SBS5   SBS15
0  261.0  4254.0  1105.0
L2 Error %: 0.45
Cosine Similarity: 0.89

It looks as though the Add-Remove step was only performed when I ran the samples as a group. Is that right?
So, how many samples do you recommend for SigProfilerExtractor?

  2. If I would like to compare mutational signatures between subtypes, is it okay to take the data from the activities results of the group run (all 76 samples), or do you recommend filtering the samples and then running the Extractor again?

  3. I would like to see the TMB for each signature. I found the TMB plot, but I cannot find the underlying TMB values. Is it okay to calculate TMB from the activities (which I take to be the number of mutations assigned to each signature; please correct me if that is wrong)?

Thank you

Dear @nan5895,

Thanks so much for your interest in our tool! Regarding your questions:

  1. SigProfilerExtractor is a de novo signature extraction tool, so it should always be run with a considerable number of samples (and mutations) and not for individual samples. If you are looking to assign known reference mutational signatures (e.g., from COSMIC), you should use a signature assignment tool, such as SigProfilerAssignment. Please check additional details in our recent manuscript (http://dx.doi.org/10.1093/bioinformatics/btad756).

  2. I'm not sure what you mean by filtering the samples, sorry. SigProfilerExtractor always uses SigProfilerMatrixGenerator to build the appropriate mutational matrices for your different mutation types, and then runs de novo signature extraction independently for each mutation type. I suggest checking the SigProfilerMatrixGenerator manuscript for further details (https://doi.org/10.1186/s12864-019-6041-2).

  3. The number of mutations assigned to each signature in each sample is present in your activities results. The activity matrix is the file you are looking for. To calculate the total number of mutations assigned to each signature across the cohort, you need to simply sum the contributions of each signature across all samples.
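As a rough illustration of the summing described in point 3, here is a minimal sketch using only the Python standard library. It assumes the activity matrix is a tab-separated table with a `Samples` column followed by one column per signature; that layout, the helper name `signature_totals`, and the `Sample_2` row are assumptions made for the example (the `Sample_1` row reuses the counts from the log above). Check the header of your own activities file before relying on it.

```python
import csv
import io

# Hypothetical activity matrix in tab-separated form. The layout (a "Samples"
# column followed by one column per signature) is an assumption; the Sample_1
# row reuses the counts from the log above, and Sample_2 is made up.
ACTIVITIES_TSV = (
    "Samples\tSBS1\tSBS5\tSBS15\n"
    "Sample_1\t261\t4254\t1105\n"
    "Sample_2\t100\t2000\t0\n"
)


def signature_totals(handle, sample_column="Samples"):
    """Sum the mutations assigned to each signature across all samples."""
    totals = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        for name, value in row.items():
            if name == sample_column:
                continue  # skip the sample identifier column
            totals[name] = totals.get(name, 0.0) + float(value)
    return totals


totals = signature_totals(io.StringIO(ACTIVITIES_TSV))
for name, count in totals.items():
    print(name, count)
```

To read a real file, pass an open file handle instead of the `io.StringIO` object. Converting these counts to mutations per megabase would additionally require the size of the sequenced region, which is not given in this thread.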

I hope this helps, and feel free to communicate over email (mdiazgay@ucsd.edu) if you have further questions.

Best wishes,

Marcos