ding-lab/msisensor

MSIsensor: how to classify MSI-high and MSI-low?

anything4share opened this issue · 6 comments

Dear Beifang,
Sorry to bother you. I'm Hongsen Qu from China, and I'm new to MSIsensor. I recently installed MSIsensor and ran the test command "bash run.sh". I got an MSI score of 100% (Number_of_Somatic_Sites %), which indicates microsatellite instability and would mean MSI-high. How do you classify MSI-high versus MSI-low? Looking forward to your reply.

Here's what we use:

#!/bin/bash
#
# Generate summary file for MSIsensor analyses
#
# Inputs
# ======
# 1. MSIsensor output file (with full path)
#       e.g. /path/<sample>.msi.output
#
# 2. Output file (with full path)
#       e.g. /path/<sample>.MSI.summary.txt
#
# 3. Patient identifier
#       e.g. <sample>
#
# Outputs
# =======
# 1. a single text file
#       i.e.
#           75.00 percent of microsatellites were unstable
#           <sample> is likely => MSI-high
#
# Info
# ====
# Written by:   Morgan Bye
# Authored:     2017-11-06
# Version:      1.0
#
# History
# =======
# 1.0 - 2016-08-08
#     Initial write

if [[ $# -eq 3 ]]; then
    IN=$1
    OUT=$2
    PATIENT=$3
else
    echo "Usage: MSIsensor_summary.sh"
    echo "            <MSIsensor_file> <output_file> <patient_id>"
    echo
    exit 1
fi

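# The score (percent of somatic sites) is the third column of the last line
SCORE=$(tail -n1 "$IN" | cut -f3)

# Boundary values (exactly 10 or 30) are treated as MSI-low
LABEL=$(echo "$SCORE" | awk '{if ($1 < 10) print "=> MSS";
                    else if ($1 > 30) print "=> MSI-high";
                    else if ($1 >= 10 && $1 <= 30) print "=> MSI-low";
                    else print "=> NA";}')

# Pass the values as printf arguments so a "%" in the patient ID is printed literally
printf '%s percent of microsatellites were unstable\n%s is likely %s\n' "$SCORE" "$PATIENT" "$LABEL" > "$OUT"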

Gives you an output file, for example:

0.34 percent of microsatellites were unstable
<patient> is likely => MSS
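
If you want to label many samples in one go, the same cutoffs can be wrapped in a small function. This is only a minimal sketch, not part of MSIsensor: `classify_msi` is a hypothetical helper name, counting scores of exactly 10 and 30 as MSI-low is an assumption, and the loop assumes each file passed on the command line is an MSIsensor output whose last line carries the score in its third tab-separated column (as the script above does).

```bash
#!/bin/bash
# Hypothetical helper: map an MSIsensor score (percent) to a label using
# the cutoffs of 10 and 30 from the script above.
classify_msi() {
    awk -v s="$1" 'BEGIN {
        if (s !~ /^[0-9]+(\.[0-9]+)?$/) print "NA";      # not a plain number
        else if (s + 0 < 10)            print "MSS";
        else if (s + 0 > 30)            print "MSI-high";
        else                            print "MSI-low"; # 10 to 30 inclusive (assumption)
    }'
}

# Print "<file> <score> <label>" (tab-separated) for each output file given
for f in "$@"; do
    score=$(tail -n1 "$f" | cut -f3)
    printf '%s\t%s\t%s\n' "$f" "$score" "$(classify_msi "$score")"
done
```

Checking for a plain number first means an empty or malformed score is reported as NA instead of silently falling into one of the classes.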

Thanks for sharing your script. By the way, a new version, v0.3, is now on GitHub, and it can process tumor-only data for MSI status detection.

Thanks @morganbye this is really helpful - how did you choose the thresholds of 10 and 30?

Sorry for taking so long to get back to you. The short answer is that we had an analyst sit down with ~1000 patient samples and graph the scores in R, and the thresholds became pretty apparent.
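
For anyone who wants a quick look at their own cohort before settling on cutoffs, a rough text histogram of the scores can be produced in the shell. To be clear, this is not the R analysis mentioned above, just a hedged sketch: `msi_histogram` is a hypothetical function name, the 5-point bin width is an arbitrary choice, and the input is assumed to be one score (percent) per line.

```bash
#!/bin/bash
# Hypothetical sketch: text histogram of MSIsensor scores in 5-point bins.
# Reads one score (percent) per line on stdin; non-numeric lines are skipped.
msi_histogram() {
    awk '
        /^[0-9]+(\.[0-9]+)?$/ {
            bin = int($1 / 5) * 5       # lower edge of the 5-point bin
            if (bin > 95) bin = 95      # fold a score of 100 into the top bin
            count[bin]++
        }
        END {
            for (b = 0; b <= 95; b += 5) {
                bar = ""
                for (i = 0; i < count[b]; i++) bar = bar "#"
                printf "%3d-%-3d %s\n", b, b + 5, bar
            }
        }'
}

# Usage: pull the score out of each output file, then bin them
# for f in /path/*.msi.output; do tail -n1 "$f" | cut -f3; done | msi_histogram
```

Gaps between clusters of bars are where cutoffs would be worth examining more carefully with a proper plot.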