mskcc/mimsi

applying MiMSI on WES data

XiaofeiSong opened this issue · 6 comments

Hi there,
I found this tool very useful since our data has low tumor purity. There are a few pre-trained models in the ./model folder. Which model would you recommend for WES datasets?
Also, I generated a microsatellite list file with MSISensor based on the reference genome. Would you suggest intersecting the list with the exome capture file? I am asking because I saw that you generated an IMPACT-only microsatellite list.
Thank you!

We’re still actively working on improving MiMSI for WES data, but in general we’ve found that picking a model based on the average coverage of your samples improves performance. If the average coverage of your WES samples is on the lower side, it might be better to use the 100x model (50x in the tumor and 50x in the normal sample), since the required down-sampling will have less of an effect and more sites will meet the coverage threshold. Just note that to use the 100x model you need to set the --coverage parameter to 50 when you run the analysis. If your coverage is higher, then it’s okay to use the 200x model you have listed below.

As for intersecting the list with the capture file, we noticed a gain in both accuracy and speed when we removed off-target sites, so it’s definitely worth a try in my opinion. At the very least it speeds up the analysis, because fewer sites means less overhead in the vector generation step.
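For anyone wanting to try the intersection, here is a minimal Python sketch, not part of MiMSI itself. It assumes the microsatellite list is tab-separated with the chromosome in column 1 and the site position in column 2 (the layout I understand MSISensor's scan output to use, optionally with a header row), that the capture file is a standard BED file, and that the position column is directly comparable to BED coordinates (shift by one if yours is 1-based). The function names are made up for illustration, and only the start of each repeat is checked against the targets.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: (starts, ends)} with overlapping intervals merged."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def on_target(targets, chrom, pos):
    """True if pos lies in a half-open [start, end) target interval on chrom."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    i = bisect.bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def filter_sites(site_lines, targets):
    """Keep the header row (if any) and every site whose position is on target."""
    kept = []
    seen_first_row = False
    for line in site_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if not seen_first_row:
            seen_first_row = True
            if not fields[1].isdigit():
                kept.append(line)  # assumed header row, passed through unchanged
                continue
        if on_target(targets, fields[0], int(fields[1])):
            kept.append(line)
    return kept
```

To use it on real files, read the BED file and the microsatellite list with `open(...)`, pass the lines through `load_bed` and `filter_sites`, and write the kept lines to a new list file. Chromosome names must match between the two files (for example `1` versus `chr1`).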

Thank you for sharing the 100x model. I tested it with the following script but got the error pasted below. I didn't have this problem before changing the coverage. Any suggestions would be highly appreciated!

`module load python/2.7.14-anaconda
cd path/to/mimsi

python -m analyze \
--case-list path/to/my/caselist.txt \
--microsatellites-list path/to/microsatellites.list.wesTargetedOnly \
--save-location path/to/output \
--model ./model/mi_msi_v0_2_0_100x.model \
--coverage 50 \
--save`

Error message

File "main/evaluate_sample.py", line 183, in run_eval
model = MSIModel(coverage)
File "model/mi_msi_model.py", line 88, in __init__
nn.Linear(64 * int(self.coverage / 2) * 10, self.num_features),
TypeError: unsupported operand type(s) for /: 'str' and 'int'

Just pushed a fix for that one - if you pull down the latest version (0.3.2) coverage should be correctly interpreted by the model as a numeric parameter now!

Thank you! I updated the tool and made sure the 'int(coverage)' change had been incorporated, but I still get the same error.

Sorry about that - coverage needed to be cast in the Model definition as well as in the Data Loader. Should be good to go with 0.3.3. Again, apologies for the inconvenience!
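For readers who hit the same TypeError, the failure mode can be reproduced outside MiMSI. This is an illustrative sketch only, not the actual MiMSI source: the function names and the argparse setup are hypothetical, and only the arithmetic expression is taken from the traceback above. A command-line value arrives as a string unless it is converted, so dividing it by 2 fails until it is cast to an integer.

```python
import argparse


def linear_in_features_buggy(coverage):
    # Same expression as in the traceback; fails when coverage is the string "50".
    return 64 * int(coverage / 2) * 10


def linear_in_features(coverage):
    # Cast first, so both "50" and 50 are accepted.
    coverage = int(coverage)
    return 64 * int(coverage / 2) * 10


# Without a type, argparse hands back the raw string from the command line.
raw_parser = argparse.ArgumentParser()
raw_parser.add_argument("--coverage")
raw_value = raw_parser.parse_args(["--coverage", "50"]).coverage

# With type=int, the value is converted once at the boundary.
typed_parser = argparse.ArgumentParser()
typed_parser.add_argument("--coverage", type=int)
typed_value = typed_parser.parse_args(["--coverage", "50"]).coverage

try:
    linear_in_features_buggy(raw_value)
    buggy_error = None
except TypeError as exc:
    buggy_error = str(exc)

print(type(raw_value).__name__, type(typed_value).__name__)
print(buggy_error)
print(linear_in_features(raw_value))
```

Every consumer of the value needs the numeric form, which is why casting in only one place (the model or the data loader) was not enough. Converting once where the argument is parsed is one way to avoid chasing each use site.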

Thank you!! The most updated version works great. I greatly appreciate your help.