Very low number of matched variants when comparing known SNVs from several studies with the results obtained by SComatic when applied to the same data
FranSoriano opened this issue · 4 comments
Hello everyone,
We are interested in using SComatic to detect single-nucleotide mutations in single-cell data, for which we have tested the performance of the tool using scRNA-seq open data from two studies (PMIDs 35140215 and 32415257), which also have available WES data. We have compared the SNVs reported by those studies with the results from SComatic and detected a very low number of matching variants (ranging from 0% to 5%). Why is the percentage so small? Is this a normal/expected rate?
Prior to this analysis, we ran the tool following the tutorial in the repository and everything worked correctly, so we do not believe the low overlap is due to an error in how we executed it.
Thank you.
Dear user,
Thanks for using SComatic. Could you please post the number of mutations detected in a WES sample and in the matched scRNA-seq sample?
Secondly, could you run this command and post the output here?
awk '$1 ~ /^#/ || $6 == "PASS"' file.step2.tsv | grep -v '^#' | awk -v OFS=">" '{print $4,$5}' | sort | uniq -c
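For anyone who prefers Python, the same tally can be sketched as below. This is only an illustration: it assumes, as the awk command does, that the file is tab-separated with REF in column 4, ALT in column 5 and FILTER in column 6. The function name `count_pass_substitutions`, the example rows and the `Example_filter` label are made up.

```python
import csv
import io
from collections import Counter


def count_pass_substitutions(handle):
    """Tally REF>ALT substitution types for rows whose FILTER is exactly PASS.

    Assumed layout (taken from the awk command): column 4 = REF,
    column 5 = ALT, column 6 = FILTER, header lines start with '#'.
    """
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, header/comment lines and rows that are too short.
        if not row or row[0].startswith("#") or len(row) < 6:
            continue
        ref, alt, flt = row[3], row[4], row[5]
        if flt == "PASS":
            counts[f"{ref}>{alt}"] += 1
    return counts


# Made-up rows, only to show the expected input shape.
example = (
    "#CHROM\tStart\tEnd\tREF\tALT\tFILTER\n"
    "chr1\t100\t100\tA\tG\tPASS\n"
    "chr1\t200\t200\tC\tT\tPASS\n"
    "chr2\t300\t300\tA\tG\tPASS\n"
    "chr2\t400\t400\tG\tA\tExample_filter\n"
)
counts = count_pass_substitutions(io.StringIO(example))
for substitution, n in sorted(counts.items()):
    print(n, substitution)
```

To run it on a real file, pass an open file handle, for example `open("file.step2.tsv", newline="")`, instead of the `io.StringIO` example.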
Thanks,
Fran
Dear Fran,
The numbers of mutations (excluding frameshift mutations) reported in the WES samples we examined and of those detected by SComatic in the matched scRNA-seq samples are, as WES/SComatic pairs: 48/0, 48/1, 48/1, 28/1, 28/0, 28/0, 24/1, 24/0, 24/0, 93/5, 382/0, and 281/0.
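To make the rates explicit, here is a small sketch that turns these WES/SComatic pairs into per-sample and overall match percentages. The pairs are copied from the list above; the variable names are only illustrative.

```python
# (WES mutations, of which detected by SComatic) per sample, from the list above.
pairs = [
    (48, 0), (48, 1), (48, 1),
    (28, 1), (28, 0), (28, 0),
    (24, 1), (24, 0), (24, 0),
    (93, 5), (382, 0), (281, 0),
]

# Per-sample percentage of WES mutations recovered in scRNA-seq.
rates = [100.0 * matched / wes for wes, matched in pairs]

total_wes = sum(wes for wes, _ in pairs)
total_matched = sum(matched for _, matched in pairs)
overall = 100.0 * total_matched / total_wes

for (wes, matched), rate in zip(pairs, rates):
    print(f"{matched}/{wes} = {rate:.1f}%")
print(f"overall: {total_matched}/{total_wes} = {overall:.2f}%")
```

For these figures the per-sample rates range from 0% to about 5.4% (5/93), and the overall rate is 9/1056, about 0.85%.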
Here is the output of the command for one of the files:
6 A>C
46 A>G
5 A>T
11 C>A
5 C>G
23 C>T
28 G>A
3 G>C
6 G>T
2 T>A
31 T>C
4 T>G
Hope this sheds some light on the issue.
Thanks.
Hi,
Could you please check the coordinates of some of the expected (WES) mutations in the output of Step 4.1? The FILTER column should state why each site was, or was not, filtered. If you do not find these coordinates in the file, it means that there were not enough reads covering those sites and they were not interrogated.
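A rough sketch of that lookup in Python follows. It assumes the same column layout as the command above (chromosome in column 1, position in column 2, FILTER in column 6) and that the WES coordinates use the same genome build, chromosome naming and position convention as the SComatic output, which is worth verifying. The function name, the example rows and the `Example_filter` label are hypothetical.

```python
import csv
import io


def filter_status_for_sites(handle, sites):
    """Map each (chrom, pos) in `sites` to its FILTER value, or None if absent.

    Assumed layout: column 1 = chromosome, column 2 = position,
    column 6 = FILTER; header lines start with '#'.
    If a site appears on several rows, the last row wins.
    """
    wanted = set(sites)
    found = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#") or len(row) < 6:
            continue
        key = (row[0], int(row[1]))
        if key in wanted:
            found[key] = row[5]
    # None means the site is not in the file at all.
    return {site: found.get(site) for site in sites}


# Made-up rows and sites, only to show the idea.
example = (
    "#CHROM\tStart\tEnd\tREF\tALT\tFILTER\n"
    "chr1\t100\t100\tA\tG\tPASS\n"
    "chr2\t400\t400\tG\tA\tExample_filter\n"
)
wes_sites = [("chr1", 100), ("chr2", 400), ("chr3", 500)]
status = filter_status_for_sites(io.StringIO(example), wes_sites)
for site, flt in status.items():
    print(site, flt if flt is not None else "not in file")
```

Sites reported as "not in file" would be the ones that were never interrogated; the others show the FILTER value explaining the call.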
Thanks,
Fran
Hi, Fran,
The low number of reads covering the sites may indeed be the reason why these variants are not called as expected. However, we are using the default threshold, and we suspect that a minimum coverage below the default would be too permissive for considering a genomic site. We will keep that in mind when doing our analyses.
Thank you so much for your help.