seqwish output with simulated data
egoltsman opened this issue · 1 comments
Hi Erik,
I am doing a precision/recall analysis on a simulated set of 13 samples, where each "sample" is a random mutant of a real-life plant chromosome. I introduced exactly 200 SVs per sample; the types are deletions, inversions, tandem duplications, and translocations, and the variant sizes are fixed at 500bp and 10kb. After using edyeet+seqwish to construct the graphs from these sequences plus the original reference, I now have 14 graphs of increasing complexity and would like to see how well the variants can be "deconstructed" from them. So I took the GFA->vg route for each graph and used 'vg snarls' to get the bubbles out. It reports a lot more variants than I had introduced, even in a 2-sample graph. My suspicion is that edyeet misaligned some of the regions, and I want to try again with more stringent parameters. Do you think this is worth pursuing, or is edyeet not designed to handle this scenario?
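A minimal sketch of the kind of precision/recall comparison described above, in case it helps to pin down what "more variants than introduced" means. It assumes both the simulated SVs and the extracted bubbles have already been converted to reference intervals; the interval representation, the 50% reciprocal-overlap threshold, and the function names are illustrative assumptions, not part of edyeet, seqwish, or vg.

```python
from typing import List, Tuple

# (contig, start, end), half-open coordinates on the reference.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def precision_recall(
    truth: List[Interval], calls: List[Interval], min_overlap: float = 0.5
) -> Tuple[float, float]:
    """Score calls against truth using reciprocal overlap (assumed criterion)."""
    matched_truth = set()
    true_positive_calls = 0
    for call in calls:
        hit = False
        for i, t in enumerate(truth):
            if reciprocal_overlap(call, t) >= min_overlap:
                matched_truth.add(i)
                hit = True
        if hit:
            true_positive_calls += 1
    precision = true_positive_calls / len(calls) if calls else 0.0
    recall = len(matched_truth) / len(truth) if truth else 0.0
    return precision, recall


# Made-up example: two simulated SVs, three called sites.
truth = [("chr1", 1000, 1500), ("chr1", 20000, 30000)]
calls = [("chr1", 1010, 1490), ("chr1", 5000, 5100), ("chr1", 50000, 50010)]
precision, recall = precision_recall(truth, calls)
```

With this criterion, small bubbles caused by local misalignment count as false positives, so filtering calls by size before scoring may be a fairer comparison against 500bp and 10kb events.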
Another question is about the GFA tags that seqwish puts in. Sorry if this is described in some obvious place, but what are the DP: and RC: tags for?
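For anyone wanting to check which optional tags appear in a given GFA file, here is a small sketch that tallies them per record type. It relies only on the GFA1 convention that optional fields follow the required columns and have the form `TAG:TYPE:VALUE`; the function name is hypothetical, and the sample lines (including the tag types and values) are invented for illustration rather than taken from real seqwish output.

```python
from collections import Counter

# Number of required columns after the record-type column in GFA1.
REQUIRED_FIELDS = {"H": 0, "S": 2, "L": 5, "C": 6, "P": 3}


def tally_gfa_tags(lines):
    """Count optional TAG:TYPE fields per GFA1 record type."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        record_type = fields[0]
        n_required = REQUIRED_FIELDS.get(record_type)
        if n_required is None:
            continue  # unknown record type, skip it
        for field in fields[1 + n_required:]:
            parts = field.split(":", 2)
            if len(parts) == 3:
                tag, value_type, _value = parts
                counts[(record_type, tag, value_type)] += 1
    return counts


# Invented sample records; the values and tag types are placeholders.
sample = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT\tDP:i:3\tRC:i:12",
    "S\t2\tGG\tDP:i:1\tRC:i:2",
    "L\t1\t+\t2\t+\t0M",
]
tag_counts = tally_gfa_tags(sample)
```

To run it on a real file, pass an open file handle instead of the list, e.g. `tally_gfa_tags(open("graph.gfa"))`, where `graph.gfa` is a placeholder path.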