Not including chromosome with `_` in the name
Closed this issue · 4 comments
Hi @wdecoster - this is a continuation of #18!
Line 17 in b054f4d
This line was the guilty party as to why I had empty normalised alignment counts per contig. The problem is that the HG38 reference has `_` in the contig names (example: `NC_000024.10`), and I'm not sure why this check was included. What was its purpose originally? Could it be removed?
Thanks
Rory
Hi Rory,
I probably wanted to ignore all the extra (unplaced) contigs from the karyotype output and limit the output to the "main" chromosomes, but that is indeed a problem if all chromosomes are like that. Whoops! I should have put some more thought into this.
But this line will have to go then, and users will have to deal with many more small contigs. I can't come up with a way that, without fail, removes the extra contigs.
Wouter
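For anyone landing here later, a minimal sketch of why an underscore check behaves so differently across references. The actual check is not shown in this thread, so `keep_contig` is a hypothetical stand-in that assumes it simply skipped any name containing `_`. The contig names are illustrative examples of UCSC-style and RefSeq-style naming.

```python
def keep_contig(name: str) -> bool:
    """Hypothetical stand-in for the check: keep only names without '_'."""
    return "_" not in name


# UCSC-style names: only the extra contigs carry an underscore.
ucsc_names = ["chr1", "chrY", "chr1_KI270706v1_random", "chrUn_KI270742v1"]

# RefSeq-style names: every accession carries an underscore.
refseq_names = ["NC_000001.11", "NC_000024.10"]

kept_ucsc = [name for name in ucsc_names if keep_contig(name)]
kept_refseq = [name for name in refseq_names if keep_contig(name)]

print(kept_ucsc)    # ['chr1', 'chrY']
print(kept_refseq)  # []
```

With UCSC-style names the filter keeps the main chromosomes as intended, but with RefSeq-style names it drops everything, which matches the empty per-contig counts reported above.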
Not fixed with that PR :)
Haha! I figured it would be something like excluding alt contigs. I guess people can just remove them from the reference before alignment.
I'll open an excellent PR for this and cash in some Hacktoberfest credit
I guess in a more complicated CLI, `--karyotype` could optionally take a file or list of contigs that the user wants to use for the karyotype, but for now (or forever) this will have to do :)
Thanks again!
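A rough sketch of the "user supplies a list of contigs" idea mentioned above, in case it is useful as a post-processing step. Nothing here is part of the tool: `parse_contig_list` and `select_contigs` are hypothetical helpers, the one-name-per-line file format is an assumption, and the counts are made-up numbers for illustration only.

```python
from typing import Dict, Iterable, Optional, Set


def parse_contig_list(lines: Iterable[str]) -> Set[str]:
    """Read one contig name per line, skipping blank lines and '#' comments."""
    names = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line)
    return names


def select_contigs(
    counts: Dict[str, float], wanted: Optional[Set[str]] = None
) -> Dict[str, float]:
    """Keep only the wanted contigs; with no list given, keep everything."""
    if wanted is None:
        return dict(counts)
    return {name: value for name, value in counts.items() if name in wanted}


# Made-up normalised counts; "unplaced_scaffold_1" is a placeholder name.
counts = {"NC_000001.11": 0.52, "NC_000024.10": 0.03, "unplaced_scaffold_1": 0.001}

# In practice these lines would come from a user-provided text file.
wanted = parse_contig_list(["# main chromosomes", "NC_000001.11", "", "NC_000024.10"])

selected = select_contigs(counts, wanted)
print(sorted(selected))  # ['NC_000001.11', 'NC_000024.10']
```

An explicit list avoids guessing from the name which contigs are "main" chromosomes, at the cost of the user having to maintain that list per reference.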