wdecoster/cramino

Not including chromosome with `_` in the name

Closed this issue · 4 comments

Adoni5 commented

Hi @wdecoster - this is a continuation of #18!

`if !chrom.contains('_') {`

This line is why I had empty normalised alignment counts per contig. The problem is that the HG38 reference I'm using has `_` in its contig names (for example, NC_000024.10), so every contig is skipped. I'm not sure why this check was included. What was its original purpose? Could it be removed?
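To make the effect concrete, here is a minimal sketch of that check applied to a few contig names. Only the `!chrom.contains('_')` condition comes from the code; the function names `passes_underscore_filter` and `kept_contigs` are made up for illustration and are not part of cramino.

```rust
/// The condition quoted above: a contig is kept only if its name
/// contains no underscore.
fn passes_underscore_filter(chrom: &str) -> bool {
    !chrom.contains('_')
}

/// Hypothetical helper (not cramino code): apply the filter to a list
/// of contig names and return the ones that survive.
fn kept_contigs<'a>(names: &[&'a str]) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|c| passes_underscore_filter(c))
        .collect()
}
```

With UCSC-style names, `chr1` passes and `chrUn_KI270302v1` is dropped, which matches the presumed intent. With RefSeq-style names such as `NC_000001.11` and `NC_000024.10`, every primary chromosome is dropped too, so the result is empty.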

Thanks
Rory

Hi Rory,

I probably wanted to ignore all the extra (unplaced) contigs from the karyotype output and limit the output to the "main" chromosomes, but that is indeed a problem if all chromosomes are like that. Whoops! I should have put some more thought into this.
But this line will have to go then, and users will have to deal with many more small contigs. I can't come up with a way that, without fail, removes the extra contigs.

Wouter

Not fixed with that PR :)

Adoni5 commented

Haha! I figured it would be something like excluding alt contigs. I guess people can just remove them from the reference before alignment.

I'll open an excellent PR for this and cash in some Hacktoberfest credit

I guess in a more complicated CLI, `--karyotype` could optionally take a file or list of the contigs that the user wants to include in the karyotype, but for now (or forever) this will have to do :)
Thanks again!
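
For reference, the contig-list idea could look roughly like the sketch below. This is not implemented in cramino: the one-name-per-line file format, the `#` comment handling, and the names `parse_contig_list` and `keep_contig` are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical: parse a user-supplied contig list, one name per line.
/// Blank lines and lines starting with '#' are ignored (an assumed format).
fn parse_contig_list(text: &str) -> HashSet<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}

/// Hypothetical: keep every contig when no list was given, otherwise
/// keep only the contigs that the list names.
fn keep_contig(chrom: &str, allowed: Option<&HashSet<String>>) -> bool {
    allowed.map_or(true, |set| set.contains(chrom))
}
```

An explicit list avoids guessing from the name, so it would work for both `chr1`-style and `NC_000024.10`-style references, at the cost of the user having to supply the names.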