PacificBiosciences/pbbioconda

Request for improved pbmm2 error message around zipped references

mrvollger opened this issue · 8 comments

Operating system
redhat

Package name

pbmm2 1.13.0
Using:
  pbmm2    : 1.13.0 (commit v1.13.0-2-gbcd99f5)
  pbbam    : 2.4.99 (commit v2.4.0-23-g59248fe)
  pbcopper : 2.3.99 (commit v2.3.0-28-ga9b1ffa)
  boost    : 1.81
  htslib   : 1.17
  minimap2 : 2.26
  zlib     : 1.2.13

Describe the bug
When given a gzipped reference, pbmm2 complains about the format of the input reads instead of the reference.

Error message

pbmm2 align ERROR: Could not determine read input type(s). Please do not mix data types, such as BAM+FASTQ. File of files may only contain BAMs or datasets.

To Reproduce

pbmm2 align ref.fa.gz ../data/hap-alns/GM12878.PacBio.H1.GRCh38.bam example.bam
>|> 20240820 22:31:55.970 -|- WARN -|- operator() -|- 0x7f11a49d9f80|| -|- Input is aligned reads. Only primary alignments will be respected to allow idempotence!
>|> 20240820 22:31:55.970 -|- FATAL -|- CheckPositionalArgs -|- 0x7f11a49d9f80|| -|- pbmm2 align ERROR: Could not determine read input type(s). Please do not mix data types, such as BAM+FASTQ. File of files may only contain BAMs or datasets.

Expected behavior
I know that pbmm2 does not support gzipped references, but it took me quite a while to discover that this was the problem, because the error sent me looking for issues with the input reads rather than the reference. A clearer error message would help. Alternatively, support for gzipped references would be great!
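
Until the message is improved, one possible workaround is to check the reference yourself before calling pbmm2. The following is only a minimal sketch, not part of pbmm2: the helper names `is_gzipped` and `ensure_plain_fasta` are made up for illustration. It relies on the fact that gzip (and bgzip) files start with the bytes `1f 8b`, and writes an uncompressed copy when it sees them.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def ensure_plain_fasta(reference, out_dir="."):
    """Hypothetical helper: return a path to an uncompressed reference.

    A reference that is not gzip-compressed is returned unchanged.
    """
    reference = Path(reference)
    if not is_gzipped(reference):
        return reference
    # Drop a trailing ".gz" if present, e.g. ref.fa.gz -> ref.fa
    if reference.name.endswith(".gz"):
        name = reference.name[:-3]
    else:
        name = reference.name + ".fa"
    plain = Path(out_dir) / name
    # Stream the decompressed bytes to disk without loading the whole genome
    with gzip.open(reference, "rb") as src, open(plain, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return plain
```

The returned path can then be passed as the first argument of `pbmm2 align` in place of `ref.fa.gz`.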

As a side note, it would be nice if pbmm2 allowed the .fna extension for references, which is sometimes the extension you get when downloading from NCBI, e.g.:

GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
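
In the meantime, a symlink with a different extension may be enough. This is a sketch under the assumption that pbmm2 decides by file extension and accepts `.fa` (as in the command above). The helper name `link_with_fasta_extension` is hypothetical.

```python
import os
from pathlib import Path


def link_with_fasta_extension(reference, out_dir="."):
    """Hypothetical helper: expose a .fna reference under a .fa name.

    Files with any other extension are returned as resolved paths.
    """
    reference = Path(reference).resolve()
    if reference.suffix != ".fna":
        return reference
    link = Path(out_dir) / (reference.stem + ".fa")
    # Create the symlink only once; an existing link or file is left alone
    if not (link.is_symlink() or link.exists()):
        os.symlink(reference, link)
    return link
```

The symlink avoids copying a multi-gigabyte genome; the original `.fna` file stays where it is.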

Thanks,
Mitchell

We might consider adding it, but at this point you are the first one to ask in more than 5 years.

It seems like an uncommon use case, but especially since minimap2 can take gzipped references, this does (negligibly) complicate porting over to pbmm2.

I would love gzip compatibility, but I wanted to clarify that my main issue is the error message when you use a gzip reference:

pbmm2 align ERROR: Could not determine read input type(s). Please do not mix data types, such as BAM+FASTQ. File of files may only contain BAMs or datasets.

This error incorrectly indicates that the format of the reads rather than the reference is incorrect.

That error message should be fixed in the next version, which we'll release soon.

Awesome, thanks @armintoepfer