seqan/seqan3

Unsupported SAM header tag in @HD: pb

notestaff opened this issue · 6 comments

Platform

  • SeqAn version: 3.3.0
  • Operating system: Linux bpb23-acc 5.4.0-136-generic #153-Ubuntu SMP Thu Nov 24 15:56:58 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Compiler: x86_64-conda-linux-gnu-c++ (conda-forge gcc 12.3.0-2) 12.3.0

Question

Is there a way to suppress the message Unsupported SAM header tag in @HD: pb when reading PacBio bam files?
Relatedly, the documentation says

* The function throws a seqan3::format_error if any unknown tag was encountered. It will also fail if the format is
, but that's no longer the case, right? (And the current behavior of not throwing an error seems correct.)

Hey @notestaff,

once again, thank you for finding flaws in our documentation!

It should definitely not say that it throws. It's probably a leftover from when we amended the alignment file parsing to be a bit less strict.

Do you have an example SAM/BAM file? Just the header section and one record/entry should be enough. We can also just add the pb header to a file ourselves, but it's better to have an example of what you are using.

In general, pb is not part of the SAM-specification's @HD tag, so we emit a warning.

What would be a solution you are looking for? It would be possible to redirect std:cerr while parsing the file, such that the warning is not emitted.

Some option to ignore/warn/error would also be possible, but would require code restructuring.

The first header line reads
@HD VN:1.6 SO:unknown pb:5.0.0
What would be useful is an option to suppress warnings about unsupported SAM header tag. Simplest might be to make this a compilation option, e.g. if SEQAN3_SUPPRESS_UNSUPPORTED_HEADER_TAG_WARNING is defined to 1 then don't print the warning. Or this could be part of the configuration options when opening a SAM file. Best would of course be to have a way to specify additional recognized tags. Thanks!

Great, thanks a lot! Just in time for external release of our seqan3-based tool :)

But, this fix suppresses ALL warnings -- a way to suppress specific ones would be better.

Also, the SAM spec says "you can freely add new tags for further data fields. Tags containing lowercase letters are reserved for local use". So I'm not sure this should be a warning at all, at least by default, especially for lowercase tags like in our example.

But, this fix suppresses ALL warnings -- a way to suppress specific ones would be better.

It's currently the only warning we emit ("Unsupported SAM header tag in [...]").

It's also a bit more flexible than just on/off in that you can postprocess the warning, e.g.,

void filter()
{
    auto fin = get_sam_file_input();
    std::ostringstream stream{};
    fin.options.stream_warnings_to = std::addressof(stream);
    auto it = fin.begin();
    for (auto && warning : stream.view() | std::views::split('\n'))
    {
        if (std::string_view sv{warning.data(), warning.size()}; !sv.empty() && !sv.ends_with("pb"))
            std::cerr << sv << '\n';
    }
}

Would be nicer to overload operator<<, but then we would need either some customisation point object...

Also, the SAM spec says "you can freely add new tags for further data fields. Tags containing lowercase letters are reserved for local use". So I'm not sure this should be a warning at all, at least by default, especially for lowercase tags like in our example.

I did miss this one sentence in the spec multiple times :)
Before this was a warning, it was an exception.
I'm not sure if the warning is meant as "it's not valid, but we just continue" or "we don't support that" (in contrast to alignment tags).