Parsing VCF fails for ID in header
apraga opened this issue · 3 comments
Hi,
Thanks for your package. Some data (ClinVar in my case) has an ID record in the header:
##ID=<Description="ClinVar Variation ID">
Minimal working example
##fileformat=VCFv4.1
##ID=<Description="ClinVar Variation ID">
#CHROM POS ID REF ALT QUAL FILTER INFO
1 1 2 A G . . .
Parsing fails with:
called Result::unwrap() on an Err value: Custom { kind: InvalidData, error: InvalidRecord(InvalidValue(InvalidOtherMap(Other("ID"), ParseError { id: None, kind: MissingId }))) }
Removing the offending header line solves the issue. I'm not sure whether noodles should handle this, but I figured it would help to report it nevertheless.
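Until the parser accepts this line, the workaround of removing it can be done in memory before parsing. Below is a minimal sketch that assumes the whole VCF fits in a string. `strip_id_meta_lines` is a hypothetical helper, not part of noodles, and dropping the line discards the ClinVar ID description.

```rust
/// Hypothetical helper (not part of noodles): removes `##ID=` meta lines so the
/// remaining VCF text can be parsed. Every other line is kept unchanged.
fn strip_id_meta_lines(vcf: &str) -> String {
    vcf.lines()
        // Meta lines start with "##"; only the "##ID=" key is dropped.
        .filter(|line| !line.starts_with("##ID="))
        // `lines()` strips line endings, so add "\n" back to each kept line.
        .map(|line| format!("{line}\n"))
        .collect()
}

fn main() {
    // The minimal example from above, with tab-separated columns.
    let input = "##fileformat=VCFv4.1\n\
                 ##ID=<Description=\"ClinVar Variation ID\">\n\
                 #CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n\
                 1\t1\t2\tA\tG\t.\t.\t.\n";

    // The cleaned text could then be handed to the VCF reader.
    print!("{}", strip_id_meta_lines(input));
}
```

The #CHROM line is untouched because it starts with a single "#", so only the meta line is filtered.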
Thanks,
Thanks for reporting, @apraga. This case is undefined in VCF 4.1/4.2. I asked for clarification in samtools/hts-specs#760.
This behavior remains undefined, but for inputs that are VCF < 4.3, noodles will now read other header record values that start with a map prefix (<) as a string if there is no map identifier (i.e., ID=), e.g.,
##fileformat=VCFv4.2
##A=<a>
##B=<ID=b>
#CHROM POS ID REF ALT QUAL FILTER INFO
Header {
file_format: FileFormat { major: 4, minor: 2 },
other_records: {
Other("A"): Unstructured(["<a>"]),
Other("B"): Structured({
"b": Map { inner: Other { id_tag: Standard(Id) }, other_fields: {} },
}),
},
// ...
}
This is explicitly invalid in VCF >= 4.3 and will continue to fail, e.g.,
##fileformat=VCFv4.3
##A=<a>
##B=<ID=b>
#CHROM POS ID REF ALT QUAL FILTER INFO
Error: Custom {
kind: InvalidData,
error: InvalidRecord(InvalidValue(InvalidOtherMap(
Other("A"),
ParseError {
id: None,
kind: InvalidField(InvalidKey(UnexpectedEof)),
},
))),
}
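The version-dependent rule can be modeled as a small decision function. This is a simplified, hypothetical sketch, not the noodles API: `OtherValue` and `parse_other_value` are illustrative names, the ID check assumes `ID=` is the first map field, and the real parser keeps the other map fields and reports distinct error kinds (e.g., MissingId, InvalidKey) that are collapsed into a single String error here.

```rust
/// Hypothetical, simplified model of an "other" header record value
/// (not the noodles types).
#[derive(Debug, PartialEq)]
enum OtherValue {
    /// Stored verbatim as a string, e.g. "<a>".
    Unstructured(String),
    /// A map identified by its ID field, e.g. "<ID=b>" -> "b".
    Structured { id: String },
}

/// `version` is the (major, minor) pair from ##fileformat.
fn parse_other_value(version: (u32, u32), value: &str) -> Result<OtherValue, String> {
    // Values without the map prefix are plain strings in every version.
    if !value.starts_with('<') {
        return Ok(OtherValue::Unstructured(value.to_string()));
    }

    // Inspect the map body between the angle brackets.
    let inner = value[1..].strip_suffix('>').unwrap_or(&value[1..]);
    if let Some(rest) = inner.strip_prefix("ID=") {
        // Naive split: the ID runs up to the first comma (quoting is ignored).
        let id = rest.split(',').next().unwrap_or("").to_string();
        return Ok(OtherValue::Structured { id });
    }

    // No map identifier: VCF < 4.3 falls back to a string, VCF >= 4.3 fails.
    if version < (4, 3) {
        Ok(OtherValue::Unstructured(value.to_string()))
    } else {
        Err(format!("missing ID in map value: {value}"))
    }
}

fn main() {
    let clinvar = r#"<Description="ClinVar Variation ID">"#;
    println!("{:?}", parse_other_value((4, 1), clinvar));
    println!("{:?}", parse_other_value((4, 2), "<ID=b>"));
    println!("{:?}", parse_other_value((4, 3), "<a>"));
}
```

Comparing (major, minor) tuples works because Rust orders tuples lexicographically, so (4, 2) < (4, 3) holds as expected.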
These rules also apply to the serializer when writing a VCF header.
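Reading "the same rules" on the write side as: a string value that looks like a map can be emitted for VCF < 4.3 but is rejected for VCF >= 4.3. Under that assumption, the check can be sketched as below. `OtherValue` and `format_other_record` are hypothetical names rather than the noodles serializer, and structured values here carry only their ID.

```rust
/// Hypothetical, simplified model of an "other" header record value
/// (not the noodles types).
#[derive(Debug, PartialEq)]
enum OtherValue {
    Unstructured(String),
    Structured { id: String },
}

/// Formats `##key=value`, rejecting map-like strings for VCF >= 4.3.
/// `version` is the (major, minor) pair from ##fileformat.
fn format_other_record(version: (u32, u32), key: &str, value: &OtherValue) -> Result<String, String> {
    match value {
        // Structured values always carry an ID, so they are valid in every version.
        OtherValue::Structured { id } => Ok(format!("##{key}=<ID={id}>")),
        // A string starting with '<' would be read back as a map without an ID.
        OtherValue::Unstructured(s) if s.starts_with('<') && version >= (4, 3) => {
            Err(format!("##{key}: map-like string value is invalid in VCF >= 4.3"))
        }
        OtherValue::Unstructured(s) => Ok(format!("##{key}={s}")),
    }
}

fn main() {
    let a = OtherValue::Unstructured("<a>".to_string());
    let b = OtherValue::Structured { id: "b".to_string() };
    for version in [(4, 2), (4, 3)] {
        println!("{:?}", format_other_record(version, "A", &a));
        println!("{:?}", format_other_record(version, "B", &b));
    }
}
```

Keeping the read and write checks symmetric means a header written for VCF 4.2 reads back to the same values, and a VCF 4.3 writer never emits a line its own reader would reject.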