/xvcf-rs

VCF/BCF [un]compressed [un]indexed

Primary LanguageRustMIT LicenseMIT

Continuous integration

This is a small library that attempts to allow efficient reading of VCF and BCF files that are either compressed or uncompressed and indexed or not. noodles is used for the parsing, this unifies handling above that.

Even when not indexed, it allows "skipping" via iterating until the requested region is reached. This is useful for bedder-rs but might also be useful elsewhere.

The skipping will be most efficient when the files are compressed and indexed.

NOTES

Implementation

As implemented, only compressed, indexed VCF or BCF will have the absolute highest performance. Others will require skipping via iterating over the file until the given region is reached. This assumes that any indexed file will also support Seek.

Specialization (abandoned)

This is not used so we can avoid dependence on nightly rust.

AFAICT, one way to do this is with specialization

If the file is not indexed and Seekable, then we just iterate over the records to skip to a given region. If it is Seekable and has an index, then we use the query functionality.

The rust documentation indicates that :

Only specialization using the min_specialization feature should be used. The full specialization feature is known to be unsound.

But I can't get this to compile with min_specialization. It also seems that even min_specialization will require nightly for the foreseeable future.

Have a look here

Perhaps there's completely different, and better way to do this. Let me know.