JavaScript local VCF and Tabix index file parsing and processing
Usage:
Include the following libs:
- https://raw.github.com/vjeux/jDataView/master/src/jdataview.js
- https://raw.github.com/vjeux/jParser/master/src/jparser.js
- inflate.js (fetch and place or fetch remotely)
- pako_deflate.min.js - included in JSLibs
- jsbgzf.js - included in JSLibs
- js-bv-common.js - included in JSLibs
- js-bv-sampling.js - included in JSLibs
- js-iobio-common.js - included in JSLibs
- js-local-vcf.js (this file)
There are two 'object' types with constructors:
readTabixFile
which takes a filespec and initializes a tabix reader. Provides methods
getIndex
- builds the index information
bin2Ranges
- returns the chunk information for a [ref binid]
bin2Beg
- returns first chunk of bin
bin2End
- returns last chunk of bin
getChunks
- returns all chunks for bins covering region in ref
Details below
readBinaryVCF
which takes a tabix filespec, a BGZF VCF filespec, and a user supplied callback function: initializes a tabix reader, builds a binary VCF reader, and calls the user callback on finish. Provides methods
getHeader
- obtains and returns the VCF header lines
getRecords
- obtains the data records in a reference region and returns them as a vector of strings to the provided callback
getChunks
- returns all chunks covered by region
Details below
Example:
With files[0] == vcf file
files[1] == tabix file
var x = undefined;
var chunks = undefined;
var vcfR = new readBinaryVCF(files[1], files[0],
    function(vcfR) {
        // Called once the index is built and the reader is ready.
        vcfR.getRecords(11, 1000000, 1015808, function(rs){x = rs;});
        chunks = vcfR.getChunks(11, 1000000, 1015808);
    });
================== readTabixFile ===================
function readTabixFile(tabixFile)
Constructor for tabix reader and decoder. TABIXFILE is a bgzipped tabix binary index file.
readTabixFile.prototype.getIndex = function(cb)
Main function for a tabix reader. Obtains and decodes the index and caches the information that the other methods use, so it must be called before any of them.
readTabixFile.prototype.refName2Index = function(name)
Converts a reference name to its tabix index. References in VCF processing must always be given by their index. Requires that getIndex has been run.
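As a rough illustration of what decoding the index header and mapping a name to an index involves, here is a minimal sketch that follows the field layout in the tabix format specification. The functions parseTabixHeader and refNameToIndex are hypothetical names, not part of js-local-vcf.js, and the sketch assumes the index has already been inflated into an ArrayBuffer; the library's own decoding (built on jParser) may be organized differently.

```javascript
// Hypothetical helpers for illustration; not the library's implementation.
// Input: an ArrayBuffer holding the already-inflated tabix index.
function parseTabixHeader(buffer) {
  const view = new DataView(buffer);
  // Magic string is "TBI" followed by the byte 0x01.
  const magic = String.fromCharCode(
    view.getUint8(0), view.getUint8(1), view.getUint8(2));
  if (magic !== "TBI" || view.getUint8(3) !== 1) {
    throw new Error("not a tabix index");
  }
  // All integers are little-endian int32.
  const nRef = view.getInt32(4, true);    // number of reference sequences
  const colSeq = view.getInt32(12, true); // column holding sequence names
  const colBeg = view.getInt32(16, true); // column holding region start
  const colEnd = view.getInt32(20, true); // column holding region end
  const lNm = view.getInt32(32, true);    // byte length of the names block
  // Names are concatenated, each terminated by a zero byte.
  const bytes = new Uint8Array(buffer, 36, lNm);
  const names = [];
  let cur = "";
  for (const b of bytes) {
    if (b === 0) { names.push(cur); cur = ""; }
    else { cur += String.fromCharCode(b); }
  }
  return { nRef: nRef, colSeq: colSeq, colBeg: colBeg, colEnd: colEnd, names: names };
}

// A reference's index is its position in the names list (-1 if absent).
function refNameToIndex(header, name) {
  return header.names.indexOf(name);
}
```

In this sketch the name lookup is a plain array search, which is why the header has to be decoded first.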
readTabixFile.prototype.bin2Ranges = function(ref, binid)
Takes a ref and binid and builds a return vector mapped from the chunk sequence of the bin, where each element is a two-element vector defining the region of a chunk. The begin and end of each region are each given as a pair: the base virtual file offset (the virtual offset right-shifted by 16 bits) and the offset within the inflated block (the lower 16 bits). Returns a vector [[[vfbeg, bobeg], [vfend, boend]], ...] where
- vfbeg is the virtual file offset of the beginning bgzf block
- bobeg is the offset within the inflated block of that block
- vfend is the virtual file offset of the ending bgzf block
- boend is the offset of the last byte in that block
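The split of a virtual offset into its two parts can be sketched as below. splitVirtualOffset is a hypothetical helper, not a function of this library, and it uses BigInt because a full 64-bit virtual offset does not fit in JavaScript's 32-bit bitwise operators; the library itself may handle the 64-bit values another way.

```javascript
// Hypothetical helper for illustration; not part of js-local-vcf.js.
// voff: a BGZF virtual file offset as a BigInt (or a safe integer Number).
// Returns [blockStart, withinBlock]:
//   blockStart  - offset in the compressed file where the BGZF block begins
//                 (the upper 48 bits, i.e. voff >> 16)
//   withinBlock - offset inside that block once inflated (the lower 16 bits)
function splitVirtualOffset(voff) {
  const v = BigInt(voff);
  const blockStart = Number(v >> 16n);
  const withinBlock = Number(v & 0xFFFFn);
  return [blockStart, withinBlock];
}

// A chunk is a begin/end pair of virtual offsets, so under these assumptions
// one chunk maps to [[vfbeg, bobeg], [vfend, boend]].
function chunkToRange(chunkBeg, chunkEnd) {
  return [splitVirtualOffset(chunkBeg), splitVirtualOffset(chunkEnd)];
}
```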
readTabixFile.prototype.bin2Beg = function(binid)
First chunk region of binid.
readTabixFile.prototype.bin2End = function(binid)
Last chunk region of binid.
readTabixFile.prototype.getChunks = function(ref, beg, end)
For a reference REF and a region defined by BEG and END, returns the set of chunks of all bins involved as a flat vector of two-element vectors, each defining a region of a bin.
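Finding the bins involved in a region is commonly done with the reg2bins calculation given in the tabix and SAM specifications. The sketch below is a JavaScript port of that published routine, shown only to illustrate the idea; whether js-local-vcf.js computes bins in exactly this way is an assumption. It takes a zero-based, half-open interval, which may not match the coordinate convention getChunks expects.

```javascript
// Port of the reg2bins routine from the tabix/SAM specifications.
// beg, end: zero-based, half-open interval [beg, end).
// Returns the ids of all bins that may overlap the interval.
function reg2bins(beg, end) {
  if (beg >= end) return [];
  if (end > (1 << 29)) end = 1 << 29; // the binning scheme covers 2^29 bases
  end -= 1;                           // make the end inclusive
  const bins = [0];                   // bin 0 spans the whole reference
  // [first bin id at this level, log2 of the bin width at this level]
  const levels = [[1, 26], [9, 23], [73, 20], [585, 17], [4681, 14]];
  for (const [offset, shift] of levels) {
    for (let k = offset + (beg >> shift); k <= offset + (end >> shift); k++) {
      bins.push(k);
    }
  }
  return bins;
}

// The region from the example above falls in exactly one 16 kb leaf bin.
console.log(reg2bins(1000000, 1015808)); // [ 0, 1, 9, 73, 592, 4742 ]
```

The chunks for the region are then the concatenation of the chunk lists of each of these bins.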
================== readBinaryVCF ===================
function readBinaryVCF (tbxFile, vcfFile, cb)
Constructor for BGZF VCF reader and decoder. TBXFILE is a bgzipped tabix binary index file for VCFFILE, a BGZF encoded VCF file. Builds the index, initializes the VCF reader, then calls CB with the VCF reader. Returns the VCF reader.
readBinaryVCF.prototype.getHeader = function (cbfn)
Obtains the VCF header information as a vector of strings and calls CBFN with this vector. Header lines begin with a "#"; they start at the first line of the file and end just before the first line that does not have a "#" at character position 0.
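The header rule described here can be sketched on already-inflated text as follows. extractHeaderLines is a hypothetical name used for illustration; the library reads and inflates BGZF blocks before it gets to this step, and its own code may differ.

```javascript
// Hypothetical helper for illustration; not part of js-local-vcf.js.
// text: the inflated beginning of a VCF file as a string.
// Returns the leading run of lines that start with "#".
function extractHeaderLines(text) {
  const header = [];
  for (const line of text.split("\n")) {
    // Stop at the first line without "#" at character position 0.
    if (line.charAt(0) !== "#") break;
    header.push(line);
  }
  return header;
}
```

Note that a "#" appearing later in the file, after the first data record, is not treated as header in this sketch.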
readBinaryVCF.prototype.getRecords = function (ref, beg, end, cbfn)
Main function for VCF reader. For a record region defined by BEG and END, obtains the set of bins and chunks covering the region, inflates the corresponding data blocks, converts them to a vector of strings (one per record), and filters these so that only records within the range are kept. The resulting filtered vector of strings is returned by calling CBFN with the vector.
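The final filtering step is needed because chunks are block-aligned and usually contain records outside the requested range. A minimal sketch of such a filter follows; filterRecords and its refName parameter are hypothetical, and treating both BEG and END as inclusive one-based positions is an assumption, since the boundary convention is not stated here.

```javascript
// Hypothetical helper for illustration; not part of js-local-vcf.js.
// lines: record strings obtained from the inflated chunks.
// refName: the CHROM value to keep (the text value found in column 1).
// beg, end: position bounds, assumed inclusive and one-based like VCF POS.
function filterRecords(lines, refName, beg, end) {
  return lines.filter(function (line) {
    // Skip blank lines and any header lines picked up with the chunk.
    if (line.length === 0 || line.charAt(0) === "#") return false;
    const fields = line.split("\t");
    const pos = parseInt(fields[1], 10); // column 2 of a VCF record is POS
    return fields[0] === refName && pos >= beg && pos <= end;
  });
}
```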
readBinaryVCF.prototype.getChunks = function (ref, beg, end)
Synonym for tabix getChunks. Directly callable on a VCF reader.