js-local-vcf

Javascript local VCF and Tabix index file parsing and processing

Primary language: JavaScript. License: MIT.

Usage:

Include the following libs:

There are two 'object' types with constructors:

readTabixFile, which takes a filespec and initializes a tabix reader. It provides the methods:

  • getIndex - builds the index information
  • bin2Ranges - returns the chunk information for a [ref binid]
  • bin2Beg - returns first chunk of bin
  • bin2End - returns last chunk of bin
  • getChunks - returns all chunks for bins covering region in ref

Details below

readBinaryVCF, which takes a tabix filespec, a BGZF VCF filespec, and a user-supplied callback function. It initializes a tabix reader, builds a binary VCF reader, and calls the user callback when finished. It provides the methods:

  • getHeader - obtains and returns the VCF header lines
  • getRecords - obtains the data records in a reference region and passes them as a vector of strings to the provided callback
  • getChunks - returns all chunks covered by region

Details below

Example:

With files[0] == vcf file
     files[1] == tabix file

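var x = undefined;       // set asynchronously to the vector of record strings
var chunks = undefined;  // set to the chunks covering the same region
var vcfR = new readBinaryVCF(files[1], files[0],
  function(vcfR) {
    // Called once the index is built and the VCF reader is ready.
    // 11 is the reference index (see refName2Index).
    vcfR.getRecords(11, 1000000, 1015808, function(rs){x = rs;});
    chunks = vcfR.getChunks(11, 1000000, 1015808);
  });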
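
Because getRecords delivers its result through a callback, x in the example above is only populated some time after the call returns. One possible way to make that ordering explicit is to wrap the call in a Promise. The helper name getRecordsAsync is hypothetical and not part of js-local-vcf; the sketch assumes only the documented getRecords(ref, beg, end, cbfn) signature.

```javascript
// Hypothetical helper (not part of js-local-vcf): wraps the documented
// callback-style getRecords(ref, beg, end, cbfn) in a Promise.
function getRecordsAsync(reader, ref, beg, end) {
  return new Promise(function (resolve) {
    // The reader calls resolve with the vector of record strings.
    reader.getRecords(ref, beg, end, resolve);
  });
}

// Possible usage inside the constructor callback:
//   getRecordsAsync(vcfR, 11, 1000000, 1015808).then(function (rs) {
//     console.log(rs.length + " records");
//   });
```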

================== readTabixFile ===================

function readTabixFile(tabixFile)

Constructor for tabix reader and decoder. tabixFile is a bgzipped tabix binary index file.

readTabixFile.prototype.getIndex = function(cb)

Main function for a tabix reader. Obtains and decodes the index and caches the decoded information for use by the other methods, so it must be called before any of them.

readTabixFile.prototype.refName2Index = function(name)

Converts a reference name to its tabix index. In VCF processing, references must always be given by their index. Requires that getIndex has been run.

readTabixFile.prototype.bin2Ranges = function(ref, binid)

Takes a ref and binid and returns a vector built from the chunk sequence of the bin, where each element is a two-element vector defining the region of a chunk. The begin and end of each region are each given as a pair: the base virtual file offset (the virtual offset right-shifted by 16 bits) and the offset within the inflated block (the lower 16 bits). Returns a vector [[[vfbeg, bobeg], [vfend, boend]], ...] where

  • vfbeg is the virtual file offset of beginning bgzf block
  • bobeg is the offset within the inflated block of that block
  • vfend is the virtual file offset of ending bgzf block
  • boend is the offset of last byte in that block
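
The split described above can be sketched as follows. This is a minimal illustration, assuming the virtual offset is held as a plain JavaScript Number below 2^53; how the library represents 64-bit offsets internally is not stated here. The names splitVirtualOffset and chunkRegion are hypothetical.

```javascript
// Split a BGZF virtual file offset into [blockOffset, withinBlockOffset].
// Division and modulo are used instead of >> and &, because JavaScript
// bitwise operators truncate their operands to 32 bits.
function splitVirtualOffset(voffset) {
  var blockOffset = Math.floor(voffset / 65536); // value right-shifted by 16 bits
  var withinBlock = voffset % 65536;             // lower 16 bits
  return [blockOffset, withinBlock];
}

// Build one chunk region in the [[vfbeg, bobeg], [vfend, boend]] shape.
function chunkRegion(begVoffset, endVoffset) {
  return [splitVirtualOffset(begVoffset), splitVirtualOffset(endVoffset)];
}
```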

readTabixFile.prototype.bin2Beg = function(binid)

First chunk region of binid.

readTabixFile.prototype.bin2End = function(binid)

Last chunk region of binid.

readTabixFile.prototype.getChunks = function(ref, beg, end)

For the region of reference REF defined by BEG and END, returns the chunks of all bins involved as a flat vector of two-element vectors, each defining a chunk region of a bin.
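
To illustrate which bins a region involves, here is a sketch of the standard binning scheme from the SAM and tabix specifications, ported to JavaScript. It assumes zero-based, half-open coordinates, as in the specification. Whether getChunks uses exactly this routine, or these coordinate conventions, is an assumption. The name reg2bins follows the specification's C example.

```javascript
// Return the ids of all bins that may overlap the region [beg, end),
// using the standard 5-level binning scheme (16 kb smallest bins).
function reg2bins(beg, end) {
  var bins = [0]; // bin 0 spans the whole reference
  var k;
  end = end - 1;  // make the end inclusive
  for (k = 1 + (beg >> 26); k <= 1 + (end >> 26); k++) { bins.push(k); }
  for (k = 9 + (beg >> 23); k <= 9 + (end >> 23); k++) { bins.push(k); }
  for (k = 73 + (beg >> 20); k <= 73 + (end >> 20); k++) { bins.push(k); }
  for (k = 585 + (beg >> 17); k <= 585 + (end >> 17); k++) { bins.push(k); }
  for (k = 4681 + (beg >> 14); k <= 4681 + (end >> 14); k++) { bins.push(k); }
  return bins;
}

// For the region used in the example near the top of this README:
// reg2bins(1000000, 1015808) returns [0, 1, 9, 73, 592, 4742]
```

The chunks of each returned bin would then be concatenated into the flat vector described above.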

================== readBinaryVCF ===================

function readBinaryVCF (tbxFile, vcfFile, cb)

Constructor for BGZF VCF reader and decoder. tbxFile is a bgzipped tabix binary index file for vcfFile, a BGZF-encoded VCF file. Initializes the tabix reader, builds the index, and initializes the VCF reader, then calls cb with the VCF reader. Returns the VCF reader.

readBinaryVCF.prototype.getHeader = function (cbfn)

Obtains the VCF header information as a vector of strings and calls cbfn with this vector. Header lines begin with a "#"; they start at the first line of the file and stop at the first line that does not have a "#" at character position 0.
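
The header rule above can be sketched on an already decoded vector of lines. This is only an illustration of the rule, not the library's implementation; the name splitHeader is hypothetical.

```javascript
// Split decoded VCF lines into the leading header lines and the rest.
// The header is the run of lines with "#" at character position 0,
// starting at the first line and ending at the first line without it.
function splitHeader(lines) {
  var i = 0;
  while (i < lines.length && lines[i].charAt(0) === "#") {
    i++;
  }
  return { header: lines.slice(0, i), records: lines.slice(i) };
}
```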

readBinaryVCF.prototype.getRecords = function (ref, beg, end, cbfn)

Main function for VCF reader. For a record region defined by BEG and END, obtains the set of bins and chunks covering the region, inflates the corresponding data blocks, converts them to a vector of strings (one for each record), and filters these so that only records in the range are kept. The resulting filtered vector of strings is returned by calling CBFN with the vector.
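
The final filtering step can be sketched as below. It relies on the VCF convention that each record is tab-separated with POS as the second column. Treating both BEG and END as inclusive is an assumption, as are the helper names recordPos and filterRecordsByRange.

```javascript
// Extract the POS field (second tab-separated column) of a VCF record line.
function recordPos(record) {
  return parseInt(record.split("\t")[1], 10);
}

// Keep only records whose POS lies in [beg, end] (inclusive bounds assumed).
function filterRecordsByRange(records, beg, end) {
  return records.filter(function (record) {
    var pos = recordPos(record);
    return pos >= beg && pos <= end;
  });
}
```

Filtering is needed because chunks are read at block granularity, so the inflated data can contain records that lie outside the requested region.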

readBinaryVCF.prototype.getChunks = function (ref, beg, end)

Synonym for the tabix reader's getChunks. Directly callable on a VCF reader.