/BWT-perl

Simple Perl Burrows Wheeler Transform Computation Tool

Primary LanguagePerlGNU General Public License v3.0GPL-3.0

NAME
    BWT - Burrows-Wheeler Transform 

SYNOPSIS
        use BWT::Simple;
        use SA::SuffixArray;

        my $bwt1S = BWT::Simple->new();
        my $sa = SA::SuffixArray->new();
    
        # -----        Compute suffix array        ----- #
        my @suftab = $sa->Sort_Suffixes(array => \@array);

        # -----             BWT encode             ----- #
        my $bwtencode = $bwt1S->BWT_encode(suftab => \@suftab, string => \@array);

        # -----             BWT decode             ----- #
        my $bwtdecode = $bwt1S->BWT_decode(bwt => $bwtencode, terminator => $terminator);

DESCRIPTION
   The BWT is used in many lossless data compression schemes, one of
   which is the Julian Seward's bzip2 program. It is usually computed 
   by sorting all suffixes of a string S according to lexicographic 
   order after which a sequence of characters at positions preceding 
   sorted suffixes, is recorder. This sequence is known as Burrows-
   Wheeler transform.

  new
        my $bwt1S = BWT::Simple->new();

    Creates a new BWT object.

        my $sa = SA::SuffixArray->new();

    Creates a new suffix array object.

  Sort_Suffixes

        my @suftab = $sa->Sort_Suffixes(array => \@array);

        Where \@array is an array reference of string characters.

        Function returns the lexicographically sorted array of indexes
        corresponding to starting positions of string suffixes. Function is a
        simple quicksort based sorting lagorithm with O(n log n) worst case
        runtime behaviour.

  BWT_encode
        my $bwtencode = $bwt1S->BWT_encode(suftab => \@suftab, string => \@array);

        Function requires a sorted suffix array (suftab) and a string (string)
        both as array references. As a result it returns the computed BWT array

  BWT_decode
        my $bwtdecode = $bwt1S->BWT_decode(bwt => $bwtencode, terminator => $terminator);

        Once encoded, BWT can be decoded back using BWT_decode function. The 
        function requires BWT encod string and a string terminating character.

AUTHOR
    Robert Bakaric <rbakaric@irb.hr>

LICENSE



#  Copyright 2015 Robert Bakaric <rbakaric@irb.hr>
#  
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#  
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#  
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
#  MA 02110-1301, USA.
#  
#  


ACKNOWLEDGEMENT
          1. Adjeroh, D., Bell, T. and Mukherjee, A. 2008. The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching.    
          Springer US.