/bwt

String search using the Burrows-Wheeler transform

Primary LanguagePython

BWT README

Author: Kemal Eren (kemal@kemaleren.com)

ABOUT

bwt is a Python module for efficiently searching strings using the Burrows-Wheeler transform.

Sample usage:

import bwt
import urllib3

# get Pride and Prejudice, Chapters 1 through 5
url = 'http://www.gutenberg.org/cache/epub/1342/pg1342.txt'
http = urllib3.PoolManager()
text = http.urlopen("GET", url).data.decode()

chapters = text[675:30879]

# pre-compute data structures
bwt_data = bwt.make_all(chapters)

# find all occurances of the word 'married', with up to three
# mismatches.
bwt.find('married', chapters, mismatches=3, bwt_data=bwt_data)

Note: bwt.make_all() is not fast, because it uses a naive suffix array algorithm. You can compute the suffix array seperately with any efficient third-party module, and provide it to make_all().

REQUIREMENTS

  • Python (tested with 2.7.4 and 3.3.1)