Author: Kemal Eren (kemal@kemaleren.com)
bwt
is a Python module for efficiently searching strings using the
Burrows-Wheeler transform.
Sample usage:
import bwt
import urllib3
# get Pride and Prejudice, Chapters 1 through 5
url = 'http://www.gutenberg.org/cache/epub/1342/pg1342.txt'
http = urllib3.PoolManager()
text = http.urlopen("GET", url).data.decode()
chapters = text[675:30879]
# pre-compute data structures
bwt_data = bwt.make_all(chapters)
# find all occurances of the word 'married', with up to three
# mismatches.
bwt.find('married', chapters, mismatches=3, bwt_data=bwt_data)
Note: bwt.make_all()
is not fast, because it uses a naive suffix
array algorithm. You can compute the suffix array seperately with any
efficient third-party module, and provide it to make_all()
.
- Python (tested with 2.7.4 and 3.3.1)