Version: 0.1
An implementation of decodable de Bruijn sequences. Implementation of binary de Bruijn decoding based on Knuth 4A, whereas higher alphabets follow the extended work by Jonathan Tuliani.
As far as I can tell, this is the first public implementation of these algorithms, possibly even the first implementation.
The included example script (which assumes an alphabet of size 4: {A,C,G,T} - currently only
binary and quaternary are supported) accepts one parameter for the subsequence lengths,
and accepts input from stdin
.
As an example, if file
contained 26-length strings consisting of the symbols A, C, G and T:
cat file | python debdec_example.py 26
If you want to use it as a library in your own code:
>>> from debdec import make_decoder
>>> d = make_decoder(5) # Assumes alphabet size of 4
>>> d([0,1,0,1,1])
352
I need to fix the interface to be cleaner, but if binary decoding is needed, decode_db
can be used:
>>> from debdec import decode_db
>>> decode_db(5, [0,1,0,1,1]) # first parameter is the length
9
- Write unit tests
- Refactor
- Provide a facade for a nicer
- Pack the sequence representation (currently uses lists of ints, could use bitvectors)
- Remove the unnecessary use of Euclids algorithm
- Profile and add memoization and tables where it could help.
- Provide vectorized version, perhaps using CUDA
- ???
- Profit
Free to do whatever you like. All I ask is that if you make an improvement, contact me and submit a pull request. If you use it for something, attribution is certainly welcome and appreciated :)
Jonathan Tuliani, Don Knuth, Chris Mitchell