YASSI is a Python module written in C. It implements transcription factor binding site search on a sequence. Given a list of binding sites (a motif), it builds a PSSM (position-specific scoring matrix) and scans the genome with it. It returns a list of (position, score) tuples, sorted by score with the highest-scoring sites first.
When compared with the Biopython motif.search_pwm implementation, it is around 20 times faster.
(NOTE: I recently realized that Biopython already has a C implementation of PSSM search, which can be called from Python with the Motif.scanPWM() method. Since it is also written in C, it should be fast, and it is likely to be bug-free as it is used by lots of people.)
To use it, the first step is to compile the C code. To build and install it as a Python module, run:
$ python setup.py build
$ sudo python setup.py install
This will place YASSI on your load path so that you can import it into any Python script by typing:
import yassi
just like any other Python module. The available methods are:
search(motif, genome, bg_prob=[0.25, 0.25, 0.25, 0.25])
returns the list of putative binding sites, each represented by a tuple containing the position and the PSSM score of that site.
build_PSSM(motif, bg_prob=[0.25, 0.25, 0.25, 0.25])
returns the PSSM as a list of columns, where each column is a list of four numbers: the scores for observing an A, C, G, or T, respectively.
The optional parameter bg_prob gives the background probabilities for A, C, G, and T, respectively.
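To make the shape of such a matrix concrete, here is a minimal pure-Python sketch of building a log-odds PSSM from a list of sites. It is an illustration only, not YASSI's C code: the name build_pssm_sketch, the pseudocount parameter, and the choice of log base 2 are assumptions, so the numbers it produces may differ from what YASSI returns.

```python
import math

# Column order follows the README: A, C, G, T.
BASE_INDEX = {"a": 0, "c": 1, "g": 2, "t": 3}

def build_pssm_sketch(motif, bg_prob=(0.25, 0.25, 0.25, 0.25), pseudocount=1.0):
    """Return a list of columns, each a list of four log-odds scores (A, C, G, T).

    Hypothetical helper: the pseudocount scheme and log base are assumptions.
    """
    n_sites = len(motif)
    width = len(motif[0])
    pssm = []
    for pos in range(width):
        # Count how often each base occurs at this position across all sites.
        counts = [0.0, 0.0, 0.0, 0.0]
        for site in motif:
            counts[BASE_INDEX[site[pos].lower()]] += 1.0
        column = []
        for base in range(4):
            # Smooth the frequency with a background-weighted pseudocount
            # so that unseen bases do not produce log(0).
            freq = (counts[base] + pseudocount * bg_prob[base]) / (n_sites + pseudocount)
            column.append(math.log2(freq / bg_prob[base]))
        pssm.append(column)
    return pssm

toy_motif = ["acgt", "acgt"]
toy_pssm = build_pssm_sketch(toy_motif)
```

In this sketch, a base seen more often than the background predicts gets a positive score and a rarer one gets a negative score, so toy_pssm[0] favors A and penalizes C, G, and T.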
>>> import yassi
>>> import test
>>> motif = [test.random_site(10) for i in range(10)]
>>> motif
['gcatgggaaa', 'gatggcgaaa', 'tactaggaat', 'cgttatacga', 'cttcatcgcc', 'aggttcttta', 'taatcccgaa', 'cctttattaa', 'ccgtacggca', 'aatccccgag']
>>> genome = test.random_site(50000)
>>> genome[:20]
'cttggccatttacgcggaaa'
>>> putative_binding_sites = yassi.search(motif, genome)
>>> putative_binding_sites[:5]
[(30730, 7.682890082548839), (47223, 7.152375365850061), (24101, 7.152375365850059), (8542, 6.908843836330346), (18600, 6.84752078432164)]
>>>
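For readers who want to see what the scan step amounts to, the following pure-Python sketch scores every window of a sequence against a PSSM and sorts the hits by score. It is a rough illustration under stated assumptions, not YASSI's implementation: scan_sketch and toy_pssm are hypothetical names, only the forward strand is scanned, and every window is reported without any score threshold.

```python
# Column order follows the README: A, C, G, T.
BASE_INDEX = {"a": 0, "c": 1, "g": 2, "t": 3}

def scan_sketch(pssm, sequence):
    """Return (position, score) tuples for every window, highest score first.

    Hypothetical helper: forward strand only, no threshold (assumptions).
    """
    width = len(pssm)
    seq = sequence.lower()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # The score of a window is the sum of the per-column scores
        # for the base observed at each position.
        score = sum(pssm[i][BASE_INDEX[base]] for i, base in enumerate(window))
        hits.append((start, score))
    # Sort by score, best first; ties keep their left-to-right order.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits

# A hand-written two-column PSSM that favors the dinucleotide "ac".
toy_pssm = [[1.0, -1.0, -1.0, -1.0],
            [-1.0, 1.0, -1.0, -1.0]]
hits = scan_sketch(toy_pssm, "ttacgac")
```

Here the two best hits are the windows starting at positions 2 and 5, which are the two occurrences of "ac" in the toy sequence. A loop like this is simple but slow in pure Python, which is the motivation for doing the same work in C.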