YASSI: Yet Another Site Search Implementation

YASSI is a Python module written in C. It implements transcription factor binding site search on a sequence. Given a list of binding sites (a motif), it builds a PSSM (position-specific scoring matrix) and scans the genome with it. It returns a list of (position, score) tuples, sorted by decreasing score.
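
The build-and-scan procedure described above can be sketched in pure Python. This is only an illustration under stated assumptions, not YASSI's C code: the names `build_pssm_sketch` and `scan_sketch`, the log2-odds scoring, and the pseudocount of 1 are hypothetical choices made for this example, and YASSI's actual scores may be computed differently.

```python
import math

BASES = "acgt"  # column order assumed to be A, C, G, T


def build_pssm_sketch(sites, bg_prob=(0.25, 0.25, 0.25, 0.25), pseudocount=1.0):
    """Return one [A, C, G, T] column of log2-odds scores per motif position.

    Assumption: scores are log2(frequency / background) with a pseudocount;
    this is a common PSSM definition, not necessarily the one YASSI uses.
    """
    length = len(sites[0])
    pssm = []
    for pos in range(length):
        column = []
        for base, bg in zip(BASES, bg_prob):
            count = sum(1 for site in sites if site[pos].lower() == base)
            freq = (count + pseudocount) / (len(sites) + 4 * pseudocount)
            column.append(math.log2(freq / bg))
        pssm.append(column)
    return pssm


def scan_sketch(pssm, sequence):
    """Score every window of the sequence; return (position, score), best first.

    Only a/c/g/t are handled; any other character raises KeyError.
    """
    index = {base: i for i, base in enumerate(BASES)}
    seq = sequence.lower()
    width = len(pssm)
    hits = []
    for start in range(len(seq) - width + 1):
        score = sum(pssm[offset][index[seq[start + offset]]] for offset in range(width))
        hits.append((start, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


sites = ["acgt", "acgt", "acga", "tcgt"]   # toy motif
pssm = build_pssm_sketch(sites)
hits = scan_sketch(pssm, "ttacgtgg")       # toy "genome"
print(hits[0])                             # best-scoring window
```

A pure-Python loop like this is slow on a real genome, which is the motivation for doing the scan in C.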

Compared with Biopython's motif.search_pwm implementation, it is around 20 times faster. (NOTE: I recently realized that Biopython already has a C implementation of PSSM search, which can be called from Python with the Motif.scanPWM() method. Since it is also written in C, it should be fast, and it is likely to be bug-free because many people use it.)

To use it, first compile the C code into a Python module and install it:

$ python setup.py build
$ sudo python setup.py install

This installs YASSI on Python's module search path, so that you can import it into any Python script by typing:

import yassi

just like any other Python module. The available functions are:

  • search(motif, genome, bg_prob=[0.25, 0.25, 0.25, 0.25]) returns the list of putative binding sites, each represented as a (position, score) tuple holding the position and the PSSM score of that site.
  • build_PSSM(motif, bg_prob=[0.25, 0.25, 0.25, 0.25]) returns the PSSM as a list of columns, where each column is a list of four numbers: the scores for observing an A, C, G, or T, respectively. The optional parameter bg_prob gives the background probabilities for A, C, G, and T, respectively.
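
To make the column layout concrete, the following sketch scores a single candidate site against a PSSM stored as a list of [A, C, G, T] columns. The matrix values and the helpers `score_site` and `consensus` are made up for illustration; they are not part of YASSI and the numbers are not output of build_PSSM.

```python
# Hypothetical 3-column PSSM in the layout described above:
# one inner list per motif position, ordered A, C, G, T.
example_pssm = [
    [1.5, -1.0, -1.0, -0.5],
    [-1.0, 1.2, -0.5, -1.0],
    [-0.5, -1.0, 1.0, 0.3],
]

BASE_INDEX = {"a": 0, "c": 1, "g": 2, "t": 3}


def score_site(pssm, site):
    """Sum the per-position scores of `site`, one column per base."""
    if len(site) != len(pssm):
        raise ValueError("site length must equal the number of PSSM columns")
    return sum(column[BASE_INDEX[base]] for column, base in zip(pssm, site.lower()))


def consensus(pssm):
    """Return the highest-scoring base at each position."""
    return "".join("acgt"[column.index(max(column))] for column in pssm)


print(score_site(example_pssm, "acg"))
print(consensus(example_pssm))  # prints "acg"
```
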

Example session, using the random_site helper from the test module:

>>> import yassi
>>> import test
>>> motif = [test.random_site(10) for i in range(10)]
>>> motif
['gcatgggaaa', 'gatggcgaaa', 'tactaggaat', 'cgttatacga', 'cttcatcgcc', 'aggttcttta', 'taatcccgaa', 'cctttattaa', 'ccgtacggca', 'aatccccgag']
>>> genome = test.random_site(50000)
>>> genome[:20]
'cttggccatttacgcggaaa'
>>> putative_binding_sites = yassi.search(motif, genome)
>>> putative_binding_sites[:5]
[(30730, 7.682890082548839), (47223, 7.152375365850061), (24101, 7.152375365850059), (8542, 6.908843836330346), (18600, 6.84752078432164)]
>>>
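
Because the result is a plain list of tuples, it can be post-processed with ordinary Python. The sketch below keeps hits above a score cutoff and recovers the sequence of each site. It assumes that each position is the 0-based start offset of the scored window, which this README does not state explicitly; the helper `top_sites`, the cutoff, and the hit list are hypothetical.

```python
def top_sites(hits, genome, motif_length, min_score):
    """Return (position, score, site_sequence) for hits scoring at least min_score.

    Assumption: `position` is the 0-based start of the window in `genome`.
    """
    selected = []
    for position, score in hits:
        if score < min_score:
            continue
        site = genome[position:position + motif_length]
        selected.append((position, score, site))
    return selected


# Made-up hits in the same (position, score) shape that search() returns.
hits = [(5, 7.68), (12, 7.15), (0, 6.91), (8, 3.2)]
genome = "cttggccatttacgcggaaa"  # the 20 bases shown in the session above

for position, score, site in top_sites(hits, genome, motif_length=4, min_score=6.5):
    print(position, score, site)
```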