yachielab/SPADE

Which is CRISPR repeat?

Closed this issue · 3 comments

I'm only interested in CRISPR repeats, but SPADE seems to find ALL the repetitive sequences.
How can I distinguish them? Is any additional information stored in the resulting .gb files?

Thanks in advance!

Thank you for using SPADE.

By default, SPADE doesn't have a function to categorize the kinds of repeat sequences it detects.
However, in this paper (https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gky890/5124599), I defined periodic repeat sequences with interspace sizes of 25–60 bp and repeating periods of 58–81 bp as CRISPR candidates.
Therefore, if you are interested only in CRISPRs, please apply the above definition to extract the CRISPR candidates from the periodic repeat sequences SPADE detected.
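
The definition above can be sketched as a small filter. This is only an illustration, not part of SPADE: the function name and the sample values are hypothetical, and treating both ranges as inclusive is an assumption.

```python
# Thresholds quoted in the reply above (taken from the linked paper).
INTERSPACE_RANGE = (25, 60)  # bp
PERIOD_RANGE = (58, 81)      # bp


def is_crispr_candidate(period, interspace):
    """Return True if both values fall inside the candidate ranges.

    Assumption: the range boundaries are inclusive.
    """
    return (PERIOD_RANGE[0] <= period <= PERIOD_RANGE[1]
            and INTERSPACE_RANGE[0] <= interspace <= INTERSPACE_RANGE[1])


# Hypothetical repeats as (name, period, interspace) tuples; values are made up.
repeats = [("r1", 66, 36), ("r2", 12, 0), ("r3", 90, 61)]

candidates = [name for name, period, interspace in repeats
              if is_crispr_candidate(period, interspace)]
print(candidates)
```

Only `r1` passes here, since it is the only entry whose period and interspace both lie inside the quoted ranges.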

Sorry for the inconvenience.

Thanks, I got it!

One more question: where can I get the interspace sizes? It seems only the repeating periods are stated in the .gbk files... Do I need to recalculate them from the joined location statement in the .gbk?

For example:
Assume we have a repetitive sequence at join(958..987,1003..1032,1069..1098);
then the average interspace size can be calculated as ((1003-987) + (1069-1032)) / 2 = 26.5?

An interspace size can be calculated as the difference between the period and the rpt_unit_seq length of the periodic repeat region.
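
As a rough sketch, here is that rule applied to the join(...) example from the question. The helper names are hypothetical, and taking the start-to-start distance between neighbouring units as a local "period" is an assumption made for illustration; the period SPADE writes to the .gbk file is a single value for the whole region.

```python
import re


def parse_join(location):
    """Parse 'join(a..b,c..d,...)' into a list of (start, end) integer pairs."""
    return [(int(s), int(e)) for s, e in re.findall(r"(\d+)\.\.(\d+)", location)]


def interspace_size(period, rpt_unit_length):
    """Interspace = period minus repeat-unit length, as described above."""
    return period - rpt_unit_length


units = parse_join("join(958..987,1003..1032,1069..1098)")

# GenBank coordinates are 1-based and inclusive, so 958..987 covers 30 bases.
unit_length = units[0][1] - units[0][0] + 1

# Assumption: local period = distance between consecutive unit start positions.
periods = [nxt[0] - cur[0] for cur, nxt in zip(units, units[1:])]

interspaces = [interspace_size(p, unit_length) for p in periods]
print(unit_length, periods, interspaces)
```

Under these assumptions the two interspaces come out one base shorter than the direct subtraction in the question, because both end coordinates belong to the repeat units themselves.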