ribbit

Ribbit is a tool to identify tandem repeats of variable motif sizes. The algorithm converts DNA sequences to 2-bit format and uses basic bit operations to identify tandem repeat sequences.
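
To make the idea concrete, here is a minimal Python sketch of 2-bit encoding and a shift-and-XOR comparison. It is an illustration only and not Ribbit's actual implementation: the base-to-bits mapping and the function names `encode_2bit` and `shift_matches` are assumptions made for this example.

```python
from typing import List

# Assumed mapping for illustration; Ribbit's internal encoding may differ.
ENCODING = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode_2bit(seq: str) -> int:
    """Pack a DNA string into one integer, two bits per base.

    The first base ends up in the highest-order bits.
    """
    value = 0
    for base in seq.upper():
        value = (value << 2) | ENCODING[base]
    return value


def shift_matches(seq: str, motif_size: int) -> List[bool]:
    """Report, for each position i, whether seq[i] == seq[i + motif_size].

    The comparison is done with a single XOR between the packed sequence
    and a copy of itself shifted by motif_size bases.
    """
    overlap = len(seq) - motif_size
    if overlap <= 0:
        return []
    packed = encode_2bit(seq)
    left = packed >> (2 * motif_size)            # bases 0 .. overlap - 1
    right = packed & ((1 << (2 * overlap)) - 1)  # bases motif_size .. end
    diff = left ^ right                          # 00 wherever the bases agree
    matches = []
    for i in range(overlap):
        shift = 2 * (overlap - 1 - i)
        matches.append(((diff >> shift) & 0b11) == 0)
    return matches


print(shift_matches("ACACACGT", 2))
```

In this sketch, a long run of `True` values at a given shift suggests a candidate tandem repeat whose motif size equals that shift.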

Installation
Usage
Inputs and Outputs
Citation
Contact

Installation

To install Ribbit, clone the repository and install the dependencies using the following commands:

git clone https://github.com/SowpatiLab/ribbit
cd ribbit

Usage

Here’s a basic usage example:

 ./ribbit [options] -i sequence.fasta -o results.bed

To view detailed help information:

 ./ribbit -h

The help output is as follows:

  -h [ --help ]                 Ribbit tool identifies short tandem repeats 
                                with allowed levels of impurity.
  -i [ --input-file ] arg       File path for the input fasta file.
  -o [ --output-file ] arg      File path for the output file.
  -m [ --min-motif-length ] arg The minimum length of the motif of the repeats 
                                to be identified. Default: 2
  -M [ --max-motif-length ] arg The maximum length of the motif of the repeats 
                                to be identified. Default: 100
  -p [ --purity ] arg           Threshold value for the continuous number of 
                                ones found in a seed. Default: 0.85
  -l [ --min-length ] arg       The minimum length of the repeat. Default: 12
  --min-units arg               The minimum number of units of the repeat. Can 
                                be an integer value for cutoff across all motif
                                sizes, or a tab-separated file with two columns: 
                                the first is the motif size and the second is 
                                the unit cutoff. Default: 2
  --perfect-units arg           The minimum number of complete units of the 
                                repeat. Can be an integer value for cutoff 
                                across all motif sizes, or a tab-separated file 
                                with two columns: the first is the motif size and 
                                the second is the unit cutoff. Default: 2
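
The `--min-units` and `--perfect-units` options accept a tab-separated file of per-motif-size cutoffs. The Python sketch below writes such a file and assembles a command line from the flags listed in the help output. The helper names `write_unit_cutoffs` and `build_ribbit_command` are hypothetical, and the sketch assumes the cutoff file has no header row, which the help text does not specify.

```python
from pathlib import Path
from typing import Dict, List, Optional, Union


def write_unit_cutoffs(path: str, cutoffs: Dict[int, int]) -> None:
    """Write one 'motif_size<TAB>unit_cutoff' line per motif size.

    Assumption: no header row is expected in the file.
    """
    lines = [f"{size}\t{units}" for size, units in sorted(cutoffs.items())]
    Path(path).write_text("\n".join(lines) + "\n")


def build_ribbit_command(
    fasta: str,
    bed: str,
    min_units: Optional[Union[int, str]] = None,
    purity: Optional[float] = None,
    executable: str = "./ribbit",
) -> List[str]:
    """Assemble an argument list using only flags shown in the help output."""
    cmd = [executable, "-i", fasta, "-o", bed]
    if purity is not None:
        cmd += ["-p", str(purity)]
    if min_units is not None:
        # Either an integer cutoff or the path to a cutoff file.
        cmd += ["--min-units", str(min_units)]
    return cmd


write_unit_cutoffs("cutoffs.tsv", {2: 6, 3: 4, 4: 3})
print(build_ribbit_command("sequence.fasta", "results.bed", min_units="cutoffs.tsv"))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to run the tool from a script.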

Inputs and Outputs

-i or --input-file

Expects: STRING (to be used as filename)

The input file must be a valid FASTA file.

-o or --output-file

Expects: STRING (to be used as filename)

Ribbit writes its output as a BED file.

BED file output columns

S.No	Column	Description
1	Chromosome	Chromosome or sequence name, taken from the first word of the FASTA header
2	Repeat Start	0-based start position of the SSR in the chromosome
3	Repeat Stop	End position of the SSR in the chromosome
4	Repeat Class	Class of the repeat, grouped by cyclical variations
5	Repeat Length	Total length of the identified repeat in nt
6	Motif count	Number of complete motifs in the STR
7	Purity	Purity of the STR region (perfect STR = 1)
8	Repeat Strand	Strand of the SSR based on its cyclical variation
9	CIGAR	CIGAR string representing the type of imperfections

-m or --min-motif-length

The minimum length of the motif of the repeats to be identified.

-M or --max-motif-length

The maximum length of the motif of the repeats to be identified.

-p or --purity

Threshold value for the continuous number of ones found in a seed. Default: 0.85

BED file output example

Chromosome	Start	End	Motif	Motif Size	Location Size	Purity	Strand	CIGAR
Test_Seq	90196	90393	AC	2	197	0.949495	+	3=1X3=1X5=1D82=1X17=1X19=1X31=1I2=1X3=1X21=1I2=
Test_Seq	137451	137470	CCCGCT	6	19	1	+	19=
Test_Seq	136254	136401	GT	2	147	0.912752	+	6=1X9=1D20=1D15=1X12=1X5=1X25=1X9=1X7=1X5=1X9=1X10=1X2=1X2=
Test_Seq	139286	139306	AGTTGCTT	8	20	0.95	+	8=1X11=
Test_Seq	3538110	3538168	AATAGCAAGAGCCAGAGCTAGAGCAAAG	8	58	0.881356	+	4=1X1=2I30=1X9=1X5=1X1=1D2=
Test_Seq	4197438	4197487	CACAGCCAGCT	11	49	0.959184	+	26=1X12=1X9=
Test_Seq	4858037	4858050	CTCTTT	6	13	0.923077	+	6=1I6=
Test_Seq	5000704	5000745	TATTCGTATGCGTATTC	17	41	0.902439	+	4=1I22=1X4=2X7=
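
The CIGAR column can be cross-checked against the length and purity columns. In the example rows above, the reported length equals the sum of the `=`, `X` and `I` operations, and the purity equals the `=` count divided by the total of all operations. This relationship is inferred from these rows only and is not documented behavior. The Python sketch below uses hypothetical helper names (`cigar_counts`, `summarize_cigar`) and presumes `X`, `I` and `D` denote substitutions, insertions and deletions.

```python
import re
from collections import Counter
from typing import Tuple

# One CIGAR token: a run length followed by an operation character.
CIGAR_TOKEN = re.compile(r"(\d+)([=XID])")


def cigar_counts(cigar: str) -> Counter:
    """Sum the run lengths of each operation in a CIGAR string."""
    counts: Counter = Counter()
    for length, op in CIGAR_TOKEN.findall(cigar):
        counts[op] += int(length)
    return counts


def summarize_cigar(cigar: str) -> Tuple[int, float]:
    """Return (span, purity) as inferred from the example rows.

    span   = '=' + 'X' + 'I' run lengths
    purity = '=' run lengths / total run lengths of all operations
    """
    counts = cigar_counts(cigar)
    total = sum(counts.values())
    span = counts["="] + counts["X"] + counts["I"]
    return span, counts["="] / total


span, purity = summarize_cigar("6=1I6=")
print(span, round(purity, 6))
```

For the row with motif CTCTTT, this gives a span of 13 and a purity of 0.923077, matching the values in the table.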

Citation

Please cite as follows:

Ribbit: Accurate identification and annotation of imperfect tandem repeat sequences in genomes

Akshay Kumar Avvaru, Anukrati Sharma, Divya Tej Sowpati Journal: doi:

Contact

For queries or suggestions, please contact:

Akshay Kumar Avvaru - avvaru@ccmb.res.in
Divya Tej Sowpati - tej@ccmb.res.in
