bretonics/CHOMP

Rank off targets

Closed this issue · 3 comments

Get the match hits for each CRISPR and rank them by how many base pairs (bp) match. The ranking should consider not only how many occurrences each target has throughout the genome, but also which target's hits have the fewest matching base pairs.

Commit 4d3c31a adds output with:

  • Number of significant hits in entire sequence
  • Number of matching nucleotides per hit

Each occurrence is reported in the format length : matches, where:

Length = window length (CRISPR sequence size)
Matches = number of nucleotide matches in the hit

3591c4f and b7925f0 add support.

Need to switch the ranking priority: sort by identities first, then by number of occurrences.

Name      Sequence                 Strand  Reverse                  Occurrences  Identities
CRISPR_3  TGTGATCACGTACTATTATGCGG  plus    GGCGTATTATCATGCACTAGTGT  3            23,8,8
CRISPR_2  AAAAATTTTCTCTATCTAACGGG  minus   GGGCAATCTATCTCTTTTAAAAA  4            23,15,8,8
CRISPR_1  AAAAAATTTTCTCTATCTAACGG  minus   GGCAATCTATCTCTTTTAAAAAA  4            23,16,8,8
CRISPR_8  AAAAAAAATTTTCCCTATCGGGG  minus   GGGGCTATCCCTTTTAAAAAAAA  2            23,9
CRISPR_9  AAAAAAATTTTCCCTATCGGGGG  minus   GGGGGCTATCCCTTTTAAAAAAA  2            23,9
CRISPR_6  CGAAAAAAAATTTTCCCTATCGG  minus   GGCTATCCCTTTTAAAAAAAAGC  2            23,9
CRISPR_7  GAAAAAAAATTTTCCCTATCGGG  minus   GGGCTATCCCTTTTAAAAAAAAG  2            23,9
CRISPR_4  AAAAATCCCATCGATCTAGCAGG  minus   GGACGATCTAGCTACCCTAAAAA  8            23,9,7,7,7,7,7,7
CRISPR_0  ATGTAGCTAGCTAGCTAGTAGGG  plus    GGGATGATCGATCGATCGATGTA  5            23,14,12,10,10
CRISPR_5  TCCCATCGATCTAGCAGGCCCGG  minus   GGCCCGGACGATCTAGCTACCCT  7            23,15,9,7,7,7,7

Fewer base-pair matches per hit (identities) == better CRISPR; ties are then broken by fewer occurrences.
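The ordering described above could be sketched roughly as follows. This is not CHOMP's actual code: `Candidate`, `parse_row`, and `rank_key` are hypothetical names. It also assumes that the largest identity in each row (23 here) is the on-target match and is ignored, and that "fewer matches" is measured by the closest remaining off-target hit.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    name: str
    occurrences: int
    identities: Tuple[int, ...]


def parse_row(line: str) -> Candidate:
    # Columns: Name, Sequence, Strand, Reverse, Occurrences, Identities
    name, _seq, _strand, _rev, occ, ids = line.split()
    return Candidate(name, int(occ), tuple(int(x) for x in ids.split(",")))


def rank_key(c: Candidate) -> Tuple[int, int]:
    # Assumption: the single largest identity is the on-target hit, so drop it once.
    off_target = sorted(c.identities, reverse=True)[1:]
    # Primary key: identity of the closest off-target hit (lower is better).
    worst = off_target[0] if off_target else 0
    # Secondary key: total number of occurrences (lower is better).
    return (worst, c.occurrences)


rows = """\
CRISPR_2 AAAAATTTTCTCTATCTAACGGG minus GGGCAATCTATCTCTTTTAAAAA 4 23,15,8,8
CRISPR_4 AAAAATCCCATCGATCTAGCAGG minus GGACGATCTAGCTACCCTAAAAA 8 23,9,7,7,7,7,7,7
CRISPR_8 AAAAAAAATTTTCCCTATCGGGG minus GGGGCTATCCCTTTTAAAAAAAA 2 23,9
CRISPR_3 TGTGATCACGTACTATTATGCGG plus GGCGTATTATCATGCACTAGTGT 3 23,8,8
"""

candidates: List[Candidate] = [parse_row(r) for r in rows.splitlines()]
ranked = sorted(candidates, key=rank_key)
print([c.name for c in ranked])
# ['CRISPR_3', 'CRISPR_8', 'CRISPR_4', 'CRISPR_2']
```

Under these assumptions CRISPR_3 ranks first because its closest off-target hit matches only 8 bp. CRISPR_8 and CRISPR_4 tie at 9 bp, so the one with fewer occurrences comes first.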

e512c8f closes this issue.