wf_aligner->cigar->operations is blank
lpipes opened this issue · 12 comments
I tried to align two sequences:
GGAGCATCAGTAGACCTAACCATTTTCTCCCTCCATCTAGCAGGAGTATCTTCAATCCTAGGAGCAATTAATTTCATCACTACAATTATTAACATAAA
CCTCAGAAGAAACGCTGCTCACGCCGGGTCTTCAGTTGATTTTGCCATTTTTTCTTTGCACTTGGCGGGAGTTTCTTCTCTATTAGGGGCCGTCAATTTTATTAGAACTTTAAGCAATTTGCGAATTTTTGGGATATTTTTGGATCAGATGCCCCTTTTTTGTTGAGCAGTATTAATTACGGCAGTTCTATTGCTTTTATCCTTACCCGTTCTCGCGGGGGCTATTACCATGTTGCTAACAGACCGCAATTTGAATTCCAGATTTTATGACGTTAGGGGAGGGGGGGATCCCGTGCTCTACCAGCATTTATTT
wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
attributes.distance_metric = gap_affine;
attributes.affine_penalties.mismatch =4;
attributes.affine_penalties.gap_opening = 6;
attributes.affine_penalties.gap_extension = 2;
// Initialize Wavefront Aligner
wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attributes);
wavefront_align(wf_aligner,seq1,strlen(seq1),seq2,strlen(seq2)); //Align
But the cigar->operations
doesn't contain anything. Isn't this supposed to contain the cigar string?
(gdb) p wf_aligner->cigar->begin_offset
$38 = 3686
(gdb) p wf_aligner->cigar->end_offset
$39 = 3999
(gdb) p wf_aligner->cigar->operations
$40 = 0x7ffff28e5b98 ""
Hi,
Ok. The point is that cigar->operations
is a raw buffer storing the CIGAR starting at wf_aligner->cigar->begin_offset
. Try with these simple modifications:
char* pattern = "GGAGCATCAGTAGACCTAACCATTTTCTCCCTCCATCTAGCAGGAGTATCTTCAATCCTAGGAGCAATTAATTTCATCACTACAATTATTAACATAAA";
char* text = "CCTCAGAAGAAACGCTGCTCACGCCGGGTCTTCAGTTGATTTTGCCATTTTTTCTTTGCACTTGGCGGGAGTTTCTTCTCTATTAGGGGCCGTCAATT"
"TTATTAGAACTTTAAGCAATTTGCGAATTTTTGGGATATTTTTGGATCAGATGCCCCTTTTTTGTTGAGCAGTATTAATTACGGCAGTTCTATTGCTT"
"TTATCCTTACCCGTTCTCGCGGGGGCTATTACCATGTTGCTAACAGACCGCAATTTGAATTCCAGATTTTATGACGTTAGGGGAGGGGGGGATCCCGT"
"GCTCTACCAGCATTTATTT";
wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
attributes.distance_metric = gap_affine;
attributes.affine_penalties.mismatch = 4;
attributes.affine_penalties.gap_opening = 6;
attributes.affine_penalties.gap_extension = 2;
// Initialize Wavefront Aligner
wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attributes);
int status = wavefront_align(wf_aligner,pattern,strlen(pattern),text,strlen(text)); //Align
fprintf(stderr,"Status = %d\n",status);
// Display result
fprintf(stderr,"Offsets = [%d,%d)\n",wf_aligner->cigar->begin_offset,wf_aligner->cigar->end_offset);
fprintf(stderr,"CIGAR = %.*s\n",
wf_aligner->cigar->end_offset-wf_aligner->cigar->begin_offset,
wf_aligner->cigar->operations+wf_aligner->cigar->begin_offset);
Ok thanks for the explanation! I am getting this to work now. However, I ran into another issue with cigar_print_pretty
with a different set of sequences. My pattern is:
GCTAGCAGGTAACTTAGCCCATGCCGGAGCATCAGTAGATCTAGCTATTTTTTCCCCTACACCTAGGCAGGAGTCTCTTCAATTCTAGGAGCAATCAACTTCATCACAACTATTATTAACATGAAACCACCAGGAATATCACAAGACCAAATCCCCCTATTTGTATGATCTGTATTAATTACAGCAGTTCCCTACTACTCCTATCACTACCTGTCTTAGCAGGGGCCATCACAATGCTACTAACAGATCGTAACCTAAATACATCCTTTTTCGACCCAGCAGGAGGAGGAGACGACCCCCAATTCTATATCAACACTTTTTTTC
My text is:
CCTTAGGGGAAGAATCTTTCATTCTTCCCCTGCGGTCGATATTTCCATATTTTGTTATGAAAATATCGGCACCGGGTCCGTTATTTTAGCTATTAATATAGTGGTAACTATTATTAATATACGTTCACCACACATAACTCTTAGACGAATATCATTATTTCCATGAAGCCTATTGGTCGCCAGGATCCTTCTTCTACTGGCGATGCCGGTACTCATAGGGGCTATCTCAATGTTGGTACTGGATCGAAATTTCGACACTAGATTTTTCGATCCAACGGGGGGCGGAGACCCCATCTTGTACCAACACTTATTCTGATTCTTCGGGCATCCTGAGGTCTACATTCTGATTTTGCCCGGATTTGGTATCATCTCGCATATTATTGCCCACTACTCCGCAAAGCCGACAGTTTTTGGTTCGCTGGGTATAATTTATGCTATATTAGGAATTGGGGTACTAGGTTTCGTTGTCTGGGCTCACCATATGTTTACGGTTGGCATGGATATCGACACCCGTGCTTATTTCACCGCAGCTACTATAATTATCGCAGTCCCGACAGGCATTAAGATTTTCAGTTGATTAGCCACTATTTGAGGAACTAATATTCATTACGAGACGCCAATACTCTGGGCCTTAGGGTTTATTTTCATATTTACCTTAGGAGGGTTAACAGGGATCGTTCTTGCTAATTCGTCTTTAGACATCGTCCTCCATGATACGTTCTATGTCGTGGCTCATTTTCATTACGCTCTTTCTATGGGGACGGTTTTCTCTATTTTTGGGGGTGTTAATTACTGGTTCCCAGTGATAACCGGGCTCACCCTCCATAGTCGGTGAACGAAGATCCACTTCTTTATTACATTTATTGGGGTTAATTTAACCTTCTTCCCCCAACACTTTTTAGGGTTAGCAGGCATACCTCGCCGGTATTCCGATTATCCCGACGCTTATGCGATGTGAAATTCCCTTTCTTCAATTGGTTCAATAATTTCTTTTGTAGGGCTATTGTTTTTCATCTTTATTTTGTGGGAAAGTTTTGCTGCCCAGCGTGGGGTTATTAGTCCTACCGCTCAAGTAGCTTCAGCAGAGTGAGGGTTCATGCCCTTGAACCCAATGACTGAGTTGGAGCCTATTTCTTTTACTCGTTCAACCCCTTTCTTCGCCTTTATAACTCCTTAATGACACAATAGATATAAGAAATTTAGAGACCAGACTACACTGCAAAATGTAACACTTTTATTGTTTCTCTTACCCCCTAACCACAAAGCTTATACCTGTTACTCTGTAAATATACAAGATGCATCTCCACCTTCTCCCCGTTATACCTAGCCGTGTGTAGTTTCCTAGTGTCTAAATTTCCCCATAACCATATTAATATATACGTTGTTTCCTTAGGCCCTAATATTATATATTTGCACTACAATTACTTCCGGCTTTACGTTAATATGGTATTCACCTCTTTTGTGTAATGACTAAATTTGCGGTTCATATTTTGGTTTATGATGTCCTTACGCCCCTTATACTATAAGTTGTAACATTCAGAGCTTGATTACATGTCCTTAAATTTAACACTCTTACTTATGTTATAGTTTGAGAGAAATAGTCTAGTAGCACCTCCTTTAACCGCTTTTATTGAGGCCGCTGGCTATTTTCTACTAATCATAAAGATATTGGGACCTTATATTTTATTTTAGGTACTTGGGCAGGCCTCCTTGGGACCTCAATAAGATTACTTATTCGAGCTGAACTAGGACAACCAGGGTCTCTACTAGGTAGAGACCAGCTGTATAATACTATCGTAACTGCCCACGCCTTTTTAATAATTTTTTTCTTAGTTATGCCTATTATAATTGGGGGATTTGGTAACTGGTTAATCCCCCTTATGCTTGGGGCTCCAGACATAGCCTTTCCACGGTTAAACAATATGAGGTTTTGATTACTGCCCCCTTCATTGTTTCTACTTGTAATAAGGTCAGTAGTTGAGAAAGGGGTAGGCACTGGGTGAACAGTTTATCCTCCTCTATCTGGAAACATTGCACACGCCGGAGCCTCAGTTGACTTAGCGATTTTCAGTCTTCATCTAGCTGGAGTGTCCTCGATTTTAGGGGCCCTAAACTTTATTACTACGGTTATCAACATACGGCCAAATGCTCTACGGCTGGAACGAGTCCCTCTGTTCCCTTGAGCAGTTAAAATCACCGCTATTCTACTGTTGCTATCCCTACCGGTCTTAGCTGGAGCTATCACTATGCTTTTAACAGATCGTAACTTGAACACCGCTTTTTTCGACCCTGGAGGTGGTGGTGACCCTATTTTATATGAACACTTATTC
But when I use cigar_print_pretty
the pattern_pos
becomes longer than the length of the pattern and I get a Segfault.
446 if (pattern[pattern_pos] != text[text_pos]) {
(gdb) p pattern_pos
$68 = 480
(gdb) p text_pos
$69 = 385
(gdb) p pattern[pattern_pos]
$70 = 0 '\000'
(gdb) p (int)strlen(pattern)
$73 = 324
(gdb) p (int)strlen(text)
$74 = 2330
Ok, but I will need the code snippet you are using to tell you more.
AFAIK pattern_pos
is never provided to the library, so I don't know how can it change.
Thanks,
I was able to figure it out. It was user error sorry. I have another question. With Glocal alignment, what should the values for alignment_endsfree
, text_begin_free
, and text_end_free
be set to? Thank you btw for adding this in from the previous WFA version.
wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
attributes.alignment_form.span = alignment_endsfree;
attributes.alignment_form.pattern_begin_free = 0;
attributes.alignment_form.pattern_end_free = 0;
attributes.alignment_form.text_begin_free = text_begin_free;
attributes.alignment_form.text_end_free = text_end_free;
I sort of understand. But I haven't had much luck with adjusting the parameters. For example, I have the parameters set to:
wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
attributes.distance_metric = gap_affine;
attributes.affine_penalties.mismatch =4;
attributes.affine_penalties.gap_opening = 6;
attributes.affine_penalties.gap_extension = 2;
attributes.alignment_form.span = alignment_endsfree;
attributes.alignment_form.pattern_begin_free = 0;
attributes.alignment_form.pattern_end_free = 0;
But I still get this kind of alignment, where the beginning A
is aligned to the first base of the text/reference.
Ok, but you are not defining the text_end/begin, and for the pattern, you are setting both to zero (i.e., disabled). Try setting some leeway, like:
attributes.alignment_form.pattern_begin_free = 50;
attributes.alignment_form.pattern_end_free = 50;
Hi,
If you can paste any of those ASCII sequences here, I can provide an example of how to use it.
>KM404175.1_3_2
ATTTGTGTGATCCGTACTTATCACTGCCGTCCTACTACTCCTATCAATTCCAGTTCTCGCTGCTGGTATCACCATACTACTAACAGACCGAAAACTAAATACCACATTCTTCGACCCCGCTGGAGGAGGCGACCCAGGCCTATACCAACA
>KM001333.1
AACCTAGCTGGCAACCTAGCTCATGCTGGAGCTTCAGTAGACCTAGCTATTTTCTCCCTTCACGTAGCAGGTGTATCCTCCATCCTAGGTGCAATCAACTTCATCACAACTGCCATCAACATAAAACCACCCGCCCTCTCACAATACCAAACACCTTTATTTGTATGATCCGTACTTATCACTGCTGTCTTACTGCTTCTATCACTTCCAGTTCTCGCCGCTGGCATCACCATACTACTAACAGACCGAAATCTAAACACCACATTCTTTGATCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Hi,
I'm also getting this type of alignment (see image) with these sequences. Can you suggest different parameters that I can try to get a better Glocal alignment?
>DQ433359.1
CCTAGCTGGTAACCTAGCCCATGCTGGAGCTTCAGTGGACCTAGCAATCTTCTCACTCCACCTAGCAGGTGTGTCCTCTATCTTAGGCGCTATCAACTTCATCACAACTGCTATCAACATAAAACCCCCAGCCCTATCACAATACCAAACTCCCTTATTCGTGTGATCAGTACTCATCACTGCCGTCCTACTACTTCTCTCACTTCCCGTGCTCGCTGCTGGCATCACTATACTACTAANAGACCGAAACCTAAACACAACATTCTTTNATCCAGCCGGAGGCNNNNACCCAGTACTATACCAGCACCTTTTCA
>GU571333.1_53
CACTTCACTTAGCCGGTGTCTCTTCTATCCTAGGCGCAATTAACTTTATCACAACAGCTATTAATATAAAACCACCTGCCCTATCACAATACCAAACTCCCCTATTCGTGTGATCAGTGCTCATCACTGCCGTCCTACTACTCCTATCAC
The parameters I used were:
wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
attributes.distance_metric = gap_affine;
attributes.affine_penalties.mismatch =4;
attributes.affine_penalties.gap_opening = 6;
attributes.affine_penalties.gap_extension = 2;
attributes.alignment_form.span = alignment_endsfree;
attributes.alignment_form.pattern_begin_free = 50;
attributes.alignment_form.pattern_end_free = 50;
I'm sorry for not getting back to you sooner. I have been playing with the example, and I can make a suggestion:
(1) Use a match score different from 0.
(2) Use full ends-free.
Using the following code:
int main(int argc,char* argv[]) {
char* pattern =
"CACTTCACTTAGCCGGTGTCTCTTCTATCCTAGGCGCAATTAACTTTATCACAACAGCTATTAATATAAAACCACCTGCCCTATCACAATAC"
"CAAACTCCCCTATTCGTGTGATCAGTGCTCATCACTGCCGTCCTACTACTCCTATCAC";
char* text =
"CCTAGCTGGTAACCTAGCCCATGCTGGAGCTTCAGTGGACCTAGCAATCTTCTCACTCCACCTAGCAGGTGTGTCCTCTATCTTAGGCGCTA"
"TCAACTTCATCACAACTGCTATCAACATAAAACCCCCAGCCCTATCACAATACCAAACTCCCTTATTCGTGTGATCAGTACTCATCACTGCC"
"GTCCTACTACTTCTCTCACTTCCCGTGCTCGCTGCTGGCATCACTATACTACTAANAGACCGAAACCTAAACACAACATTCTTTNATCCAGC"
"CGGAGGCNNNNACCCAGTACTATACCAGCACCTTTTCA";
// Configure alignment attributes
wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
attributes.distance_metric = gap_affine;
attributes.affine_penalties.match = -1;
attributes.affine_penalties.mismatch = 4;
attributes.affine_penalties.gap_opening = 6;
attributes.affine_penalties.gap_extension = 2;
attributes.alignment_form.span = alignment_end2end;
// Initialize Wavefront Aligner
wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attributes);
// Align
wavefront_align(wf_aligner,pattern,strlen(pattern),text,strlen(text));
fprintf(stderr,"WFA-Alignment returns score %d\n",wf_aligner->cigar->score);
// Display alignment
fprintf(stderr," PATTERN %s\n",pattern);
fprintf(stderr," TEXT %s\n",text);
fprintf(stderr," SCORE (RE)COMPUTED %d\n",cigar_score_gap_affine(wf_aligner->cigar,&attributes.affine_penalties));
cigar_print_pretty(stderr,wf_aligner->cigar,pattern,strlen(pattern),text,strlen(text));
// Free
wavefront_aligner_delete(wf_aligner);
}
Suggests that the alignment is found in the middle of the reference (semi-global or glocal).
PATTERN C-----------------------------------------------------ACTTCACTTAGCCGGTGTCTCTTCTATCCTAGGCGCAATTAACTTTATCACAACAGCTATTAATATAAAACCACCTGCCCTATCACAATACCAAACTCCCCTATTCGTGTGATCAGTGCTCATCACTGCCGTCCTACTACTCCTATCAC---------------------------------------------------------------------------------------------------------------
| ||| ||| |||| ||||| || |||||| ||||||| || ||||| |||||||| ||||| || |||||||| || |||||||||||||||||||||||| |||||||||||||||| ||||||||||||||||||||||| || ||||
TEXT CCTAGCTGGTAACCTAGCCCATGCTGGAGCTTCAGTGGACCTAGCAATCTTCTCACTCCACCTAGCAGGTGTGTCCTCTATCTTAGGCGCTATCAACTTCATCACAACTGCTATCAACATAAAACCCCCAGCCCTATCACAATACCAAACTCCCTTATTCGTGTGATCAGTACTCATCACTGCCGTCCTACTACTTCTCTCACTTCCCGTGCTCGCTGCTGGCATCACTATACTACTAANAGACCGAAACCTAAACACAACATTCTTTNATCCAGCCGGAGGCNNNNACCCAGTACTATACCAGCACCTTTTCA
In that case, I would suggest using ends-free
, so that:
// Configure alignment attributes
wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
attributes.distance_metric = gap_affine;
attributes.affine_penalties.match = -1;
attributes.affine_penalties.mismatch = 4;
attributes.affine_penalties.gap_opening = 6;
attributes.affine_penalties.gap_extension = 2;
attributes.alignment_form.span = alignment_endsfree;
attributes.alignment_form.pattern_begin_free = 0;
attributes.alignment_form.pattern_end_free = 0;
attributes.alignment_form.text_begin_free = strlen(text);
attributes.alignment_form.text_end_free = strlen(text);
Gives the following:
PATTERN -----------------------------------------------------CACTTCACTTAGCCGGTGTCTCTTCTATCCTAGGCGCAATTAACTTTATCACAACAGCTATTAATATAAAACCACCTGCCCTATCACAATACCAAACTCCCCTATTCGTGTGATCAGTGCTCATCACTGCCGTCCTACTACTCCTATCAC---------------------------------------------------------------------------------------------------------------
|||| ||| |||| ||||| || |||||| ||||||| || ||||| |||||||| ||||| || |||||||| || |||||||||||||||||||||||| |||||||||||||||| ||||||||||||||||||||||| || ||||
TEXT CCTAGCTGGTAACCTAGCCCATGCTGGAGCTTCAGTGGACCTAGCAATCTTCTCACTCCACCTAGCAGGTGTGTCCTCTATCTTAGGCGCTATCAACTTCATCACAACTGCTATCAACATAAAACCCCCAGCCCTATCACAATACCAAACTCCCTTATTCGTGTGATCAGTACTCATCACTGCCGTCCTACTACTTCTCTCACTTCCCGTGCTCGCTGCTGGCATCACTATACTACTAANAGACCGAAACCTAAACACAACATTCTTTNATCCAGCCGGAGGCNNNNACCCAGTACTATACCAGCACCTTTTCA