[Alignment] Is there a way to run align_pairwise without traceback?
leannmlindsey opened this issue · 3 comments
Platform
- SeqAn version:
- Operating system:
- Compiler:
Question
In earlier versions of SeqAn there was a function called localAlignmentScore (https://docs.seqan.de/seqan/master/?p=locallAlignmentScore) which did not run the quadratic time alignment step. I have not been able to find the same function in SeqAn3. Is there a different way to do this in SeqAn3?
Hi @leannmlindsey,
sorry for the delay! It is summer vacation time :)
We have a really detailed page (https://docs.seqan.de/seqan/3.2.0/group__alignment__pairwise.html) that describes all the possible combinations that you can do with the pairwise alignment module.
For example, in the section "Accessing the alignment results" we have the following snippet that describes how to limit the output that should be computed (in this example for the edit distance):
#include <vector>
#include <seqan3/alignment/configuration/align_config_edit.hpp>
#include <seqan3/alignment/pairwise/align_pairwise.hpp>
#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
int main()
{
using namespace seqan3::literals;
// Basic alignment algorithm configuration.
auto config = seqan3::align_cfg::method_global{} | seqan3::align_cfg::edit_scheme;
std::pair p{"ACGTAGC"_dna4, "AGTACGACG"_dna4};
// Compute only the score:
for (auto res : seqan3::align_pairwise(p, config | seqan3::align_cfg::output_score{}))
seqan3::debug_stream << res << "\n"; // prints: {score: -4}
// Compute only the alignment:
for (auto res : seqan3::align_pairwise(p, config | seqan3::align_cfg::output_alignment{}))
seqan3::debug_stream << res << "\n"; // prints: {alignment: (ACGTA-G-C-,A-GTACGACG)}
// Compute the score and the alignment:
for (auto res :
seqan3::align_pairwise(p, config | seqan3::align_cfg::output_score{} | seqan3::align_cfg::output_alignment{}))
seqan3::debug_stream << res << "\n"; // prints: {score: -4, alignment: (ACGTA-G-C-,A-GTACGACG)}
// By default compute everything:
for (auto res : seqan3::align_pairwise(p, config))
seqan3::debug_stream
<< res << "\n"; // prints {id: 0, score: -4, begin: (0,0), end: (7,9) alignment: (ACGTA-G-C-,A-GTACGACG)}
}
As you can see, we can "combine" different traits by piping them together.
Other examples:
// global pairwise alignment with edit distance (i.e. match = 0, mismatch = -1, gaps = -1) (uses a specialized algorithm https://doi.org/10.1145/316542.316550)
// and compute only the score (needs only linear space)
auto config1 =
seqan3::align_cfg::method_global{} |
seqan3::align_cfg::edit_scheme |
seqan3::align_cfg::output_score{};
// banded-local pairwise-alignment with custom affine-scoring scheme (match = +3, mismatch = -2, gap_open = -5, gap_extension = -2) (needs linear time)
// and only compute the score. (needs linear space)
auto config2 =
seqan3::align_cfg::method_local{} |
seqan3::align_cfg::scoring_scheme
{
seqan3::nucleotide_scoring_scheme
{
seqan3::match_score{3},
seqan3::mismatch_score{-2}
}
} |
seqan3::align_cfg::gap_cost_affine
{
seqan3::align_cfg::open_score{-5},
seqan3::align_cfg::extension_score{-2}
} |
seqan3::align_cfg::band_fixed_size
{
seqan3::align_cfg::lower_diagonal{-4},
seqan3::align_cfg::upper_diagonal{4}
} |
seqan3::align_cfg::output_score{}
// would compute the alignment, but would need more space to store it.
// | seqan3::align_cfg::output_alignment{}
;
I hope this helps answering your question. As you are a new user, maybe you can help us improve our documentation. Where did you try to find this information and what could we improve from your POV?
Hi @leannmlindsey, thanks for writing us.
I think I can help you but I'd like to ask you more questions about your request in order to fully understand your request.
You want to use the Smith-Waterman algorithm to compute a pure local alignment where the optimal alignment aligns any infix of the source with any infix of the target sequence, such as finding a conserved region between two protein sequences?
You are not referring to a semi-global alignment where instead the optimal alignment aligns any infix of the source sequence to the entire target sequence, such as in read mapping applications where a potential read mapping location is verified?
I'am asking because I've seen many ambiguous descriptions of the latter problem using the same terminology as for the local alignment.
You asked for a Smith-Waterman variant using reverse scoring? Can you pease elaborate what you mean with reverse scoring in this case?
Is there a particular DP implementation you have in mind?
Because you also referred to some implementation in SeqAn that did not use the quadratic alignment step.
But as far as I know this didn't exist in SeqAn for the local alignment -- only for the global and semi-global alignments.
So the function localAlignmentScore
computes only the score, but uses the regular quadratic runtime DP algorithm as described in the original publication of Smith and Waterman. It does not compute the traceback though, and thus is implemented in linear space.
But you were explicitly referring to runtime, correct?
The same happens in SeqAn3. If you want to compute only the score you can specify the output configuration seqan3::align_cfg::output_score{}
, which will omit the allocation of a quadratic trace matrix internally.
By specifying seqan3::align_cfg::output_score{} | seqan3::align_cfg::output_alignment
you additionally get the final alignment, which requires the allocation of the quadratic trace matrix.
Note the configuration in SeqAn3 chooses automatically the best implementation available to compute the requested alignment features.