seqan/seqan3

Convert vector<dna5> to std::string

nhhaidee opened this issue · 1 comments

Platform

  • SeqAn version: 3.2.0
  • Operating system: Ubuntu
  • Compiler: 11.4

Question

With the following code, is there any way to convert a vector<dna5> to a std::string? I want to work with standard C++ strings.

Thanks,
Hai

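#include <sstream>

#include <seqan3/core/debug_stream.hpp>
#include <seqan3/io/sequence_file/all.hpp>

auto input = R"(>TEST1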
ACGT
>Test2
AGGCTGA
>Test3
GGAGTATAATATATATATATATAT)";

int main(int argc, const char *argv[]) {

    using sequence_file_input_type =
            seqan3::sequence_file_input<seqan3::sequence_file_input_default_traits_dna,
                    seqan3::fields<seqan3::field::seq, seqan3::field::id>,
                    seqan3::type_list<seqan3::format_fasta>>;
    sequence_file_input_type fin{std::istringstream{input}, seqan3::format_fasta{}};
    // Retrieve the sequences and ids.
    for (auto &[seq, id]: fin) {
        seqan3::debug_stream << "ID:  " << id << '\n';
        seqan3::debug_stream << "SEQ: " << seq << '\n';
        // Only seq and id are selected above, so no quality field is read (FASTA stores no qualities anyway).
    }

    return 0;
}

Hi @nhhaidee,

thanks for reaching out!

This is indeed a common use case that is not well handled by our library. The solution is a bit unintuitive:

You can derive your own traits type from seqan3::sequence_file_input_default_traits_dna and override the sequence alphabet and container:

struct my_traits : seqan3::sequence_file_input_default_traits_dna
{
    using sequence_alphabet = char; // instead of dna5
 
    template <typename alph>
    using sequence_container = std::basic_string<alph>; // must be defined as a template!
};

With these traits, the sequences are read directly as std::string (std::string is an alias for std::basic_string<char>).
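To illustrate why sequence_container has to be a template, here is a minimal sketch that does not use SeqAn3 at all. The names traits_sketch, sequence_type_of, and make_sequence are hypothetical and exist only for this illustration. The way the alias is applied to the alphabet is an assumption about how such a traits type might be consumed, not SeqAn3's actual implementation:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for my_traits; it does not derive from any SeqAn3 type.
struct traits_sketch
{
    using sequence_alphabet = char;

    // An alias template: the consumer decides which alphabet to plug in.
    template <typename alph>
    using sequence_container = std::basic_string<alph>;
};

// Assumed consumer side: apply the container template to the alphabet.
template <typename traits_t>
using sequence_type_of =
    typename traits_t::template sequence_container<typename traits_t::sequence_alphabet>;

// With alphabet = char, the resolved type is exactly std::string.
static_assert(std::is_same_v<sequence_type_of<traits_sketch>, std::string>);

// Builds a sequence of the resolved type to show it behaves like a normal std::string.
sequence_type_of<traits_sketch> make_sequence()
{
    sequence_type_of<traits_sketch> seq{"ACGT"};
    assert(seq.size() == 4);
    return seq;
}
```

If sequence_container were a plain (non-template) alias, the expression traits_t::template sequence_container<...> in this sketch would not compile, which is the kind of failure the "must be defined as a template" comment warns about.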

Full Solution:

#include <iostream>
#include <sstream>

#include <seqan3/io/sequence_file/all.hpp>
#include <seqan3/core/debug_stream.hpp>

auto input = R"(>TEST1
ACGT
>Test2
AGGCTGA
>Test3
GGAGTATAATATATATATATATAT)";

struct my_traits : seqan3::sequence_file_input_default_traits_dna
{
    using sequence_alphabet = char; // instead of dna5
 
    template <typename alph>
    using sequence_container = std::basic_string<alph>; // must be defined as a template!
};

int main(int argc, const char *argv[]) {

    using sequence_file_input_type =
            seqan3::sequence_file_input<my_traits,
                    seqan3::fields<seqan3::field::seq, seqan3::field::id>,
                    seqan3::type_list<seqan3::format_fasta>>;

    sequence_file_input_type fin{std::istringstream{input}, seqan3::format_fasta{}};
    // Retrieve the sequences and ids.
    for (auto &[seq, id]: fin) {
        std::cout << "ID:  " << id << '\n';
        std::cout << "SEQ: " << seq << '\n';
        // Only seq and id are selected above, so no quality field is read (FASTA stores no qualities anyway).
    }

    return 0;
}

Working example on Compiler Explorer: https://godbolt.org/z/PrrooYzTK

As you can see, the sequence can now also be printed with std::cout, since it is a std::string.