seqan/seqan3

Alphabet conversion is not properly constrained

rrahn opened this issue · 5 comments

Does this problem persist on the current main?

  • I have verified the issue on the current main

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Explicit casting of a bitpacked proxy type to its actual value type causes a compile time error.
See #3264 for more details.

Expected Behavior

It compiles.

Steps To Reproduce

#include <vector>

#include <seqan3/alphabet/container/bitpacked_sequence.hpp>
#include <seqan3/alphabet/nucleotide/dna4.hpp>

using namespace seqan3::literals;

seqan3::bitpacked_sequence<seqan3::dna4> source{"ACGGTCAGGTTC"_dna4};
auto it = source.begin();
seqan3::dna4 val = seqan3::dna4{*it}; // compile error - due to explicit conversion

Environment

- Operating system: macOS 14.5
- SeqAn version: main
- Compiler: gcc 13.3

Anything else?

One problem is this code:

template <alphabet in_t, alphabet out_t>
constexpr std::array<out_t, alphabet_size<in_t>> convert_through_char_representation
{
[]() constexpr {
std::array<out_t, alphabet_size<in_t>> ret{};
// for (decltype(alphabet_size<in_t>) i = 0; ...) causes indefinite compilation :(
for (auto i = decltype(alphabet_size<in_t>){0}; i < alphabet_size<in_t>; ++i)
assign_char_to(to_char(assign_rank_to(i, in_t{})), ret[i]);
return ret;
}()
};

Here, we only require in_t to be an alphabet type but this does not include that in_t is also default constructible. However, inside of the initialization of the conversion table the default constructor of in_t is called, causing the seen compile error.

To fix this the in_t must be constrained with std::default_initializable as well.

diff --git a/include/seqan3/alphabet/detail/convert.hpp b/include/seqan3/alphabet/detail/convert.hpp
index c3ebda2a5..8405d6a9a 100644
--- a/include/seqan3/alphabet/detail/convert.hpp
+++ b/include/seqan3/alphabet/detail/convert.hpp
@@ -30,6 +30,7 @@ namespace seqan3::detail
  * \hideinitializer
  */
 template <alphabet in_t, alphabet out_t>
+    requires std::default_initializable<in_t>
 constexpr std::array<out_t, alphabet_size<in_t>> convert_through_char_representation
 {
     []() constexpr {
diff --git a/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp b/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp
index f1c7bb054..5f97b42bc 100644
--- a/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp
+++ b/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp
@@ -77,6 +77,7 @@ public:
     template <typename other_nucl_type>
         requires (!std::same_as<nucleotide_base, other_nucl_type>)
               && (!std::same_as<derived_type, other_nucl_type>) && nucleotide_alphabet<other_nucl_type>
+              && std::default_initializable<other_nucl_type>
     explicit constexpr nucleotide_base(other_nucl_type const & other) noexcept
     {
         static_cast<derived_type &>(*this) =

should work.

Unfortunately, it's not enough to constrain the array (actually, we don't have to), we also have to constrain the ctor.

We probably have this ctor in multiple alphabets...
Maybe there is a way to construct the array if in_t is not default initializable?

Without confirming via debugging, I figure that this is the implicit conversion:

constexpr operator alphabet_type() const noexcept
{
if constexpr (std::is_class_v<alphabet_type>)
return *this;
else
return assign_rank_to(base_t::to_rank(), alphabet_type{});
/* Instead of static_cast'ing to the alphabet_type which also considers the constructors of the alphabet_type,
* we explicitly invoke this operator in various places.
* This prevents errors associated with using alphabet_type's constructors.
*
* This is one of error cases:
* The tuple composite seqan3::qualified returns a component_proxy which inherits from alphabet_proxy_base.
* The qualified alphabet itself inherits from phred_base.
* Now when accessing get<1>(seq_qual_alph) we want to call to_phred at some point because we want the quality,
* therefore the to_phred function from alphabet_proxy is called, but this function did a static_cast to the
* derived type which is calling the constructor from phred_base. Unfortunately now, the generic phred_base
* constructor uses `assign_phred_to(to_phred(other), static_cast<derived_type &>(*this))`; (here) which again
* tries to call to_phred of the alphabet_proxy => infinite loop :boom:
*/
}
//!\brief Implicit conversion to types that the emulated type is convertible to.
template <typename other_t>
requires (!std::is_class_v<alphabet_type>) && std::convertible_to<alphabet_type, other_t>
constexpr operator other_t() const noexcept
{
return operator alphabet_type();
}

seqan::bitpacked_sequence::reference_proxy_type inherits from alphabet_proxy.

template <typename other_nucl_type>
requires (!std::same_as<nucleotide_base, other_nucl_type>)
&& (!std::same_as<derived_type, other_nucl_type>) && nucleotide_alphabet<other_nucl_type>
explicit constexpr nucleotide_base(other_nucl_type const & other) noexcept
{
static_cast<derived_type &>(*this) =
detail::convert_through_char_representation<other_nucl_type, derived_type>[seqan3::to_rank(other)];
}

is the explicit ctor.

By constraining this one, we fall back to implicit conversion (Unless I'm mixing up conversion/casting rules).

I can confirm these findings. And correctly constraining the explicit constructor should do it. To fix this, I am planning to add a detail concept called convertable_through_char_representation that should be used for these constructors.

Something like that would also work, though it's not that nice...

diff --git a/include/seqan3/alphabet/detail/alphabet_proxy.hpp b/include/seqan3/alphabet/detail/alphabet_proxy.hpp
index 7be8044a8..953a59cd4 100644
--- a/include/seqan3/alphabet/detail/alphabet_proxy.hpp
+++ b/include/seqan3/alphabet/detail/alphabet_proxy.hpp
@@ -133,6 +133,8 @@ public:
     //!\brief The alphabet size.
     static constexpr auto alphabet_size = seqan3::alphabet_size<alphabet_type>;
 
+    using alphabet_t = alphabet_type;
+
     /*!\name Write functions
      * \brief All of these call the emulated type's write functions and then delegate to
      *        the assignment operator which invokes derived behaviour.
diff --git a/include/seqan3/alphabet/detail/convert.hpp b/include/seqan3/alphabet/detail/convert.hpp
index c3ebda2a5..0c172be14 100644
--- a/include/seqan3/alphabet/detail/convert.hpp
+++ b/include/seqan3/alphabet/detail/convert.hpp
@@ -30,18 +30,19 @@ namespace seqan3::detail
  * \hideinitializer
  */
 template <alphabet in_t, alphabet out_t>
-constexpr std::array<out_t, alphabet_size<in_t>> convert_through_char_representation
-{
-    []() constexpr {
+constexpr std::array<out_t, alphabet_size<in_t>> convert_through_char_representation{
+    []() constexpr
+    {
         std::array<out_t, alphabet_size<in_t>> ret{};
 
-        // for (decltype(alphabet_size<in_t>) i = 0; ...) causes indefinite compilation :(
-        for (auto i = decltype(alphabet_size<in_t>){0}; i < alphabet_size<in_t>; ++i)
-            assign_char_to(to_char(assign_rank_to(i, in_t{})), ret[i]);
+        static_assert(std::default_initializable<in_t> || requires { typename in_t::alphabet_t; });
+        using value_t = std::conditional_t<std::default_initializable<in_t>, in_t, typename in_t::alphabet_t>;
+
+        for (auto i = decltype(alphabet_size<value_t>){0}; i < alphabet_size<value_t>; ++i)
+            assign_char_to(to_char(assign_rank_to(i, value_t{})), ret[i]);
 
         return ret;
-    }()
-};
+    }()};
 // clang-format on
 
 } // namespace seqan3::detail

Sure, that seems valid. However, it feels too specific to seqan3. In general, any third party alphabet proxy should work as well but now we need to constraint that the proxy type exposes the alphabet_t type for which in_t is a proxy for. And then we would also need to check whether that type is also default_initializable, since alphabets to not require this.