GabrielDosReis/ipr

Clean up type factory (#237) may have cleaned too much

Closed this issue · 12 comments

Commit #237 removed very useful property that allowed the following pattern to work.

// make me some product
ipr::impl::ref_sequence<ipr::Type> types;
types.push_back(<some type>);
types.push_back(<some type>);
return lexicon.get_product(types);

This used to work because lexicon hosted ref_sequences in a util::rb_tree::container<ref_sequence<ipr::Type>> type_seqs; and whenever someone would call get_product, whatever sequence of types was provided was persisted in a type_seqs and a reference to that sequence was stored in impl::Product.

Behold, prior to #237, we were copying incoming ref_sequence in lexicon owned collection.

      const ipr::Product&
      Lexicon::get_product(const ref_sequence<ipr::Type>& s) {
         return *types.make_product(*type_seqs.insert(s, unary_lexicographic_compare()));
      }

After #237 we only have the following:

      const ipr::Product& type_factory::get_product(const ipr::Sequence<ipr::Type>& seq)
      {
         return *products.insert(seq, unary_lexicographic_compare());
      }

I believe that this was unintentional regression and type_seq needs to come back in some form. Maybe the rationale (before #237) was that if you already have a stable sequence, you can call type_factory::make_product and avoid the copy that lexicon::get_product was incurring.

Here is a test that demonstrates the problem:

void test_buglet()
{
    ipr::impl::Lexicon lexicon{};

    union X
    {
        ipr::impl::ref_sequence<ipr::Type> types;
        char mem[sizeof(types)];
        X() : types() {}
        ~X() {}
    };

    X x;
    x.types.push_back(&lexicon.int_type());
    auto& p1 = lexicon.get_product(x.types);

    p1.size();
    puts("first size ok");

    // trash the memory of on stack impl::ref_sequence<ipr::Type>
    std::fill(std::begin(x.mem), std::end(x.mem), 255);

    p1.size(); // BOOM! 
    puts("second size ok");
}

ipr::impl::ref_sequence<ipr::Type> types;

Part of the issue here is in that code: It shouldn't be using impl::ref_sequence outside of <ipr/impl>. Unless the type is allocated by the IPR library through a get_xyz or make_xyz function, you should not use it and expect that a copy was magically kept somewhere else. Said differently, don't reach into the guts of the IPR implementation that way.

Behold, prior to #237, we were copying incoming ref_sequence in lexicon owned collection.

And that was too much copying in many cases where the IPR implementation types are used correctly, and the removal was intentional.

What would be the equivalent of:

// make me some product
ipr::impl::ref_sequence<ipr::Type> types;
types.push_back(<some type>);
types.push_back(<some type>);
return lexicon.get_product(types);

In a brave post #237 world?

There are a few ways to get around this:

  1. Have a factory function that returns an impl::ref_sequence<ipr::Type>; it has the benefit of the IPR controling the lifetime, as for the other nodes; it has the downside of spreading the guts of the library all over the place
  2. Use an intermediary type (e.g. std::vector) to indicate to IPR to make a copy of the arguments. That way, copying is indicated by the API.

I think (1) is horrible; (2) is an acceptable way to move forward. Other options?

We have other nodes that have variable number of children, Block, Class, Expr_list, that allows incremental construction. For example,

auto& expr_list = *ctx.lexicon.make_expr_list();
expr_list.push_back(<expr>)

What makes get_product, get_sum unique (pun intended), that we unify them and thus would like to get all of their types at once and thus making sum and product less suitable for incremental construction.

Option (1) is even more horrible that it does not even truly takes care of the lifetime (compared to pre #237) world.

auto& type_list = *lexicon.make_type_list(); // yes, its lifetime is managed by lexicon
type_list.push_back(<type>); // but, we can mutate it
type_list.push_back(<type>);
auto& product = lexicon.get_product(type_list); // we still have to copy, since type_list is in the users hands
type_list.push_back(<type>); // and can be happily modified again
auto& sum = lexicon.get_sum(type_list);

I do not see much value of option (2) compared to pre #237 and I would say ref_seq is more preferrable to
vector<const Type*> as the latter may give an impression that null types are OK, whereas ref_seq did not give that impression.

Here is option (3), which is pre #237 world in disguise.

struct impl::Lexicon {
   using Type_list = ref_sequence<Type>;

   get_product(const Type_list&);
};

impl::Lexicon::Type_list list;
list.push_back(<some type>);
list.push_back(<some type>);
return lexicon.get_product(list);

This is pretty much the pre #237 status quo that hides the guts of ref_sequence a bit.

What makes get_product, get_sum unique (pun intended), that we unify them and thus would like to get all of their types at once and thus making sum and product less suitable for incremental construction.

Unification is pretty much incompatible with incremental construction since the whole uniqueness is determined by the parts; the parts need to be collected first. In the world before removal, you didn't actually have incremental construction -- you only had an illusion of an incremental construction. The removal removed that illusion; so now whatever uses or needs we have need to accomplished through non-illusion.

We have other nodes that have variable number of children, Block, Class, Expr_list, that allows incremental construction.

Those nodes aren't subject to unification; they are all created by make_xzy functions, not get_xyz functions. So, they are not even similar, let alone comparable to the situation of get_product or get_sum.

I do not see much value of option (2)

It allows distinction, through the API, between when a copy should be made, from when it shouldn't. Making that distinction ensures that we are codifying lifetime points -- which the library failed to do earlier, allowing you to construct the non-supported case that you did earlier. I would like to proceed forward with that basis.

using Type_list = ref_sequence<Type>;

We need a distinct type, unrelated to ref_sequence through identity.

using Type_list = std::vector<const ipr::Type*>;

Then?

using Type_list = std::vector<const ipr::Type*>;

Then?

Either that -- which is equivalent to what I suggested earlier and you didn't like -- or something like

template<typename T>
struct collection : private std::vector<const T*> {
private:
    using Base = std::vector<const T*>;
public:
// ... begin, end, and other supporting machinery.
    void push_back(const T& t) { Base::push_back(&t); }
};

and use collection<ipr::Type> if you feel that there might a confusion about null pointer.
Inherently though, the implementation side will have to manipulate pointers in one form or the other for messiness is unavoidable so we may be talking about degree of messiness. However, if most operations are done through the interface nodes, then you can even have local implementations (compiler-specific implementations of the interface nodes) that still interoperate with the rest of the IPR library.

If you think there will be more uses for collection<T> beyond get_product and get_sum, I would lean towards having a collection. If it is just for get_product / get_sum. I would be OK with vector and revisit the issue if new information arrives.

Looks Ok.

// make me some product
std::vector<const ipr::Type*> types;
types.push_back(&<some type>);
types.push_back(&<some type>);
return lexicon.get_product(types);

or most likely:

// make me some product
Lexicon::Type_list types;
types.push_back(&<some type>);
types.push_back(&<some type>);
return lexicon.get_product(types);

If you think there will be more uses for collection<T> beyond get_product and get_sum

There are two sides to the coin: (1) the number of factory functions like these; (2) the number of uses of a factory function like this.

So even, if we had only one function like this that needed to be used multiple types, it would still make sense to have a good API for it.

After pondering on both aspects, I am leaning towards something like collection<T>, let us call it impl::Wharehouse<T> to emphasize that it is temporary storage, not permanent like Sequence<T>.

After pondering on both aspects, I am leaning towards something like collection<T>, let us call it impl::Wharehouse<T> to emphasize that it is temporary storage, not permanent like Sequence<T>.

An alternative API (suggested by @d-winsor ) is to instead use a form of span, to abstract over the specific collection used to truck data to the factory function.

Fixed by #271