zeek/spicy

Inefficient code generation for named tuple returns

Closed this issue · 1 comments

The following "mktuple()" functions produce very different code. Notice the repeated and unproductive make_tuple(...) invocations for each of the offsets for the named variants. Large named tuples cause unnecessary compile and runtime overhead. With bytes elements, this likely involves many unnecessary memory allocations and memcpy calls.


module Test;

function named_mktuple(p: point): tuple<x: uint8, y: uint8, z: uint8> {
  return (p.x, p.y, p.z);
}

function unnamed_mktuple(p: point): tuple<uint8, uint8, uint8> {
  return (p.x, p.y, p.z);
}

public type point = unit {
  x: uint8;
  y: uint8;
  z: uint8;

  on %done {
    print named_mktuple(self);
    print unnamed_mktuple(self);
  }
};
static auto __hlt::Test::unnamed_mktuple(const ::hilti::rt::ValueReference<__hlt::Test::point>& p)
    -> std::tuple<::hilti::rt::integer::safe<uint8_t>, ::hilti::rt::integer::safe<uint8_t>,
                  ::hilti::rt::integer::safe<uint8_t>> {
    ::hilti::rt::detail::checkStack();
    __location__("tuple-return.spicy:8:3");
    return std::make_tuple(::hilti::rt::optional::value((*p).x), ::hilti::rt::optional::value((*p).y),
                           ::hilti::rt::optional::value((*p).z));
}
static auto __hlt::Test::named_mktuple(const ::hilti::rt::ValueReference<__hlt::Test::point>& p)
    -> std::tuple<::hilti::rt::integer::safe<uint8_t>, ::hilti::rt::integer::safe<uint8_t>,
                  ::hilti::rt::integer::safe<uint8_t>> {
    ::hilti::rt::detail::checkStack();
    __location__("tuple-return.spicy:4:3");
    return std::make_tuple(std::get<0>(std::make_tuple(::hilti::rt::optional::value((*p).x),
                                                       ::hilti::rt::optional::value((*p).y),
                                                       ::hilti::rt::optional::value((*p).z))),
                           std::get<1>(std::make_tuple(::hilti::rt::optional::value((*p).x),
                                                       ::hilti::rt::optional::value((*p).y),
                                                       ::hilti::rt::optional::value((*p).z))),
                           std::get<2>(std::make_tuple(::hilti::rt::optional::value((*p).x),
                                                       ::hilti::rt::optional::value((*p).y),
                                                       ::hilti::rt::optional::value((*p).z))));
}

Thanks for the ticket! This is due to the codegen currently not handling temporary tuple efficiently (which for function return values is probably a main use case),

for ( auto i = 0U; i < src.elements().size(); i++ )
exprs.push_back(
cg->coerce(fmt("std::get<%d>(%s)", i, expr), src.elements()[i].type(), t->elements()[i].type()));
return fmt("std::make_tuple(%s)", util::join(exprs, ", "));

As a workaround one could declare a local variable with the tuple with individual fields casted to the correct type and return that. This would avoid the overhead of creating each tuple element from a full tuple and instead create it from a single value.