Publisher<Message>::publish puts message serialization buffer on the stack instead of using the allocators
For very large messages (e.g. 9 KB+) on systems with small stacks (e.g. 8 KB), this practice can cause stack problems. The message serialization buffer is consumed almost immediately by the lizards to build frame sequences, so it would only need to be a temporary large allocation.
Yup, you are right: for serialization we currently have 3 such places. In contrast, for de-serialization we have the SmallPayloadSize approach (see one single place at tryDeserializePayload).
@emrainey @pavel-kirienko What do you think, guys, should we use a similar SmallPayloadSize threshold for serialization? Although we can't say ahead of time what the exact size of the serialized message will be, we at least know its maximum size (Message::_traits_::SerializationBufferSizeBytes), so we could do something like this:
if (Message::_traits_::SerializationBufferSizeBytes <= SmallPayloadSize) {
    // use current stack-based serialization
} else {
    // allocate temp buffer from PMR
}
OR, we could have two different template specializations of Publisher, Client and Server, based on a compile-time <= SmallPayloadSize condition.
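A sketch of how that specialization variant could look (names are hypothetical, not the actual libcyphal API; SmallPayloadSize is assumed to be a compile-time constant):

#include <array>
#include <cstdint>

template <typename Message,
          bool IsSmall = (Message::_traits_::SerializationBufferSizeBytes <= SmallPayloadSize)>
class PublisherImpl;

template <typename Message>
class PublisherImpl<Message, true>  // small message -> stack buffer
{
public:
    void publish(const Message& message)
    {
        std::array<std::uint8_t, Message::_traits_::SerializationBufferSizeBytes> buffer;
        // ... serialize into `buffer` and build frame sequences ...
    }
};

template <typename Message>
class PublisherImpl<Message, false>  // large message -> temporary PMR buffer
{
public:
    void publish(const Message& message)
    {
        // ... allocate SerializationBufferSizeBytes from the memory resource,
        // serialize, build frame sequences, then deallocate ...
    }
};

The point is that the stack buffer exists only in the true specialization, so large-message code paths never reserve that stack space.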
Another alternative could be an extra void setSmallPayloadThreshold(std::size_t threshold) method (or two separate methods for TX and RX in the case of an RPC client/server), so the decision whether to use stack or dynamic memory would be made at runtime, and potentially per entity (publisher, subscriber, client or server). The default value of these thresholds would be the current SmallPayloadSize.
Please let me know your thoughts and preferences.
I think the comparison needs to be done at compile time, with specializations, because your runtime if may still cause stack allocation in the first branch even if it's not taken (compilers typically reserve frame space for block-scoped locals at function entry, regardless of which branch executes).
The threshold should be set to a reasonably small value that can be overridden via an optional preprocessor configuration parameter like LIBCYPHAL_CONFIG_STACK_ALLOC_MAX.
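For reference, a common way to wire up such an overridable compile-time knob (a sketch only; the 256-byte default is just an example):

#include <cstddef>

// Sketch: overridable compile-time threshold; 256 is an example default.
#ifndef LIBCYPHAL_CONFIG_STACK_ALLOC_MAX
#    define LIBCYPHAL_CONFIG_STACK_ALLOC_MAX 256U
#endif

static constexpr std::size_t SmallPayloadSize = LIBCYPHAL_CONFIG_STACK_ALLOC_MAX;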
Scott indicated that there was a compromise where you could create a small allocator on the stack and then pass its upstream as the large allocator, which is off the stack. If the payload fit in the small allocator, it would stay on the stack.
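For illustration, that compromise maps onto the standard PMR pattern below (a sketch only: std::pmr is used for brevity and the function name is hypothetical; the project's own PMR layer would slot in the same way):

#include <cstddef>
#include <memory_resource>

void serializeWithSmallBufferOptimization(const std::size_t payload_size)
{
    // The small buffer lives on the stack; the monotonic resource falls back
    // to its upstream (off-stack) resource when a request does not fit.
    std::byte stack_storage[256];
    std::pmr::monotonic_buffer_resource arena{
        stack_storage, sizeof(stack_storage), std::pmr::get_default_resource()};

    // If the payload (plus alignment) fits in stack_storage, the bytes come
    // from the stack; otherwise they come from the upstream resource.
    void* const buffer = arena.allocate(payload_size);
    // ... serialize into `buffer`, build frame sequences ...
    // No per-allocation deallocate: the arena releases everything on destruction.
}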
... you could create a small allocator on the stack
It's exactly what Pavel wanted to avoid, namely:
... because your runtime if may still cause stack allocation in the first branch even if it's not taken.
I still believe it's possible to do it at runtime, with minimal code duplication. Here is pseudocode:
auto publish(message) -> optional<Failure>
{
    constexpr auto MaxSize = Message::_traits_::SerializationBufferSizeBytes;
    if (MaxSize <= small_payload_threshold_) {
        // Small message: the buffer fits on the stack.
        array<uint8, MaxSize> stack_buffer;
        return publishImpl(message, stack_buffer);
    } else {
        // Large message: borrow a temporary buffer from the PMR.
        unique_ptr<uint8> pmr_buffer{mr.allocate(MaxSize), pmr_deleter};
        return publishImpl(message, {pmr_buffer.get(), MaxSize});
    }
}

void setSmallPayloadThreshold(std::size_t threshold)
{
    small_payload_threshold_ = threshold;
}

private:
auto publishImpl(message, span buffer) -> optional<Failure>
{
    // serialization
    // publishRawData calls
}

size_t small_payload_threshold_{SmallPayloadSize};  // e.g. 256, defined in a libcyphal config file
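(The pmr_deleter above is left abstract; one possible shape, purely illustrative, using std::pmr::memory_resource — the project's own PMR interface would work the same way:)

#include <cstddef>
#include <cstdint>
#include <memory>
#include <memory_resource>

// Illustrative only: returns the temporary buffer to the same resource
// (and with the same size) it was allocated from.
struct PmrDeleter
{
    std::pmr::memory_resource* mr;
    std::size_t                size;

    void operator()(std::uint8_t* const ptr) const
    {
        mr->deallocate(ptr, size);
    }
};
using PmrBufferPtr = std::unique_ptr<std::uint8_t, PmrDeleter>;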
@pavel-kirienko Pavel, what do you think? Are you still in favor of compile-time specializations? It's tempting, I agree, but what concerns me is that it would be global to ANY type of message/request/response.
We can try that, with a comment explaining that if this code turns out to cause stack usage issues on an embedded system, then it needs to be refactored to use specializations.