nasa/cFE

Refactor message header alignment and "raw" types to fit a clear pattern

skliper opened this issue · 8 comments

Is your feature request related to a problem? Please describe.
Aligned version of message headers currently in SB, shows different handling of the base type.

/** \brief Software Bus generic message */
typedef CFE_MSG_Message_t CFE_SB_Msg_t;
/** \brief Aligned Software Bus command header */
typedef union CFE_SB_CmdHdr {
CFE_MSG_CommandHeader_t Cmd;
CFE_SB_Msg_t BaseMsg;
} CFE_SB_CmdHdr_t;
/** \brief Aligned Software Bus telemetry header */
typedef union CFE_SB_TlmHdr {
CFE_MSG_TelemetryHeader_t Tlm;
CFE_SB_Msg_t BaseMsg;
} CFE_SB_TlmHdr_t;

Describe the solution you'd like
See discussion below.

Describe alternatives you've considered
None.

Additional context
Brought up as part of #777/#998 review.

Requester Info
Jacob Hageman - NASA/GSFC (from CCB discussion)

Regarding discussion that came up in today's CCB - this may also affect/determine how we proceed with #689:

This primarily deals with the alignment of messages. The fundamental concern - and what needs to be distinguished here - is what module is actually imposing the alignment requirement.

Messages themselves only need to be aligned sufficiently for the types they actually contain (i.e. a message containing only uint8 bytes needs no special alignment, while a message containing any uint32 value needs 32 bit alignment, and so forth.

However, the software bus implementation (CFE_SB) internally aligns all of its buffers, and it uses the (historical) CFE_SB_Msg_t* pointer type as its interface type for both CFE_SB_SendMsg() and CFE_SB_RcvMsg() as well as the other msg introspection calls.

Therefore -- since the extra alignement is actually driven by CFE_SB and its underlying implementation - my recommendation is to put the unions here. So:

  • The msg module should provide CFE_MSG_Message_t, CFE_MSG_TelemetryHeader_t, and CFE_MSG_CommandHeader_t types. In this case the CFE_MSG_Message_t should be the actual structure type, not a union.

  • The CFE_SB module should provide CFE_SB_Msg_t, and CFE_SB_CmdHdr_t and CFE_SB_TlmHdr_t - which are simply variants of those types provided by MSG but aligned properly for usage by software bus. (that is, they are union-ized here)

  • The "raw" messages in apps should be defined using only types from the MSG module (i.e. in the respective *_msg.h file).

  • But when instantiating a message buffer locally for sending (i.e. in telemetry-generating code) the buffer should be instantiated using CFE_SB_Msg_t as its base type. This way - in the event that the telemetry message has low alignment requirements - it will not generate a warning, because the buffer will be aligned to CFE_SB_Msg_t level - if this is greater than the TLM message itself.

  • On the other hand - When receiving a command - it is OK to cast from CFE_SB_Msg_t* to the actual command type - because this should only be a downgrade in alignment, never an upgrade.

The reason for following this pattern is that the sizeof() operator - when applied to the message types - will work as expected, and won't reflect extra padding that might have been added due to runtime padding for SB.

For a specific case where this distinction matters - consider CFE_ES_Restart_t. This command has a single uint16 payload, and is traditionally expected to be 10 bytes total in length when the CMD header is added.

When sent across the software bus interface - the buffer will actually be 12 bytes or maybe even 16. But that extra padding is only necessary for runtime data alignment - the extra bytes should not be interpreted as part of the message.

If these patterns I'm suggesting above are followed then things should mostly work (I think, or at least as well as can be expected using C structs and not a proper data dictionary implementation.

If you aren't using the aligned headers to define the message, wouldn't we also need to make the raw headers end on the most restrictive boundary, so the payload will always start in the same location (and not break CFE_SB_GetUserData)? For "instantiating a message buffer locally" do you mean do the union between the raw message and CFE_SB_Msg_t within the app? So really going back to the whole message buffer pattern that I took out in nasa/sample_app#104 for example?

typedef union	
{	
    CFE_SB_Msg_t       MsgHdr;	
    SAMPLE_APP_HkTlm_t HkTlm;	
} SAMPLE_APP_HkBuffer_t;

If that's the case, don't CFE_SB_CmdHdr_t and CFE_SB_TlmHdr_t really have no use anymore?

@ejtimmon - sounds like this isn't settled yet...

we also need to make the raw headers end on the most restrictive boundary, so the payload will always start in the same location (and not break CFE_SB_GetUserData)?

If I'm interpreting correctly - you are referring to the possibility of a gap occurring between the header as defined by MSG and the actual start of data (for instance if the TLM header is 12 bytes and the payload contains a double which requires 8 byte alignment).

As you noted in #777 though, the CFE_SB_GetUserData() function is inherently fragile and one should use the actual structure definition for this. (e.g. use something like offsetof(mymessage_t, Payload) or &mymessage->Payload). It is only an issue when you are making assumptions about how the compiler may (or may not) pad something out. (as an aside I'm tempted to say CFE_SB_GetUserData should be deprecated, but in the event that one moves toward a dictionary-based solution, it becomes more possible to implement this reliably).

Pedantically speaking there also no guarantee that using CFE_SB_CmdHdr_t (the union) will avoid implicit padding any better than using CFE_MSG_CommandHeader_t (actual type) does. Because avoiding padding requires an assumption about what the worst-case alignment will be for any payload will be - and accounting for it in the respective struct/union. So - as long as MSG types already account for this by explicitly making CFE_MSG_CommandHeader_t and CFE_MSG_TelemetryHeader_t to be a multiple of 8 bytes instead of 4 - this is probably as good as can be done when using C structs to define messages. Adding a union won't avoid padding any better.

Again, padding avoidance is based on the assumption that the most probable worst case payload alignment will be 8 bytes, and it is therefore unlikely that the compiler will add extra padding in this case. "assumption", "probable" and "unlikely" being the key weasel-words here, because technically any compiler can add padding or extra alignment wherever it sees fit on the particular architecture it is compiling for. (i.e. it is entirely plausible that some FPU out there works better with 16 byte alignment, particularly if you use the long double type). As a C programmer there is no way you can absolutely guarantee that a compiler won't add padding between struct members - the C standard says that this is platform-defined.

My main point is - this can result in an unanticipated implicit padding gaps either way - whether the unionized "CFE_SB_CmdHdr_t" was used or the base structure was used to declare the struct. Compilers can do what they want - developers can only guess at the most likely memory padding, nothing is guaranteed about the layout of bits within a struct that isn't directly specified in the C standard.

Hence why I would prefer to see the struct definitions in the *_msg.h header file(s) convey the intent of what the cmd/tlm packets contain in the simplest terms possible. The distinction is that we are not sending a CFE_SB_Msg_t or CFE_SB_CmdHdr_t or CFE_SB_TlmHdr_t on the network - this exists only for internal use within SB for transport/API purposes. The CFE_MSG_CommandHeader_t or CFE_MSG_TelemetryHeader_t are what is intended to be sent/received, so this is how the msg structure should be defined in an app.

If that's the case, don't CFE_SB_CmdHdr_t and CFE_SB_TlmHdr_t really have no use anymore?

I think so - these seem like surplus types now given the current design. Perhaps we can remove them.

If my memory serves correctly, the main reason for making CFE_SB_Msg_t into a union in the first place was for two reasons:

  • Provide a modicum of type safety for messages - Having a non void pointer base type means the user must actually declare their buffer using this type or get a compiler warning. The alternative being void* where anything goes (no warnings, no type checks).
  • But because of that type check - having it be a union with uint32 increased its alignment requirement and thus avoided a compiler warning in the command handlers - i.e. when casting from CFE_SB_Msg_t* to something else - this increases the likelihood that this would be a downgrade (or equal) in alignment, not an upgrade - so no warning.

We could simplify a lot by just making the interface type in SB to be void* instead of CFE_SB_Msg_t* - but this would have no type checking or enforcement so its bad for safety - but it sure would make things simpler.

We need to resolve this ASAP for Caelum... realizing it's not a perfect solution, we need something good enough. I understand the lack of guarantee, but a fix that works for most systems is better than something that is "broken" on all of them. I do dislike CFE_SB_GetUserData, but even if we deprecated it I also think it's undesirable to have the payload start at different locations depending on the alignment requirements of the payload (makes ground system databases more complex).

Anyways, in an attempt to move forward I'll implement the alignment suggestion on top of #998 so we can at least get a PR in to discuss and see how the pattern settles.

My recommendation is to use the envelope/post office analogy as I mentioned in #998 (review)

  • CFE SB API deals with envelopes
  • CFE MSG API deals with contents

Upon receipt of a message we examine the MsgID and "open" the envelope. This can be a typecast to the CFE_MSG type (typically CFE_MSG_CommandHeader_t) or (preferably) by accessing the correctly-typed member within the CFE_SB_Msg_t envelope.

When sending a message we "wrap" the raw message in the envelope for sending.

The CFE SB API should only have abstractions for those basic items which are required for message transport and delivery such as its MsgId and size. These should operate on a CFE_SB_Msg_t* type. This would also be the right place to deal with any authentication data if we ever need to implement that for security reasons.

The CFE MSG API should have abstractions for all other metadata - timestamps, command codes, checksums, and any other relevant info.

The CFE SB API should probably also provide helpers (either macros or helper functions) to facilitate the proper "packaging" and "unpackaging" of these raw messages. Right now this is basically just a typecast that is performed directly in the apps (e.g. casting of CFE_SB_Msg_t* to its specific command type within the command dispatcher) but this could be more refined.