Union type field is not read properly between flatc -> flatcc
bzyx opened this issue · 9 comments
I have prepared the following flatbuffers schema to show the case
namespace TestMSG;
table MsgPing {
text: string;
length: ushort;
}
table MsgPong {
text: string;
length: ushort;
}
union MessageUnion { MsgPing, MsgPong}
table ComboMessage {
message: MessageUnion;
}
root_type ComboMessage;
The ComboMessage message is a container for any type of message defined by the union.
For test purposes I've used flatc to generate Python wrapper, created a message and wrote it to a file:
import flatbuffers
from TestMSG import ComboMessage
from TestMSG import MessageUnion
from TestMSG import MsgPing
from TestMSG import MsgPong
TEXT = "Hello Python!"
# Generate the message
builder = flatbuffers.Builder(1024)
textToUse = builder.CreateString(TEXT)
MsgPing.Start(builder)
MsgPing.AddText(builder, textToUse)
MsgPing.MsgPingAddLength(builder, len(TEXT))
ping = MsgPing.End(builder)
ComboMessage.Start(builder)
ComboMessage.AddMessage(builder, ping)
ComboMessage.AddMessageType(builder, MessageUnion.MessageUnion().MsgPing)
combo_message = ComboMessage.End(builder)
builder.Finish(combo_message)
buf = builder.Output()
with open("flatbuffer_msg.bin", "wb") as f:
f.write(buf)
print(f"{buf!r}")
Then using flatcc I've prepared reader sample:
#include <stdio.h>
#include "testmsg_test_reader.h"
int main(void) {
char buffer[1024];
FILE *in_file = fopen("flatbuffer_msg.bin", "rb");
size_t size = fread(buffer, sizeof(char), 1024, in_file);
printf("Read %ld\n", size);
TestMSG_ComboMessage_table_t msg = TestMSG_ComboMessage_as_root(buffer);
TestMSG_MessageUnion_union_type_t type = TestMSG_ComboMessage_message_type_get(msg);
printf("Message type is: %s [%d]\n", TestMSG_MessageUnion_type_name(type), type);
fclose(in_file);
return 0;
}
I'd expect to get the Message type to be shown as "MsgPing", instead the output of the application is:
Read 60
Message type is: NONE [0]
Verification snipped using Python shows the correct type:
with open("flatbuffer_msg.bin", "rb") as f_read:
read_buf = f_read.read()
read_message = ComboMessage.ComboMessage.GetRootAs(read_buf, 0)
print("Read message type: ", read_message.MessageType())
Read message type: 1
Could you please try dump the hex content of the binary file using the new feature
https://github.com/google/flatbuffers/tree/master/tests/annotated_binary
It requires a recent flatc build, I don't think it has been released yet.
Sure thing I've build both flatc and flatcc from the latest sources:
// Annotated Flatbuffer Binary
//
// Schema file: testmsg_test.fbs
// Binary file: flatbuffer_msg.bin
header:
+0x00 | 0C 00 00 00 | UOffset32 | 0x0000000C (12) Loc: +0x0C | offset to root table `TestMSG.ComboMessage`
vtable (TestMSG.ComboMessage):
+0x04 | 08 00 | uint16_t | 0x0008 (8) | size of this vtable
+0x06 | 0C 00 | uint16_t | 0x000C (12) | size of referring table
+0x08 | 07 00 | VOffset16 | 0x0007 (7) | offset to field `message_type` (id: 0)
+0x0A | 08 00 | VOffset16 | 0x0008 (8) | offset to field `message` (id: 1)
root_table (TestMSG.ComboMessage):
+0x0C | 08 00 00 00 | SOffset32 | 0x00000008 (8) Loc: +0x04 | offset to vtable
+0x10 | 00 00 00 | uint8_t[3] | ... | padding
+0x13 | 01 | UType8 | 0x01 (1) | table field `message_type` (UType)
+0x14 | 0C 00 00 00 | UOffset32 | 0x0000000C (12) Loc: +0x20 | offset to field `message` (union of type `MsgPing`)
vtable (TestMSG.MsgPing):
+0x18 | 08 00 | uint16_t | 0x0008 (8) | size of this vtable
+0x1A | 0C 00 | uint16_t | 0x000C (12) | size of referring table
+0x1C | 08 00 | VOffset16 | 0x0008 (8) | offset to field `text` (id: 0)
+0x1E | 06 00 | VOffset16 | 0x0006 (6) | offset to field `length` (id: 1)
table (TestMSG.MsgPing):
+0x20 | 08 00 00 00 | SOffset32 | 0x00000008 (8) Loc: +0x18 | offset to vtable
+0x24 | 00 00 | uint8_t[2] | .. | padding
+0x26 | 0D 00 | uint16_t | 0x000D (13) | table field `length` (UShort)
+0x28 | 04 00 00 00 | UOffset32 | 0x00000004 (4) Loc: +0x2C | offset to field `text` (string)
string (TestMSG.MsgPing.text):
+0x2C | 0D 00 00 00 | uint32_t | 0x0000000D (13) | length of string
+0x30 | 48 65 6C 6C 6F 20 50 79 | char[13] | Hello Py | string literal
+0x38 | 74 68 6F 6E 21 | | thon!
+0x3D | 00 | char | 0x00 (0) | string terminator
padding:
+0x3E | 00 00 | uint8_t[2] | .. | padding
Here is the binary file itself:
hexdump --canonical flatbuffer_msg.bin
00000000 0c 00 00 00 08 00 0c 00 07 00 08 00 08 00 00 00 |................|
00000010 00 00 00 01 0c 00 00 00 08 00 0c 00 08 00 06 00 |................|
00000020 08 00 00 00 00 00 0d 00 04 00 00 00 0d 00 00 00 |................|
00000030 48 65 6c 6c 6f 20 50 79 74 68 6f 6e 21 00 00 00 |Hello Python!...|
00000040
From a quick inspection, that buffer looks OK, and the type should be 1 and not 0, so it appears that something is wrong with flatcc or the way that it is being used.
Your char buffer is not aligned correctly - a common mistake - but that is unlikely to be the problem if you are using an x86 based system - it will however affect the flatcc verifier if you choose to call it.
I am involved in projects where flatcc is used extensively with unions that interoperate with Go and Javascript, so in general unions should be working.
I will need to have a closer look. I suspect it might specifically be the direct accessor to the union type that is somehow wrong: TestMSG_ComboMessage_message_type_get
You could try to get the union as a struct with two fields: a type and an offset and see what that returns - as per documentation.
And just to rule it out: sometimes people test something different from what they think that they are testing, so please double check it is the correct buffer that you are reading.
Note that there are 0x40 bytes = 64 dec. bytes in the hexdump.
In C you are only reading 60 bytes. Truncation is not necessarily a cause of the problem at hand, but it might suggest that you are reading another buffer.
When I compile your program with the buffer you provided, I get the expected result:
Read 64
Message type is: MsgPing [1]
You are right, in both cases. First I messed up with the files and after double check, I did observe that the result is okay, but the problem I have is XY-kind problem and I tried to reproduce it first on the X86 platform. But it didn't work for me when I used Zephyr based ARM embeeeded system.
EDIT:
After making sure that I have the same fbs file and exactly the same code on each platform it started working for me.
You did mention alignment issues, what should I align and how? I want to make sure that the code will always work.
I wrote the below before your edit, but it still applies:
I am not familiar with ARM Zephyr. Recent ARM appear to not mind unaligned access much. Some relatively new ones take a performance hit, but the latest not so much - I have no direct experience with this. Other older or embedded / low-end platforms might object more aggressively, but then you would expect a segfault.
Another issue is that older ARMs might be big endian. While this should not be a problem for flatcc, and the binary buffer is the same in all cases, it might trigger some code paths that are not regularly testet.
However, flatcc v0.4 has been extensively test with big endian.
As to your new question on alignment:
Alignment is based on the largest datatype stored in the buffer. This is minimum 4 bytes because the offset to the root table is 4 bytes. If you have an int64 type or a double float, it will be 8 bytes. If you explictly use a FlatBuffer attribute to align structs to more than that, you will need to align to that.
Since not all buffers of the same schema necessarily use the largest data type, flatcc builder can give you information on the required alignment for the actual buffer. But generally, you should know in advance based on the schema.
In your test program you can use alignas(4) char buf[1024];
(from memory). If you use malloc, it depends on the platform, but most implementations will align to at least 8 bytes.
Even if your platform does not support alignas, it will be available with flatcc due the portable library. It is needed to ensure structs are aligned correctly in generated code.
Oh, and there is a portable aligned_alloc and aligned_free with flatcc that you can use if you need more than what malloc provides. See test programs in flatcc, they use these. aligned_alloc
is C11 std., but aligned_free
is flatcc specific and defaults to free
on C11, but other platforms need specialized free operations for aligned allocation, so flatcc needs aligned_free
to accommodate.
So I wasn't clear in my last answer.
I told you how to find out how much to align, but not what to align.
You must align the memory buffer where you read the buffer from.
Some CPU's have an address system where an address must be aligned to 4 bytes if they try to read a word of 4 bytes an so forth. More recent CPU's and Intel for a long time, can read from other addresses, but internally it typically means the CPU makes a special case and reads two words, then shifts the result. This is slower on some hardware, but barely noticable on modern high end hardware. However, you still have a similar problem when reading across cache lines - which will work, but it will be slower. This is not directly relevant to the discussion, but this is one reason why FlatBuffers makes it possible to align to more than 8 bytes along with hardware mapping of I/O & graphics.