Exploring Google FlatBuffers

Hi! In this repository, we explore the serialization library, Google FlatBuffers.

Q: What is Google FlatBuffers?

FlatBuffers is a serialization library.

FlatBuffers - Overview https://google.github.io/flatbuffers/index.html#flatbuffers_overview

However, we might ask:

Q: What makes FlatBuffers better than the other serialization libraries out there?

A: Several reasons, which I find rather impressive, though I am a tad skeptical about them :)

1. Fast access to serialized data, specifically by avoiding the need for deserialization during access.

It represents hierarchical data in a flat binary buffer, such that we can access the serialized data without deserializing the data.
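
To make the "no deserialization" idea concrete, here is a minimal sketch. It does not use the real FlatBuffers wire format: the two-field layout, the offsets, and the helper names read_at and make_buffer are all assumptions made up for illustration. The point is only that a reader can pull a single field straight out of the byte buffer, without first unpacking the whole buffer into objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout for illustration only (NOT the FlatBuffers wire format):
//   bytes 0..3 : float        x
//   bytes 4..7 : std::int32_t age
constexpr std::size_t kOffsetX = 0;
constexpr std::size_t kOffsetAge = 4;

// Read a trivially copyable value directly from the buffer at `offset`.
// std::memcpy sidesteps unaligned-access and strict-aliasing problems.
template <typename T>
T read_at(const std::vector<std::uint8_t>& buf, std::size_t offset) {
    assert(offset + sizeof(T) <= buf.size());
    T value;
    std::memcpy(&value, buf.data() + offset, sizeof(T));
    return value;
}

// "Serialize" by writing each field at its fixed offset (host byte order).
std::vector<std::uint8_t> make_buffer(float x, std::int32_t age) {
    std::vector<std::uint8_t> buf(8);
    std::memcpy(buf.data() + kOffsetX, &x, sizeof(x));
    std::memcpy(buf.data() + kOffsetAge, &age, sizeof(age));
    return buf;
}
```

Calling read_at<std::int32_t>(buf, kOffsetAge) touches only the four bytes of that field; nothing else in the buffer is parsed or copied.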

2. Memory Efficiency and Speed

In C++, the only memory needed to access the serialized data is the flat binary buffer itself; reading does not allocate additional objects.

Also, it speeds up reading and writing, by allowing memory-mapping (mmap) or streaming.

Memory mapping lets us read the serialized data directly from a file, and write to it, without first copying the file contents into a separate in-memory buffer.

This saves the instructions (time) and memory that such copies would otherwise cost.
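
As a rough illustration of the memory-mapping point, the sketch below maps a file read-only with POSIX mmap. It is an assumption-laden example, not code from this repository: the MappedFile class name is hypothetical, it is POSIX-only, and error handling is reduced to an ok() flag.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: a read-only view of a file mapped into memory.
// The OS pages the bytes in on demand; there is no read() into a user buffer.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) return;
        struct stat st {};
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            const auto len = static_cast<std::size_t>(st.st_size);
            void* p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const unsigned char*>(p);
                size_ = len;
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_ != nullptr) ::munmap(const_cast<unsigned char*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    bool ok() const { return data_ != nullptr; }
    const unsigned char* data() const { return data_; }
    std::size_t size() const { return size_; }
    unsigned char at(std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

private:
    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

With a serialized buffer stored in a file, the pointer returned by data() could be handed to a reader as-is, so the file contents are never copied into a second buffer.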

3. Allows for Optional fields.

This gives us a flexible schema: when we add a new field, we don't have to backfill our existing data with it.
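
One way to picture why old data keeps working is an offset table in which a missing entry means "use the default". FlatBuffers tables use a comparable idea (a per-table vtable of field offsets), but the layout below is a deliberately simplified stand-in; the Record struct and the get_int32 / set_int32 helpers are hypothetical names, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified, hypothetical record: a small table says where each field lives
// in `bytes`. An offset of 0 means "field absent, fall back to the default".
struct Record {
    std::vector<std::uint16_t> field_offsets;  // indexed by field id
    std::vector<std::uint8_t> bytes;           // byte 0 is reserved so 0 can mean "absent"
};

std::int32_t get_int32(const Record& r, std::size_t field_id, std::int32_t default_value) {
    // A field id beyond the table was added to the schema after this record was written.
    if (field_id >= r.field_offsets.size()) return default_value;
    const std::uint16_t offset = r.field_offsets[field_id];
    if (offset == 0) return default_value;
    assert(offset + sizeof(std::int32_t) <= r.bytes.size());
    std::int32_t value;
    std::memcpy(&value, r.bytes.data() + offset, sizeof(value));
    return value;
}

// Writer helper: append the value and record its offset under `field_id`.
void set_int32(Record& r, std::size_t field_id, std::int32_t value) {
    if (r.bytes.empty()) r.bytes.push_back(0);  // reserve offset 0
    if (r.field_offsets.size() <= field_id) r.field_offsets.resize(field_id + 1, 0);
    r.field_offsets[field_id] = static_cast<std::uint16_t>(r.bytes.size());
    const auto* p = reinterpret_cast<const std::uint8_t*>(&value);
    r.bytes.insert(r.bytes.end(), p, p + sizeof(value));
}
```

A record written before field 1 existed simply has no entry for it, so get_int32(record, 1, 42) returns 42 and the old data needs no rewrite.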

4. Small amount of generated code

Easy to integrate into an existing codebase, and less generated code to worry about.

5. Strongly-typed, allows us to fail fast at compile time

Allows us to detect errors at compile time rather than at runtime.

6. Cross Platform with no Dependencies

C++ Code works with any recent gcc / clang.

Question 1: Write a sample schema that can represent a property tree (property name, property value, property type, and sub-properties).

We defined a Parrot Schema in parrot.fbs

Source: Writing a schema

Question 2: Write code snippets in C++ that can read/update on the property object generated by the FlatBuffers compiler.

Step 1: Download the flatc command line compiler

  • choose a "folder for installation"
  • cd "folder for installation"
  • git clone https://github.com/google/flatbuffers.git
  • cd flatbuffers
  • cmake -G "Unix Makefiles" (install cmake if needed)
  • make
  • sudo ln -s /full-path-to-flatbuffer/flatbuffers/flatc /usr/local/bin/flatc
  • chmod +x /full-path-to-flatbuffer/flatbuffers/flatc
  • run "flatc" from any directory

Source: How to install flatc and flatbuffers on linux ubuntu

Step 2: Compile the schema into a CPP header file

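cd back to the root of this repository.

Compile the parrot.fbs FlatBuffers schema into a C++ header, targeting the C++17 standard. With -o generated_parrot, flatc writes the header into the generated_parrot folder; under flatc's default naming the file is called parrot_generated.h.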

flatc -c -o generated_parrot parrot.fbs --cpp-std c++17

Source: Using the schema compiler

Step 3: Add FlatBuffers as a static library to this CMake project.

We need to add FlatBuffers as a static library to the project.

  1. Create a lib folder at the root, and clone the FlatBuffers repository into it

Remove the .git folder in the cloned repository. Source control it as part of this project too.

Step 4: Write a sample code snippet in C++ that can read and update the property object, using the header generated by the FlatBuffers compiler in the generated_parrot folder.

Please see qn_2_read_and_update_parrot.cpp :)

Step 5: Add the following instructions to our CMakeLists.txt

cmake_minimum_required(VERSION 3.10)

project(CryptoDotComAssignment)

set(CMAKE_CXX_STANDARD 17)

set(SOURCE_FILES qn_2_read_and_update_parrot.cpp)
set(HEADER_FILES qn_2_read_and_update_parrot.h)

# Include generated_parrot to be able to use the generated headers
include_directories(generated_parrot)

# Add the Flatbuffers directory to the CMake build
add_subdirectory(lib/flatbuffers)

# Add the executable for Question 2
add_executable(Question2 ${SOURCE_FILES} ${HEADER_FILES})

# Link the Flatbuffers static library only to the target
target_link_libraries(Question2 PRIVATE flatbuffers)

Step 6: Build the Project with CMake

# In the root, create a build folder
# This will store the files that are generated by CMake
mkdir build
cd build

# Configure the project and generate a native build system
cmake ..

# Call the build system to compile / link the project
cmake --build .

Step 7: Execute the executable Question2

➜  build git:(master) ✗ ./Question2    
Parrot name: Polly
Parrot name (directly accessed without deserialization): Polly
Updated Parrot name: UpdatedPolly

Question 3: Write code snippets in C++ that can send/receive the property object over TCP socket.

Step 1: We will be using the boost library to create a TCP socket.

Specifically, boost/asio.hpp

Download the boost library from https://www.boost.org/users/download/ if you do not have boost yet.

We will use Boost 1.58.0 or newer.

We will not source control Boost as a library in the lib folder, as it's too large.

Instead, we will set up the CMakeLists.txt file in a bit, so that it finds the Boost libraries on our system automatically.

Step 2: Setup CMakeLists.txt

Specifically, to

  • find the boost library on the user's system, and allow it to be included in our code
  • separate the source files and header files for qn 2 and 3 (server + client)
  • set separate executables for qn 2 and 3 (server + client)

The CMakeLists.txt should look like this:

cmake_minimum_required(VERSION 3.10)

project(CryptoDotComAssignment)

set(CMAKE_CXX_STANDARD 17)

# New: separate the source files and header files for qn 2
set(QN_2_SOURCE_FILES qn_2_read_and_update_parrot.cpp)
set(QN_2_HEADER_FILES qn_2_read_and_update_parrot.h)

# New: separate the source files and header files for qn 3, server and client
set(QN_3_SERVER_SOURCE_FILES qn_3_server_socket.cpp)
set(QN_3_SERVER_HEADER_FILES qn_3_server_socket.h)

set(QN_3_CLIENT_SOURCE_FILES qn_3_client_socket.cpp)
set(QN_3_CLIENT_HEADER_FILES qn_3_client_socket.h)

# Include generated_parrot to be able to use the generated headers
include_directories(generated_parrot)

# Add the Flatbuffers directory to the CMake build
add_subdirectory(lib/flatbuffers)

# Force using static libraries - this embeds the boost code directly into the executable
set(Boost_USE_STATIC_LIBS ON)

# New: Add the boost directory to the CMake build
# Boost.Asio is a header-only library; we don't need to build it separately here, and only need to build the dependencies it needs
# Boost.System library is needed as a dependency for Boost.Asio
find_package(Boost 1.58.0 COMPONENTS system REQUIRED)

# New: Include the boost library, so we can include the boost headers in our code
include_directories(${Boost_INCLUDE_DIRS})

# New: Add the executable for Question 2
add_executable(Question2 ${QN_2_SOURCE_FILES} ${QN_2_HEADER_FILES})

# New: Add the executables for Question 3
add_executable(Question3Server ${QN_3_SERVER_SOURCE_FILES} ${QN_3_SERVER_HEADER_FILES})
add_executable(Question3Client ${QN_3_CLIENT_SOURCE_FILES} ${QN_3_CLIENT_HEADER_FILES})

# New: Link the flatbuffers static library only to the target
target_link_libraries(Question2 PRIVATE flatbuffers)

# New: Link the boost library and flatbuffers static libraries to qn 3 server and client targets
target_link_libraries(Question3Server PRIVATE flatbuffers ${Boost_LIBRARIES})
target_link_libraries(Question3Client PRIVATE flatbuffers ${Boost_LIBRARIES})

Step 3: Write a code snippet qn_3_server_socket.cpp, which will listen on a TCP socket, and receive the parrot object.

View the code here qn_3_server_socket.cpp

Step 4: Write a code snippet qn_3_client_socket.cpp, which will connect to the server TCP socket, create a parrot flatbuffer object (serialized), and send it over the socket to the server.

View the code here qn_3_client_socket.cpp
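
One detail worth keeping in mind when sending a serialized buffer over TCP: TCP delivers a byte stream with no message boundaries, so the receiver needs some way to know where one buffer ends. A common approach is to prefix each message with its length. The sketch below shows that framing using only the standard library; the frame and try_unframe helpers and the 4-byte little-endian prefix are assumptions for illustration, and the qn_3 sources may handle this differently. (FlatBuffers also offers size-prefixed buffers via FlatBufferBuilder::FinishSizePrefixed, which serves a similar purpose.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Prepend a 4-byte little-endian length to the payload.
Bytes frame(const Bytes& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);
    const auto n = static_cast<std::uint32_t>(payload.size());
    Bytes out;
    out.reserve(4 + payload.size());
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>((n >> shift) & 0xFF));
    }
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Try to pull one complete message off the front of `rx` (the bytes received so far).
// Returns false, leaving `rx` untouched, when more bytes are still needed.
bool try_unframe(Bytes& rx, Bytes& payload) {
    if (rx.size() < 4) return false;
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i) {
        n |= static_cast<std::uint32_t>(rx[static_cast<std::size_t>(i)]) << (8 * i);
    }
    if (rx.size() - 4 < n) return false;
    const auto end = rx.begin() + 4 + static_cast<std::ptrdiff_t>(n);
    payload.assign(rx.begin() + 4, end);
    rx.erase(rx.begin(), end);
    return true;
}
```

On the receiving side, every chunk read from the socket would be appended to rx, and try_unframe would be called in a loop until it returns false; each extracted payload is then one complete serialized object.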

Step 5: Build the project

# Change directory to build, so we compile the project in the build folder
cd build

# Configure the project and generate a native build system
cmake ..

# Call the build system to compile / link the project
# Now, we will have the two new executables, Question3Server and Question3Client
cmake --build .

Step 6: Run the server and client, each in its own terminal

# Go to the build folder, where the executables are
cd build

Run the server. It is responsible for listening on the socket, receiving the parrot object, and printing it.

It should block at this point, since it is waiting for the client to send the parrot object over the socket.

./Question3Server

# In the second terminal (also in the build folder), run the client. It is responsible for connecting to the server, creating a parrot object, and sending it over the socket
./Question3Client

The client should terminate right away, since it sends the parrot object over the socket and then exits.

We should get this on the server. The server should terminate after printing the parrot object:

➜  build git:(master) ✗ ./Question3Server      
Name: Polly
Color: Red
➜  build git:(master) ✗ 

And this on the client. The client should terminate after sending the parrot object to the server:

➜  build git:(master) ✗ ./Question3Client
➜  build git:(master) ✗ 

Question 4: Write code snippets in C++ that use the reflection API to read from the TCP socket and iterate over the elements stored inside the property tree.

Q: What is the Google Flatbuffers Reflection API?

A: It is a feature that allows us to inspect and manipulate FlatBuffers data at runtime.

With the reflection API, we can access FlatBuffers data without knowing their schema at compile time.

This is useful when we want to work with different FlatBuffers schemas that are not known ahead of time, or when we want to implement generic tools that can work with any FlatBuffers data.

Step 1: Generate a binary schema (.bfbs) with flatc

We will use the flatc compiler to generate the binary schema that the reflection API reads.

The binary schema is itself a FlatBuffer: it describes parrot.fbs (its tables, fields, and types) in a form that can be loaded at runtime.

The command below generates a file (parrot.bfbs) that needs to be read to fully reflect the types inside the .fbs file.

cd build
flatc -b --schema ../parrot.fbs

Step 2: Create a server socket which will listen on a TCP socket, and receive the parrot object

  • qn_4_server_socket.cpp

Step 3: Create a client socket which will send the parrot object to the server

  • qn_4_client_socket.cpp

Step 4: Update CMakeLists.txt to add them as executables

cmake_minimum_required(VERSION 3.10)

project(CryptoDotComAssignment)

set(CMAKE_CXX_STANDARD 17)

# separate the source files and header files for qn 2
set(QN_2_SOURCE_FILES qn_2_read_and_update_parrot.cpp)
set(QN_2_HEADER_FILES qn_2_read_and_update_parrot.h)

# separate the source files and header files for qn 3, server and client
set(QN_3_SERVER_SOURCE_FILES qn_3_server_socket.cpp)
set(QN_3_SERVER_HEADER_FILES qn_3_server_socket.h)

set(QN_3_CLIENT_SOURCE_FILES qn_3_client_socket.cpp)
set(QN_3_CLIENT_HEADER_FILES qn_3_client_socket.h)

set(QN_4_SERVER_SOURCE_FILES qn_4_server_socket.cpp)
set(QN_4_SERVER_HEADER_FILES qn_4_server_socket.h)

set(QN_4_CLIENT_SOURCE_FILES qn_4_client_socket.cpp)
set(QN_4_CLIENT_HEADER_FILES qn_4_client_socket.h)

# Include generated_parrot to be able to use the generated headers
include_directories(generated_parrot)

# Add the Flatbuffers directory to the CMake build
add_subdirectory(lib/flatbuffers)

# Force using static libraries - this embeds the boost code directly into the executable
set(Boost_USE_STATIC_LIBS ON)

# Add the boost directory to the CMake build
# Boost.Asio is a header-only library; we don't need to build it separately here, and only need to build the dependencies it needs
# Boost.System library is needed as a dependency for Boost.Asio
find_package(Boost 1.58.0 COMPONENTS system REQUIRED)

# Include the boost library, so we can include the boost headers in our code
include_directories(${Boost_INCLUDE_DIRS})

# Add the executable for Question 2
add_executable(Question2 ${QN_2_SOURCE_FILES} ${QN_2_HEADER_FILES})

# Add the executable for Question 3
add_executable(Question3Server ${QN_3_SERVER_SOURCE_FILES} ${QN_3_SERVER_HEADER_FILES})
add_executable(Question3Client ${QN_3_CLIENT_SOURCE_FILES} ${QN_3_CLIENT_HEADER_FILES})

# Add the executable for Question 4
add_executable(Question4Server ${QN_4_SERVER_SOURCE_FILES} ${QN_4_SERVER_HEADER_FILES})
add_executable(Question4Client ${QN_4_CLIENT_SOURCE_FILES} ${QN_4_CLIENT_HEADER_FILES})

# Link the flatbuffers static library only to the target
target_link_libraries(Question2 PRIVATE flatbuffers)

# Link the boost library and flatbuffers static libraries to qn 3 server and client targets
target_link_libraries(Question3Server PRIVATE flatbuffers ${Boost_LIBRARIES})
target_link_libraries(Question3Client PRIVATE flatbuffers ${Boost_LIBRARIES})

# Link the boost library and flatbuffers static libraries to qn 4 server and client targets
target_link_libraries(Question4Server PRIVATE flatbuffers ${Boost_LIBRARIES})
target_link_libraries(Question4Client PRIVATE flatbuffers ${Boost_LIBRARIES})

Step 5: Build the project

# Change directory to build, so we compile the project in the build folder
cd build

# Configure the project and generate a native build system
cmake ..

# Call the build system to compile / link the project
# Now, we will have the two new executables, Question4Server and Question4Client
cmake --build .

Step 6: Spin up the server and client, each in its own terminal

# Go to the build folder, where the executables are
cd build

# Run the server. This will be responsible for listening on the socket, receiving the parrot object, and printing it
./Question4Server

# In the second terminal, go to the build folder as well
cd build

# Run the client. This will be responsible for connecting to the server, creating a parrot object, and sending it over the socket
./Question4Client

We should get

Client

➜  build git:(master) ✗ ./Question4Client
➜  build git:(master) ✗ 

Server

➜  build git:(master) ✗ ./Question4Server
Field name: color, type: 3
Byte: 0
Field name: name, type: 13
String value: Polly
Field name: position, type: 15
Field name: x, type: 11
Float: 1
Field name: y, type: 11
Float: 2
Field name: z, type: 11
Float: 3
Field name: talents, type: 14
Vector size: 2
Table at index 0:
Iterating fields for index 0:
Field name: level, type: 3
Byte: 0
Field name: name, type: 13
Null string value

// Yes, I got a segfault here :')
// I couldn't figure out why yet; note that the reflection API reports the enum field's type as a byte (type 3)
Table at index 1:
Iterating fields for index 1:
Field name: level, type: 3
[1]    83420 segmentation fault  ./Question4Server
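
The numeric type codes in the output above are values of the BaseType enum from FlatBuffers' reflection.fbs. A small lookup like the sketch below makes the log easier to read. The base_type_name helper is a hypothetical name, and the table is copied by hand from reflection.fbs (values 0 to 17), so it is worth checking against the FlatBuffers version in lib/flatbuffers.

```cpp
#include <cassert>
#include <string>

// Names mirror the BaseType enum in FlatBuffers' reflection.fbs (values 0..17).
// Newer releases may append further values; those fall through to "Unknown".
std::string base_type_name(int code) {
    static const char* const kNames[] = {
        "None",  "UType", "Bool",   "Byte",   "UByte",  "Short",
        "UShort", "Int",  "UInt",   "Long",   "ULong",  "Float",
        "Double", "String", "Vector", "Obj",  "Union",  "Array"};
    constexpr int kCount = static_cast<int>(sizeof(kNames) / sizeof(kNames[0]));
    if (code < 0 || code >= kCount) return "Unknown";
    return kNames[code];
}
```

Read this way, the log says name is a String (13), position is an Obj (15) whose x, y, z are Float (11), and talents is a Vector (14). If color and level are declared in parrot.fbs as enums backed by a byte (an assumption, since the schema is not reproduced here), a base type of Byte (3) would be the expected report for them: an enum field is stored as its underlying integer, and as far as I can tell the link to the enum definition is carried separately in the field type's index.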

Question 5: Considering that the property tree will be updated frequently by the sender, think of a solution that synchronizes the updates to the receiver.

To synchronize updates between the sender and the receiver, we can use a publish-subscribe pattern.

The sender is the publisher, and the receiver will be the subscriber.

The publisher will send updates to all subscribers whenever there is a change in the property tree.

Here's a high-level overview of the suggested approach:

  1. Implement a message protocol to handle different types of messages, such as subscribing, unsubscribing, and property tree updates.

  2. Modify the sender (publisher) to maintain a list of connected subscribers and send updates to all subscribers whenever the property tree is updated.

  3. Modify the receiver (subscriber) to send a subscription request to the publisher when it connects and handle incoming updates from the publisher.

Here's a basic outline using Boost.Asio for networking and FlatBuffers for serialization:

Publisher (sender):

  1. Listen for incoming connections from subscribers.
  2. When a new subscriber connects, add it to the list of subscribers.
  3. When the property tree is updated, serialize the updated data using Flatbuffers and send it to all subscribers.

Subscriber (receiver):

  1. Connect to the publisher.
  2. Send a subscription request to the publisher.
  3. Continuously listen for incoming updates from the publisher and update the local property tree accordingly.

This approach ensures that the receiver's property tree is synchronized with the sender's property tree whenever there are updates.

The partial pseudo code is added into qn_5_publisher_sender.cpp and qn_5_susbcriber_receiver.cpp files.
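
To show the shape of the idea without any networking, here is a minimal in-process sketch. It is not the code in the qn_5 files: the Update, Publisher, and Subscriber types are hypothetical, the property tree is flattened into a path-to-value map, and the sequence number is an extra assumption added so that a receiver can notice a missed update. In a networked version, the callback would serialize the update and write it to the subscriber's socket.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One change to the property tree. The "parrot/name" style path is a made-up key format.
struct Update {
    std::uint64_t seq;  // increases by 1 per update, so receivers can detect gaps
    std::string path;
    std::string value;
};

class Publisher {
public:
    using Callback = std::function<void(const Update&)>;

    void subscribe(Callback cb) { subscribers_.push_back(std::move(cb)); }

    // Apply the change locally, then push it to every subscriber.
    void set(const std::string& path, const std::string& value) {
        tree_[path] = value;
        const Update u{++seq_, path, value};
        for (const auto& cb : subscribers_) cb(u);
    }

private:
    std::map<std::string, std::string> tree_;
    std::vector<Callback> subscribers_;
    std::uint64_t seq_ = 0;
};

class Subscriber {
public:
    // Returns false when an update was missed; the caller should then
    // ask the publisher for a full snapshot instead of applying the delta.
    bool apply(const Update& u) {
        if (u.seq != last_seq_ + 1) return false;
        tree_[u.path] = u.value;
        last_seq_ = u.seq;
        return true;
    }

    const std::map<std::string, std::string>& tree() const { return tree_; }

private:
    std::map<std::string, std::string> tree_;
    std::uint64_t last_seq_ = 0;
};
```

Sending only the changed path and value keeps each message small when updates are frequent, while the sequence check gives the receiver a cheap way to detect that it has fallen out of sync.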

Reflections

It was my first time using FlatBuffers, and I loved how many specific optimizations it has.

For example, it was cool that FlatBuffers is implemented such that we can access serialized fields directly without parsing / deserializing the entire buffer.

That said, I was a bit disappointed that the FlatBuffers examples, specifically on reflection, aren't too great.

I admittedly had to use ChatGPT and unofficial FlatBuffers reflection documentation to figure things out.

Also, it was disappointing that the reflection API seemed to infer the incorrect type, specifically reporting the enum type as a byte.

(I would bet it was just me, and that the reflection API could have inferred that correctly)

Thank you so much for vetting my assignment, and I hope I get a chance to work with the team!