Refactor events
manuelstein opened this issue · 2 comments
We currently use our own internal message format and we're passing events through a local message bus (Java/Thrift queue implementation). CloudEvents provides a common event format and SDKs that support various transport bindings.
The suggestion is to refactor our eventing layer to use CloudEvents and its transport SDKs to pass messages between workers.
Discussion topics:
- encoding/decoding speed of the CloudEvents SDK vs. our custom JSON parsing (in Java & Python)
- which encoding to use between function workers? (Python currently supports only JSON encoding)
- which transport to use between function workers?
- integrate with Knative Eventing?
- mapping to CloudEvents (typing of workflow-internal and external events)
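To make the mapping topic more concrete, here is a minimal sketch of wrapping an internal message in a CloudEvents 1.0 structured-mode JSON envelope, using only the Python standard library rather than the CloudEvents SDK. The shape of the internal message, the `source` value and the event type name are assumptions for illustration only; the actual typing of workflow-internal and external events is what would need to be discussed.

```python
import json
import uuid
from datetime import datetime, timezone

# Attributes that CloudEvents 1.0 marks as required.
REQUIRED_ATTRIBUTES = {"specversion", "id", "source", "type"}


def to_cloudevent(internal_msg: dict, source: str, event_type: str) -> dict:
    """Wrap a (hypothetical) internal message in a CloudEvents 1.0 envelope."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,          # assumed: a URI-reference identifying the producer
        "type": event_type,        # assumed naming scheme, not an agreed one
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": internal_msg,
    }


def from_cloudevent(event: dict) -> dict:
    """Validate the required attributes and return the wrapped message."""
    missing = REQUIRED_ATTRIBUTES - event.keys()
    if missing:
        raise ValueError(f"missing CloudEvents attributes: {sorted(missing)}")
    return event["data"]


def encode(event: dict) -> str:
    """Structured-mode JSON encoding: the whole event is one JSON document."""
    return json.dumps(event)


def decode(raw: str) -> dict:
    return json.loads(raw)
```

A round trip would look like `from_cloudevent(decode(encode(to_cloudevent(msg, "/workflow/wf1", "com.example.workflow.internal"))))`, where the source and type strings are placeholders.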
Workers are currently written in the function's language, so we'd need respective SDK support for whatever choice is made on encoding and transport. The respective SDK's performance on serializing/deserializing CloudEvents affects the workflow performance.
- On the bright side, typed events allow interoperability with other platforms, e.g. users can more easily develop functions that consume or produce known event types.
- Using CloudEvents means the transport and binding can be adapted to use other messaging subsystems where applicable.
Thanks for writing this up! Good summary of discussion topics.
Just a small correction: the function workers are actually written in one language (Python). The worker listens to the local message bus, deserializes the messages, initializes the necessary data structure(s) and invokes the user code. Afterwards, it is again the Python code that serializes the output, does other chores (e.g., backups) and publishes the result to the message bus. Java functions are handled by a dedicated worker, but that worker only accepts the input to the function (i.e., no metadata) and sets up the corresponding API object (which communicates with its counterpart in Python for API calls).
Of course, the above can be changed if we decide to have separate workers in different languages, but that comes with its own problems of maintaining the workers and keeping them functionally equivalent. Having a single worker that handles the main logic (plus language-specific helpers) makes that maintenance easier.
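The single-worker flow described above (listen, deserialize, set up data structures, invoke user code, serialize, publish) can be sketched roughly as follows. This is only an illustration: `queue.Queue` stands in for the local message bus, and the message fields (`metadata`, `input`, `output`) and the `context` dictionary are assumptions, not the actual internal format.

```python
import json
import queue


def run_worker_once(bus_in: queue.Queue, bus_out: queue.Queue, user_function) -> None:
    """Process a single message from the bus (hypothetical message layout)."""
    # 1. Listen to the bus and deserialize the incoming message.
    raw = bus_in.get()
    message = json.loads(raw)

    # 2. Initialize the data structures the user code needs.
    context = {"metadata": message.get("metadata", {})}

    # 3. Invoke the user code with the function input only.
    result = user_function(message["input"], context)

    # 4. Serialize the output and publish it back to the bus.
    #    Other chores (e.g. backups) would happen here as well.
    out = {"metadata": context["metadata"], "output": result}
    bus_out.put(json.dumps(out))


def double(value, context):
    """Example user function, purely for illustration."""
    return value * 2
```

With this layout, swapping the encoding or the transport only touches steps 1 and 4, which is where a CloudEvents SDK would plug in.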
In that sense, at this point, the python SDK for CloudEvents and its performance would be more critical than the other languages.
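One way to start comparing encoding costs on the Python side is a small timing harness like the one below. It is a sketch under assumptions: it measures only the standard-library `json` round trip with and without a CloudEvents-style envelope, not the CloudEvents SDK itself, and the payload and envelope values are made up. The SDK's own encode/decode calls would be dropped in as a third candidate.

```python
import json
import timeit


def roundtrip_plain(payload: dict) -> dict:
    """Baseline: serialize and parse the bare payload."""
    return json.loads(json.dumps(payload))


def roundtrip_enveloped(payload: dict) -> dict:
    """Same payload wrapped in a CloudEvents-style envelope (placeholder values)."""
    event = {
        "specversion": "1.0",
        "id": "benchmark-event",
        "source": "/benchmark",
        "type": "com.example.benchmark",
        "datacontenttype": "application/json",
        "data": payload,
    }
    return json.loads(json.dumps(event))["data"]


def measure(fn, payload: dict, number: int = 10_000) -> float:
    """Return the average time per round trip in seconds."""
    return timeit.timeit(lambda: fn(payload), number=number) / number


if __name__ == "__main__":
    sample = {"function": "f1", "input": list(range(100))}
    print("plain     :", measure(roundtrip_plain, sample))
    print("enveloped :", measure(roundtrip_enveloped, sample))
```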
IMHO it's much better this way (messaging, state logic, progress log, backups handled by one implementation in a single language only).