Design guidelines for information packets
samuell opened this issue · 4 comments
Hi @trustmaster,
I wanted to ask for your input regarding design of GoFlow components, and thought the discussion might be useful to others, so I added it here (or is there some better place?).
I was wondering what is the recommended strategy for designing information packets?
I know that FBP suggests using "structured data packets", which I guess would correspond to some kind of structs with predefined fields etc.
At the same time, it seems good to support a bit of flexibility here. For example, for the blow-"library", it seems there are a lot of components that could suitably just use simple lines of text (or byte slices/arrays) to maximize performance, because of the prevalence of text-based formats in bioinformatics.
I suppose that I will need to use more structured IP formats for other use cases though. That is where I was wondering whether it might be worthwhile to have some recommended general format for the IPs, so as to increase interoperability between different component libraries.
As somebody pointed out though, the FBP pattern in itself makes it very easy to create mappers between different IP formats, so this might be less of an issue in practice, but I wanted to at least raise it for discussion!
Hi @samuell,
Thanks for starting such an important discussion.
First of all, I have to say that Go is a strongly and statically typed language, and that limits component and IP reuse. While in NoFlo you can easily create generic components and care less about the format in which data flows, in GoFlow the sender and receiver port types have to match exactly and the connection must have a determined type. On the downside, it is not easy to create generic components with GoFlow (see below). On the upside, you're always sure what kind of data flows through your network and that it does so as fast as possible.
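To illustrate the type-matching point, here is a minimal sketch that uses plain Go channels as stand-ins for ports. It does not use GoFlow's actual API; `Doubler`, `In`, `Out` and `runDoubler` are made-up names for this example only.

```go
package main

import "fmt"

// Doubler is a hypothetical component with typed ports.
// Plain channels stand in for ports here; this is not GoFlow's API.
type Doubler struct {
	In  <-chan int
	Out chan<- int
}

// Run doubles every packet read from In and closes Out once In is closed.
func (d Doubler) Run() {
	for n := range d.In {
		d.Out <- 2 * n
	}
	close(d.Out)
}

// runDoubler wires a Doubler between two int channels and collects the results.
func runDoubler(inputs []int) []int {
	in := make(chan int)
	out := make(chan int)
	d := Doubler{In: in, Out: out}
	go d.Run()

	// Feed the inputs and signal the end of the stream by closing the channel.
	go func() {
		for _, n := range inputs {
			in <- n
		}
		close(in)
	}()

	var results []int
	for n := range out {
		results = append(results, n)
	}
	return results
}

func main() {
	fmt.Println(runDoubler([]int{1, 2, 3})) // prints [2 4 6]

	// Connecting a channel of another element type is rejected by the compiler:
	//   words := make(chan string)
	//   d := Doubler{In: words} // compile error: chan string is not <-chan int
}
```

The commented-out lines at the end show the restriction: the mismatch is caught at compile time, before the network ever runs.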
If your application operates on scalar values of built-in types, then it's reasonable to keep the IPs that simple: int, string, etc. It is easier and cheaper to switch from scalars to structs and from structs to generics later than to code everything in a generic way from the very beginning.
Regarding text-based formats: passing text strings is OK unless you need to parse them into something else in every node. If your components operate on parts of the text or parse it every time, it is better to parse the text into a struct first and then pass the internal struct representation, so you don't have to reparse it every time.
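As a rough sketch of the "parse once, then pass structs" idea, again with plain channels rather than GoFlow's API: the `SeqRecord` type, the tab-separated `name<TAB>sequence` line format and all function names below are assumptions made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// SeqRecord is a hypothetical parsed packet: a name and a sequence.
type SeqRecord struct {
	Name string
	Seq  string
}

// parseLine splits a tab-separated "name<TAB>sequence" line (an assumed format).
func parseLine(line string) (SeqRecord, error) {
	parts := strings.SplitN(line, "\t", 2)
	if len(parts) != 2 {
		return SeqRecord{}, fmt.Errorf("malformed line: %q", line)
	}
	return SeqRecord{Name: parts[0], Seq: parts[1]}, nil
}

// parser turns text lines into SeqRecord packets once, at the edge of the network.
// Malformed lines are silently skipped to keep the sketch short.
func parser(lines <-chan string, out chan<- SeqRecord) {
	for line := range lines {
		if rec, err := parseLine(line); err == nil {
			out <- rec
		}
	}
	close(out)
}

// gcCount is a downstream component that works on the struct directly,
// so it never has to look at the raw text again.
func gcCount(in <-chan SeqRecord, out chan<- int) {
	for rec := range in {
		out <- strings.Count(rec.Seq, "G") + strings.Count(rec.Seq, "C")
	}
	close(out)
}

func main() {
	lines := make(chan string)
	recs := make(chan SeqRecord)
	counts := make(chan int)

	go parser(lines, recs)
	go gcCount(recs, counts)
	go func() {
		lines <- "seq1\tGATTACA"
		lines <- "seq2\tGGCC"
		close(lines)
	}()

	for n := range counts {
		fmt.Println(n)
	}
}
```

Any further components downstream of `parser` would also take `SeqRecord` values, so the splitting cost is paid only once per line.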
Struct packets are the most common ones, just because they match the IP model naturally. In a bigger app you might end up defining tens of packet types just for component communication, but that's the price of static typing.
Generic IP types and generic components are a bigger headache. Go doesn't have generics and channels are strongly typed, but there are several workarounds. The first one is compile-time: using a code generation tool to create concrete components out of abstract ones. The second one is run-time: using interfaces, type assertions, interface{} and reflection. But this deserves a separate post, I think.
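For the run-time workaround, a minimal sketch could look like the following. It uses a `chan interface{}` and a type switch; `describe` is a hypothetical function standing in for a generic component body, not anything provided by GoFlow.

```go
package main

import "fmt"

// describe is a hypothetical generic component body: it accepts any packet
// and uses a type switch to decide how to handle it at run time.
func describe(packet interface{}) string {
	switch v := packet.(type) {
	case int:
		return fmt.Sprintf("int %d", v)
	case string:
		return fmt.Sprintf("string %q", v)
	default:
		return fmt.Sprintf("unsupported %T", v)
	}
}

func main() {
	// A channel of interface{} accepts packets of any type.
	in := make(chan interface{})
	go func() {
		in <- 42
		in <- "ACGT"
		in <- 3.14
		close(in)
	}()

	for p := range in {
		fmt.Println(describe(p))
	}
}
```

The trade-off is that a packet of an unexpected type is only detected when it arrives (the `default` branch here), not when the program is compiled.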
@trustmaster
Ok, many thanks for the feedback!
I'm happy with that!
I would also be a bit concerned that imposing a more general data structure would impact negatively on performance for some applications, so I'm very happy to be able to use simple data types where that fits in, and just use mappers to other formats where needed.
I mostly just wanted to make sure I don't go against some GoFlow best practices when doing it like that :)
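A mapper of the kind mentioned above could be sketched roughly like this, with plain channels instead of GoFlow ports. The `Sample` type, the tab-separated output format and the function names are all assumptions for illustration.

```go
package main

import "fmt"

// Sample is a hypothetical struct packet produced by one component library.
type Sample struct {
	ID    string
	Value float64
}

// formatSample converts a Sample into a tab-separated text line.
func formatSample(s Sample) string {
	return fmt.Sprintf("%s\t%g", s.ID, s.Value)
}

// sampleToLine is the mapper: it sits between a component that emits Sample
// packets and a component that expects plain text lines.
func sampleToLine(in <-chan Sample, out chan<- string) {
	for s := range in {
		out <- formatSample(s)
	}
	close(out)
}

func main() {
	samples := make(chan Sample)
	lines := make(chan string)

	go sampleToLine(samples, lines)
	go func() {
		samples <- Sample{ID: "a", Value: 0.5}
		samples <- Sample{ID: "b", Value: 2}
		close(samples)
	}()

	for line := range lines {
		fmt.Println(line)
	}
}
```

Since the mapper is just another process in the network, components on both sides can keep whichever packet type suits them best.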
Apparently Go channel performance doesn't depend much on the type of data being passed. You can play with this gist to try it yourself: https://gist.github.com/trustmaster/6251390
go test -test.bench .
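For readers who want a quick comparison without the gist, here is an independent sketch (not the gist's code) that times sends over a `chan int` and over a channel of a small struct. It uses `testing.Benchmark` from the standard library so it can run as an ordinary program; the `Record` type and the buffer size of 64 are arbitrary choices for the example.

```go
package main

import (
	"fmt"
	"testing"
)

// Record is a hypothetical struct packet used only for this comparison.
type Record struct {
	ID   int
	Name string
	Tags []string
}

// benchInts sends b.N ints through a buffered channel to a draining goroutine.
func benchInts(b *testing.B) {
	ch := make(chan int, 64)
	done := make(chan struct{})
	go func() {
		for range ch {
		}
		close(done)
	}()
	for i := 0; i < b.N; i++ {
		ch <- i
	}
	close(ch)
	<-done
}

// benchStructs does the same with Record values instead of ints.
func benchStructs(b *testing.B) {
	ch := make(chan Record, 64)
	done := make(chan struct{})
	go func() {
		for range ch {
		}
		close(done)
	}()
	rec := Record{ID: 1, Name: "seq1", Tags: []string{"x", "y"}}
	for i := 0; i < b.N; i++ {
		ch <- rec
	}
	close(ch)
	<-done
}

func main() {
	// testing.Benchmark picks b.N automatically and reports ns/op.
	fmt.Println("chan int:   ", testing.Benchmark(benchInts))
	fmt.Println("chan Record:", testing.Benchmark(benchStructs))
}
```

The numbers will vary by machine and Go version, so treat them as a rough indication only.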
As for GoFlow best practices, there aren't many yet and many are yet to be found. GoFlow is just a minimal library/runtime to write Go programs in a Flow-Based fashion. It doesn't follow FBP exactly and doesn't implement JPM's FBP in full (e.g. you won't find Substreams, Array ports, Automatic ports etc. in it). It doesn't come with a library of ready-to-use generic components (my previous post explains why). Instead it tends to add as little junk on top of slim Go code as possible and leaves most of the decisions to the programmer.
> Apparently Go channel performance doesn't depend much on the type of data being passed. You can play with
> this gist to try it yourself: https://gist.github.com/trustmaster/6251390
Thanks, will have a look!
> As for GoFlow best practices, there aren't many yet and many are yet to be found.
> GoFlow is just a minimal library/runtime to write Go programs in a Flow-Based fashion.
> It doesn't follow FBP exactly and doesn't implement JPM's FBP in full (e.g. you won't
> find Substreams, Array ports, Automatic ports etc. in it). It doesn't come with a library
> of ready-to-use generic components (my previous post explains why). Instead it tends
> to add as little junk on top of slim Go code as possible and leaves most of the decisions
> to the programmer.
Absolutely, understood!
No problem. I like this very fact, since it provides (to me) an even lower barrier to start playing with the concepts, while keeping things as simple as possible, as long as possible (and if more layers are needed, they can easily be built on top of that).
As hinted in the (very brief) slides from my lightning talk at a Go meetup recently [1], I came into FBP very much from the angle of trying to make my code more "pipelineable". As can be seen from those code examples, there isn't really much of a step from chaining "generator functions" to implementing the same thing as a network in GoFlow, except that I now have orders of magnitude more flexibility to create custom network structures and start to do interesting stuff (i.e. moving from the 1D world to 2D or 3D). This is really intriguing to me, and I'm just thinking about what to do with this new power :)