goodboy/tractor

Typed messaging and validation

goodboy opened this issue · 14 comments

I was originally going to make a big post on pydantic and how we could offer typed messages using that very nice project, despite there being a couple of holdups for integration with msgpack.

However, it turns out just today an even faster and msgpack-specific project was released: msgspec 🏄🏼

It claims to not only be faster than msgpack-python but also to support schema evolution and other niceties.
It also has perf bumps when making multiple repeated encode/decode calls, which is exactly how we're currently using msgpack inside our Channel.

Overall there looks to be no downside, and we'll get typed message semantics fast and free 👍🏼

For reference, I'll leave a bunch of links I'd previously gathered regarding making pydantic work with msgpack:

TODO
  • support for a msgpack-python custom type serializer for pydantic.BaseModel such that we just implicitly render with `.dict()` at pack time and load via `Model(**message)` at decode time? (see the sketch after this list)
  • write ourselves a small bytes-length prefixed framing protocol for msgspec as per the comments in #212
      import struct

      while header := await stream.receive_all_or_none(4):
          size, = struct.unpack("<I", header)
          # probably want to sanity-check size for not being unreasonably huge
          chunk = await stream.receive_exactly(size)
          # do something with chunk
  • consider offering msgspec as an optional dependency if we end up liking it?
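
For concreteness, here's a minimal sketch of that first TODO item using msgpack-python's default=/object_hook= hooks; the Ping model and the "_type" tag are hypothetical, just to show the round trip:

import msgpack
from pydantic import BaseModel

class Ping(BaseModel):  # hypothetical example model
    seq: int

_models = {"Ping": Ping}  # registry of decodable message models

def enc_hook(obj):
    # render any BaseModel to a plain dict at pack time,
    # tagging it with the model name for the decode side
    if isinstance(obj, BaseModel):
        return {"_type": type(obj).__name__, **obj.dict()}
    raise TypeError(f"cannot serialize {obj!r}")

def dec_hook(obj):
    # load via Model(**message) at decode time when tagged
    model = _models.get(obj.get("_type"))
    if model is None:
        return obj
    return model(**{k: v for k, v in obj.items() if k != "_type"})

wire = msgpack.packb(Ping(seq=1), default=enc_hook)
assert isinstance(msgpack.unpackb(wire, object_hook=dec_hook), Ping)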

That's really neat! I was looking at implementing Pydantic in a project a little while ago, and chose not to. It seemed like the API wasn't quite what I was looking for: I wanted data classes, and confidence that serialization and deserialization were both strict. I'm not quite sure why I concluded that, unfortunately. I knew about the data classes integration with Pydantic, but there was something missing with it that I felt I needed.

msgspec looks pretty cool for when you control the data format, but that definitely wasn't part of what I was doing. (I was writing an API wrapper over a JSON API).

I know many people have gotten a lot of mileage out of Pydantic. It's a great project.

Yeah, alternatively we've been thinking about using capnproto, and in particular seeing if we can auto-gen schemas from type-annotated Python functions.

I think this would be a huge boon since we'd get CBS (capability-based security) for free 🏄🏼.

The only holdup will be figuring out how pycapnp can work with async code and whether it can help us with the schema gen/loading.
There now appears to be asyncio support, but I'm not sure how/if that will get in our way or if we can work off that impl to support trio.
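
As a rough sketch of what that auto-gen might look like, one could render a function's annotations straight to capnproto schema text. The type mapping and naming below are illustrative only (this is not pycapnp API):

import inspect

CAPNP_TYPES = {int: "Int64", float: "Float64", str: "Text", bool: "Bool"}

def fn_to_capnp(fn) -> str:
    # render a type-annotated function's params as a capnproto struct;
    # a real .capnp file would also need a unique @0x... file id
    params = inspect.signature(fn).parameters
    fields = [
        f"  {name} @{i} :{CAPNP_TYPES[p.annotation]};"
        for i, (name, p) in enumerate(params.items())
    ]
    struct_name = "".join(part.title() for part in fn.__name__.split("_"))
    return "\n".join([f"struct {struct_name} {{", *fields, "}"])

def move_to(x: float, y: float, speed: int):
    ...

print(fn_to_capnp(move_to))
# struct MoveTo {
#   x @0 :Float64;
#   y @1 :Float64;
#   speed @2 :Int64;
# }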

Oh, also another notable project (for a tractor dependent that will likely soon be broken out into its own repo):
nptyping, which may prove useful for automatic serialization of arrays.

Linking to jcrist/msgspec#25 since we'll likely need nested Structs to make this easiest to implement (messages containing strictly typed payloads, themselves defined as structs). Otherwise there may need to be some finagling: either hack a standard message schema where payloads are decoded specifically as structs, or just always decode to a dict. The former would be better considering the claimed speed improvement:

Depending on the schema, deserializing a message into a Struct can be roughly twice as fast as deserializing it into a dict.
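
A minimal sketch of that nested-Struct layout, using the current msgspec API (which postdates this thread); the envelope fields like cid here are hypothetical:

import msgspec

class Point(msgspec.Struct):
    x: float
    y: float

class Msg(msgspec.Struct):
    cid: str        # hypothetical envelope field
    payload: Point  # nested Struct: payload decodes strictly typed

# reusing Encoder/Decoder instances is what gives the repeated-call
# perf bump mentioned above
enc = msgspec.msgpack.Encoder()
dec = msgspec.msgpack.Decoder(Msg)

msg = dec.decode(enc.encode(Msg(cid="42", payload=Point(x=1.0, y=2.0))))
assert msg.payload == Point(x=1.0, y=2.0)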

gc-ss commented

in particular seeing if we can auto-gen schema from type annotated Python functions.

Is there an issue for this?

Essentially to do this, we need to:

  1. Parse dataclasses and save Field attributes
  2. Feed this into networkx to build a graph with child, `isa` and `hasa` relationships
  3. Use the builder pattern over the networkx graph with a dialect (capnproto, protobuf, etc.)

@gc-ss not yet specifically; feel free to make one of course if you have some ideas and/or want to try it out.

Also, I think this could easily be wrapped in an external repo for use as well; it doesn't have to be tractor specific.

Feed this into networkx to build a graph with child, `isa` and `hasa` relationships

@gc-ss wait why would you need this?
Afaiu graph relations aren't relevant here; are you talking about building nested structs as trees or?

gc-ss commented

Afaiu graph relations aren't relevant here; are you talking about building nested structs as trees or?

Consider this:

from dataclasses import dataclass

@dataclass
class A:
    a: int

@dataclass
class B(A):
    b: int

@dataclass
class C(A):
    c: int

@dataclass
class D:
    composes_c: C
Now if we wanted to auto-gen a schema for type D, we don't want to spit out B. Also, some schema libraries might want schemas ordered in a certain way depending on the dependency tree.

So you need graphs
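
A sketch of steps 1 and 2 from the list above against these classes, assuming networkx and the @dataclass versions of A..D just defined (the `isa`/`hasa` edge labels follow the naming above):

import dataclasses
import networkx as nx

def dep_graph(*types) -> nx.DiGraph:
    g = nx.DiGraph()
    for t in types:
        for base in t.__bases__:  # isa edges from inheritance
            if base is not object:
                g.add_edge(t, base, relation="isa")
        for f in dataclasses.fields(t):  # hasa edges from composition
            if dataclasses.is_dataclass(f.type):
                g.add_edge(t, f.type, relation="hasa")
    return g

g = dep_graph(A, B, C, D)
needed = nx.descendants(g, D) | {D}  # {D, C, A}: B is never pulled in
order = [t for t in nx.topological_sort(g) if t in needed]
print([t.__name__ for t in reversed(order)])  # ['A', 'C', 'D']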

What do you think?

If this makes sense, I can move these into a different repo and send you a link.

@gc-ss yeah, as I was thinking, you mean for composed structs/types.

If this makes sense, I can move these into a different repo and send you a link.

Cool, yeah if you're interested in working on this then for sure.
We can also experiment here around the tractor IPC APIs, see how it shakes out with some tinkering, then move it to a new project.

Up to you, I don't have immediate bandwidth for this.

The first holdup with msgspec is mentioned in jcrist/msgspec#27: there's no streaming decoder API.

No longer a problem, we just have to write a prefix framing stream packer; see above.
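
For reference, the send side of that framing is tiny; a sketch assuming a trio SendStream:

import struct

async def send_framed(stream, payload: bytes) -> None:
    # prefix each encoded payload with a little-endian u32 length,
    # matching the receive loop sketched further up
    await stream.send_all(struct.pack("<I", len(payload)) + payload)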

Hmm, alternatively, to get typing going sooner than later we could just make some pydantic message type handlers. Pretty sure all we'd need is detection of a BaseModel, then serialization with `.dict()` on encode and decoding via `Model(**msg)` (essentially the first TODO item above).

Pretty sure we could offer this as an extras dependency as well?

Linking explanation from jcrist/msgspec#25

Probably worth noting are dataclass union libs like https://github.com/yukinarit/pyserde

Hilarious to see a writeup of what we've been doing in this repo for years 😂
https://kobzol.github.io/rust/python/2023/05/20/writing-python-like-its-rust.html#fnref:2

the part on ADTs is particularly notable as part of this feature work 🏄🏼