Switch to msgspec
Opened this issue · 5 comments
An idea I have had for a while, msgspec performs insanely better than anything else out there, and it would allow us to simplify models code by an insane magnitude, as well as being able to finally freeze models without a performance penalty, leading to the removal of a lot of code in cache to ensure immutability.
Unfortunately, this would not come easily, as it could potentially be a pretty big breaking change, as we would only be able to provide fields that Discord themselves provide (we will ofc need converter functions, similar to what EntityFactory
and EventFactory
do, but they would be a lot simpler and for specific cases, since the rest will be handled by msgspec). This might mean the removal of .app
and .shard
, possibly passing them to the end developer some other way.
Additionally, some nice thing we might be able to implement is automatic checking of our implementation vs the discord spec to detect missing fields and new fields that we need to implement.
Some work on the ports that I have been doing: https://github.com/davfsa/hikari/tree/feature/msgspec
Any type of comments or discussions are encouraged and welcome
Some previous work is needed before this is possible, mainly:
First off, I'm not a user of hikari I came across this as part of a discussion with someone else about more frameworks using type safe decoding, I have no stakes in this one way or another. Just providing a bit of extra info that might be relevant to any decision making process here, as it was not noted in the original post regarding breakage
msgspec is pretty tightly tied to CPython's internals, see: jcrist/msgspec#22
This isn't necessarily bad, and as noted in the issue, several other libraries to accelerate CPython code are also non-portable across implementations. Going with msgspec, unless you keep around a non-msgspec fallback path will break existing pypy or graalpy compatibility (All of your current requirements and hikari itself appear to be compatible with both). While that could be a reasonable choice, I'm unsure of your user-base's expectations here for compatibility with other Python implementations.
Thanks a lot for your input @mikeshardmind. It is insanely welcome!!
Didn't actually consider how using msgspec would limit providing PyPy interoperability. I have not really used PyPy on my daily basis, because I haven't really benchmarked it in real world needs (something on my todo list, just not sure how to do it properly). I have definitely tried to make hikari work with PyPy (pypy/pypy#3763) in case people do find it useful to use instead of CPython and don't have any plans to change in that front.
It is a shame that msgspec doesn't function nicely under PyPy (it doesn't even build the wheel 😞), I was looking forward to trying to provide typesafe objects (in case Discord decide to ever break the type of something in the future that wouldn't get caught by us) and, specially speedups when it comes to deserialization and memory usage (thanks to msgspec special object type).
Curious too, since you mentioned that you ran into this issue while discussing the topic with someone, are there any other alternatives to msgspec you considered (already thought myself of pydantic and similars, but they provide no real speedup vs attrs, only just add validation)?
Thanks again!
Curious too, since you mentioned that you ran into this issue while discussing the topic with someone, are there any other alternatives to msgspec you considered
I'm still evaluating options for my own uses where pypy compatibility is desired. Currently, I use msgspec a lot and only recently ran into a request for pypy support that made me aware of how pypy's normal compatibility layer can't handle it even at a performance cost
Right now, I'd pick mashumaro if speed was the primary factor, and the secondary factors included needed to only support one code path and support alternative python implementations.
I think the long-term ideal in this space for libraries is writing the thinnest abstraction over msgspec + either mashumaro or cattrs such that the fastest available option on each supported platform + implementation.
My initial thoughts are that it should be relatively straightforward with minimal duplication if the right tools are leveraged, something like the below comes to mind, but this is off the cuff and untested. There's also missing puzzle pieces here in the branching such as creating a shared encoder/decoder (both libraries encourage this for performance + options, but vary on options)
try:
import msgspec
except ImportError:
HAS_MSGSPEC = False
else:
HAS_MSGSPEC = True
if HAS_MSGSPEC:
from msgspec import Struct as Base
deco = lambda x: x
else:
from mashumaro import DataClassDictMixin as Base
from dataclasses import dataclass as deco
@deco
class Example(Base):
field_name: TypeClass
I do intend to explore this further in the future, but won't be doing so today, I'd be happy to follow up with detail after doing so
I was not aware of mashumaro, but thanks for bringing it into my radar. Will be having a look at it and playing around with it soon:tm:
I do intend to explore this further in the future, but won't be doing so today, I'd be happy to follow up with detail after doing so
Would greatly appreciate that! Thanks a lot for your time and for sharing your thoughts with me!
If you want, you can reach out to me on Discord (username is "davfsa"). Would be glad and happy to aid you with that compatibility shim you talked about, which we can both probably greatly leverage in our respective use cases :)
More work on this has been planned, with the initial change dropping PyPy support all together (aka, only using msgpec).
Down the line, we can consider a shim to utilize both msgspec and mashumaro. Depending on how complicated this faces, we might even be able to just add mashumaro into the mix too