ray-project/ray_beam_runner

Standardize how we fix serialization problems for Protobufs

Closed this issue · 6 comments

From review:

Nit / general remark: It's a bit unfortunate that we keep running into the serialization issue, and sometimes solve it by using a custom reduce, sometimes by registering a custom serializer (ray.util.register_serializer), and sometimes manually (SerializeToString / FromString).
It would be good if we could enable serialization for all the protobuf components in one central place - I'm not sure how that could be done though, as ray.util.register_serializer would have to be called on every ray worker that transmits protobuf objects. Maybe something to discuss?

Hi Pablo,

Can you point out some of the places where we use different serialization/deserialization approaches?

Hi @iasoon,

Just wanted to understand a bit more about the issue 🙂

It seems like there are some pros and cons to each serialization approach (a rough sketch of both follows this list):

  • __reduce__: These definitions make classes a little more bloated, but they let us customize how objects are serialized/pickled. One downside is that for every Protobuf message we use, a wrapper class is probably needed to explicitly define how to serialize and deserialize. This has some advantages too: for classes like RayRunnerExecutionContext we can skip serializing some of the instance attributes and lazily reconstruct the instance, which keeps the memory footprint smaller.
  • ray.util.register_serializer: According to the Ray docs, this API needs to be called on every Ray worker, since serializers are managed locally per Python process; the call is idempotent, though. This seems like one of the simpler approaches: we call it at the start of our Ray tasks to register serializers for every Protobuf message we use, and then remove most of the __reduce__ code since it is no longer needed.
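
Roughly, the two approaches look like this (just a sketch, not code from this repo: beam_runner_api_pb2.Pipeline stands in for any generated message class, and PipelineHolder is a hypothetical wrapper):

```python
import ray.util
from apache_beam.portability.api import beam_runner_api_pb2


# Approach 1: a wrapper class whose __reduce__ pickles only the serialized
# proto bytes and rebuilds the message on the receiving side.
def _rebuild_pipeline_holder(data: bytes) -> "PipelineHolder":
    return PipelineHolder(beam_runner_api_pb2.Pipeline.FromString(data))


class PipelineHolder:
    def __init__(self, pipeline_proto):
        self.pipeline_proto = pipeline_proto

    def __reduce__(self):
        return _rebuild_pipeline_holder, (self.pipeline_proto.SerializeToString(),)


# Approach 2: register a custom serializer for the proto class itself, so the
# message can be passed around directly. This has to run in every worker
# process that sends or receives that message type.
ray.util.register_serializer(
    beam_runner_api_pb2.Pipeline,
    serializer=lambda proto: proto.SerializeToString(),
    deserializer=beam_runner_api_pb2.Pipeline.FromString,
)
```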

Curious to hear your opinion: should we keep a mix of them, or stick to a single convention (either __reduce__ or ray.util.register_serializer) for consistency?

@rkenmi Personally, I definitely prefer the register_serializer approach, as it is more universal - we don't always serialize protobuf objects as part of an owning object; sometimes they are passed directly as regular task arguments.

A while back, I took a stab at implementing this for all protobuf types we depend on:
pabloem@3445540#diff-7507ff302fc9e22524c6fea6648354d4dc21c5076957b063efc296acf929f742
I'm not really sure what the best way would be to include this in our code. I guess the most straightforward way would indeed be to add a manual call to each task that needs to serialize protobuf messages; I imagine this could get tedious, though.
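
To make that concrete, this is roughly the shape I have in mind (just a sketch, not exactly what the commit above does; the register_protobuf_serializers name and the list of message types are made up for illustration):

```python
import ray.util
from apache_beam.portability.api import beam_fn_api_pb2, beam_runner_api_pb2

# Illustrative list only - in practice this would cover every message type we
# actually pass between tasks.
_PROTO_CLASSES = (
    beam_runner_api_pb2.Pipeline,
    beam_runner_api_pb2.PTransform,
    beam_fn_api_pb2.ProcessBundleDescriptor,
)


def register_protobuf_serializers():
    """Register serializers for the protobuf types we use; safe to call repeatedly."""
    for proto_cls in _PROTO_CLASSES:
        ray.util.register_serializer(
            proto_cls,
            serializer=lambda proto: proto.SerializeToString(),
            deserializer=proto_cls.FromString,
        )
```

Each task that handles protobuf messages would then call register_protobuf_serializers() once at the top.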

There is also this effort that would help us, but I don't know where that went.
ray-project/ray#21383

I definitely agree that having custom __reduce__s is a good idea for some of the structs we have, but I think that's orthogonal to the serialization of the protobuf messages themselves. We could definitely do both.

Hope that helps! Happy to discuss more.

Nice, thanks for sharing the snippet!

I'm not really sure of a good way to propagate register_serializer to all workers either. With only a handful of Ray tasks/actors right now, the manual call seems okay (one way to keep that per-task boilerplate small is sketched below).

I did find this API, which looks useful, but it was recently marked as deprecated due to reliability issues.
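
Something along these lines would probably work for us (hypothetical sketch, building on the register_protobuf_serializers helper sketched above; run_bundle is just a stand-in here):

```python
import functools

import ray


def with_protobuf_serializers(fn):
    """Ensure protobuf serializers are registered in whichever worker runs fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        register_protobuf_serializers()  # the helper sketched earlier in this thread
        return fn(*args, **kwargs)
    return wrapper


@ray.remote
@with_protobuf_serializers
def run_bundle(process_bundle_descriptor):
    # ... run the bundle using the already-deserialized proto ...
    return process_bundle_descriptor.id
```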

I'm not sure either. Too bad that API was deprecated! Maybe we can ask the Ray team whether there is a recommended way to do this?

I think technically we don't need it on every worker either, only on the ones that serialize protobuf messages. If my memory serves, that's currently only the main task and the run_bundle tasks.