Union types break service serialization
chris05atm opened this issue
What happened?
After upgrading our conjure-python dependency, we ran into runtime PySpark serialization failures. A service object that we could previously serialize is no longer serializable after the upgrade.
We suspect #320 or #221 broke serde behavior for us.
The PySpark error was:

```
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/palantir/services/.4229/var/tmp/asset-install/85af169544daf00da129a002813aba21/spark/python/lib/pyspark.zip/pyspark/worker.py", line 413, in main
    func, profiler, deserializer, serializer = read_command(pickleSer, infile)
  File "/opt/palantir/services/.4229/var/tmp/asset-install/85af169544daf00da129a002813aba21/spark/python/lib/pyspark.zip/pyspark/worker.py", line 68, in read_command
    command = serializer._read_with_length(file)
  File "/opt/palantir/services/.4229/var/tmp/asset-install/85af169544daf00da129a002813aba21/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 173, in _read_with_length
    return self.loads(obj)
  File "/opt/palantir/services/.4229/var/tmp/asset-install/85af169544daf00da129a002813aba21/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 697, in loads
    return pickle.loads(obj, encoding=encoding)
AttributeError: type object 'AlertFailureResponse' has no attribute '_service_exception'
```
This was thrown when passing our service through a map function. It occurred even with zero data passed along: the only thing being serialized was the service code that previously worked.
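For anyone triaging: the "type object ... has no attribute" wording suggests the unpickler is resolving an attribute on the *class* by name at load time, rather than carrying the value in the pickle itself. Below is a minimal sketch of that failure mode. It is not conjure-python's actual generated code; the `__reduce__` here is contrived purely to reproduce the same lookup and error message.

```python
import pickle

class AlertFailureResponse:
    _service_exception = "exists when the driver pickles"

    def __reduce__(self):
        # Contrived: store a *reference* to the class attribute
        # (class + attribute name) in the pickle, not its value.
        return (getattr, (AlertFailureResponse, "_service_exception"))

payload = pickle.dumps(AlertFailureResponse())

# Simulate the executor importing generated code in which the
# attribute no longer exists under that (renamed) private name.
del AlertFailureResponse._service_exception

pickle.loads(payload)
# AttributeError: type object 'AlertFailureResponse' has no
# attribute '_service_exception'
```

If the generator renamed private fields (e.g. `_serviceException` to `_service_exception` or vice versa) between the environment that pickles and the one that unpickles, a by-name lookup like this would fail in exactly this way.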
Other Conjure definitions:

```yaml
AlertResponse:
  union:
    failureResponse: AlertFailureResponse
    successResponse: AlertSuccessResponse

AlertFailureResponse:
  fields:
    serviceException: ServiceException

AlertSuccessResponse:
  fields:
    uuid: uuid
```
Our `__conjure_generator_version__` is 3.12.1.
We mitigated the issue by building our Conjure service inside a mapPartitions function (sketch below), which is likely better practice anyway since the client never has to be pickled on the driver.
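Roughly what the mitigation looks like. This is a sketch; `build_alert_service` and `send_alert` are hypothetical stand-ins for however the Conjure client is actually constructed and called:

```python
from pyspark import SparkContext

def build_alert_service():
    # Hypothetical factory: construct the Conjure client here, on the
    # executor, instead of capturing a driver-side instance in the closure.
    ...

def check_alerts(partition):
    # One client per partition; the generated service object is never
    # pickled because it is created after the task is deserialized.
    service = build_alert_service()
    for record in partition:
        yield service.send_alert(record)  # hypothetical endpoint

sc = SparkContext.getOrCreate()
records = [...]  # whatever inputs the job maps over
results = sc.parallelize(records).mapPartitions(check_alerts).collect()
```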
What did you want to happen?
We are not entirely sure why these new type definitions are not serializable. I believe the fields are renamed in a way that PySpark's pickling can no longer resolve, but that is conjecture at this point.