Streaming dataflow fails with org.xerial.snappy.SnappyIOException [v2.1.0]
ankurcha opened this issue · 6 comments
While running a streaming Dataflow pipeline against messages from Pub/Sub, I encountered this error:
java.lang.RuntimeException: java.lang.IllegalArgumentException: unable to deserialize WindowFn
at com.google.cloud.dataflow.worker.MapTaskExecutorFactory$3.typedApply(MapTaskExecutorFactory.java:288)
at com.google.cloud.dataflow.worker.MapTaskExecutorFactory$3.typedApply(MapTaskExecutorFactory.java:258)
at com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:55)
at com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:43)
at com.google.cloud.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:78)
at com.google.cloud.dataflow.worker.MapTaskExecutorFactory.create(MapTaskExecutorFactory.java:145)
at com.google.cloud.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:932)
at com.google.cloud.dataflow.worker.StreamingDataflowWorker.access$800(StreamingDataflowWorker.java:134)
at com.google.cloud.dataflow.worker.StreamingDataflowWorker$7.run(StreamingDataflowWorker.java:778)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: unable to deserialize WindowFn
at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:75)
at org.apache.beam.runners.core.construction.WindowingStrategyTranslation.windowFnFromProto(WindowingStrategyTranslation.java:380)
at org.apache.beam.runners.core.construction.WindowingStrategyTranslation.fromProto(WindowingStrategyTranslation.java:336)
at com.google.cloud.dataflow.worker.GroupAlsoByWindowParDoFnFactory.deserializeWindowingStrategy(GroupAlsoByWindowParDoFnFactory.java:239)
at com.google.cloud.dataflow.worker.AssignWindowsParDoFnFactory.create(AssignWindowsParDoFnFactory.java:59)
at com.google.cloud.dataflow.worker.DefaultParDoFnFactory.create(DefaultParDoFnFactory.java:66)
at com.google.cloud.dataflow.worker.MapTaskExecutorFactory.createParDoOperation(MapTaskExecutorFactory.java:365)
at com.google.cloud.dataflow.worker.MapTaskExecutorFactory$3.typedApply(MapTaskExecutorFactory.java:276)
... 11 more
Caused by: org.xerial.snappy.SnappyIOException: [EMPTY_INPUT] Cannot decompress empty stream
at org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:94)
at org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:59)
at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:70)
... 18 more
The pipeline appears to do no work at all; it fails before it even starts.
Job Id: "2017-09-08_15_26_06-8381780606637465333"
We're also seeing this, but in relation to a BigQuery write. All of our Dataflow jobs are failing because of it.
We are experiencing the same issue while running batch jobs reading from BigQuery. It first appeared at around 2017-09-08 23:00 UTC. We made no changes to the pipeline.
Version: Apache Beam SDK for Java 2.1.0
Region: us-central1-f...
(Output truncated)
We've rerun the pipelines several times, and they always crash just as the worker machines become ready. None of the functions report any elements in either their input or output collections.
We are working on rolling back a faulty 2.1.0 worker release. In the meantime, the workaround is to go back to SDK 2.0.0.
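For a Maven project, the suggested workaround amounts to pinning the Beam Java SDK dependencies back to 2.0.0 in the pom.xml. A minimal sketch (the exact artifact list depends on your project; `beam.version` is a conventional property name, not required):

```xml
<!-- Pin the Beam Java SDK to 2.0.0 until the faulty 2.1.0 worker release is rolled back -->
<properties>
  <beam.version>2.0.0</beam.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-core</artifactId>
    <version>${beam.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
    <version>${beam.version}</version>
  </dependency>
</dependencies>
```

After changing the version, rebuild and resubmit the pipeline so the workers pick up the 2.0.0 SDK.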
Thanks for the quick turnaround!
We believe the issue has been fixed. Please try again.
Thanks! Confirmed that the exceptions are gone and messages are being processed in new pipelines.