Coding error trying to encode null during livingatlas clustering
Closed this issue · 1 comment
vjrj commented
I get this error during clustering, which I think was introduced by #987, in 2.18.0-SNAPSHOT+0~20231207231802.1143~1.gbp8802a4:
Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.IllegalStateException: Error encoding value: ValueInGlobalWindow{value=KV{5051752|38120|-710|1996|7|28, au.org.ala.clustering.HashKeyOccurrence@318a655e}, pane=PaneInfo.NO_FIRING}
at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:73)
at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:104)
at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:92)
at au.org.ala.pipelines.beam.ClusteringPipeline.run(ClusteringPipeline.java:373)
at au.org.ala.pipelines.beam.ClusteringPipeline.main(ClusteringPipeline.java:81)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalStateException: Error encoding value: ValueInGlobalWindow{value=KV{5051752|38120|-710|1996|7|28, au.org.ala.clustering.HashKeyOccurrence@318a655e}, pane=PaneInfo.NO_FIRING}
at org.apache.beam.runners.spark.coders.CoderHelpers.toByteArray(CoderHelpers.java:60)
at org.apache.beam.runners.spark.translation.GroupNonMergingWindowsFunctions.lambda$groupByKeyAndWindow$c9b6f5c4$1(GroupNonMergingWindowsFunctions.java:87)
at org.apache.beam.runners.spark.translation.GroupNonMergingWindowsFunctions.lambda$bringWindowToKey$0(GroupNonMergingWindowsFunctions.java:130)
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators$6.transform(Iterators.java:785)
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:445)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:201)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.beam.sdk.coders.CoderException: cannot encode a null String
at org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:74)
at org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:68)
at org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:37)
at org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:114)
at org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:60)
at org.apache.beam.sdk.coders.RowCoderGenerator$EncodeInstruction.encodeDelegate(RowCoderGenerator.java:337)
at org.apache.beam.sdk.coders.Coder$ByteBuddy$5tgsQW7H.encode(Unknown Source)
at org.apache.beam.sdk.coders.Coder$ByteBuddy$5tgsQW7H.encode(Unknown Source)
at org.apache.beam.sdk.schemas.SchemaCoder.encode(SchemaCoder.java:124)
at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:73)
at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:37)
at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:591)
at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:582)
at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:542)
at org.apache.beam.runners.spark.coders.CoderHelpers.toByteArray(CoderHelpers.java:58)
Command:
sudo -u spark la-pipelines clustering all --cluster
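The root cause at the bottom of the trace is Beam's StringUtf8Coder refusing a null value: it encodes raw UTF-8 bytes with no way to mark "no value", so any null string field reaching it during the shuffle fails. Beam's usual remedy is to wrap such coders in NullableCoder, which prepends a marker byte before the payload. The following standalone sketch illustrates that pattern; it mirrors the behavior of Beam's StringUtf8Coder and NullableCoder but is not Beam code, and the class and method names are mine.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class NullCoderSketch {

    // Mirrors StringUtf8Coder's contract: a null value cannot be encoded,
    // which is exactly the "cannot encode a null String" failure in the trace.
    static byte[] encodeString(String value) throws IOException {
        if (value == null) {
            throw new IOException("cannot encode a null String");
        }
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Mirrors NullableCoder's approach: a leading marker byte records whether
    // a value follows (0 = null, 1 = present), so null survives the round trip.
    static byte[] encodeNullable(String value) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value == null) {
            out.write(0);            // null marker only, no payload
        } else {
            out.write(1);            // presence marker
            out.write(value.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(encodeNullable(null).length);   // 1: just the marker byte
        System.out.println(encodeNullable("1996").length); // 5: marker + 4 UTF-8 bytes
    }
}
```

In the pipeline itself, the equivalent fix is to ensure nullable string fields in the schema-derived row coder (or in HashKeyOccurrence's coder) are declared nullable, so Beam wraps them with NullableCoder rather than the bare StringUtf8Coder.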
cc @adam-collins.
adam-collins commented
Feel free to reopen if it is still an issue.