gbif/pipelines

Coding error trying to encode null during livingatlas clustering

Closed this issue · 1 comment

vjrj commented

I get this error during clustering, which I think was introduced by #987, in 2.18.0-SNAPSHOT+0~20231207231802.1143~1.gbp8802a4:

Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.IllegalStateException: Error encoding value: ValueInGlobalWindow{value=KV{5051752|38120|-710|1996|7|28, au.org.ala.clustering.HashKeyOccurrence@318a655e}, pane=PaneInfo.NO_FIRING}
        at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:73)                                                                         
        at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:104)                                                                          
        at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:92)                                                                           
        at au.org.ala.pipelines.beam.ClusteringPipeline.run(ClusteringPipeline.java:373)                                                                                            
        at au.org.ala.pipelines.beam.ClusteringPipeline.main(ClusteringPipeline.java:81)                                                                                            
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)                                                                                                              
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)                                                                                            
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)                                                                                    
        at java.lang.reflect.Method.invoke(Method.java:498)                                                                                                                         
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)                                                                                             
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)                                                                  
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)                                                                                                   
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)                                                                                                        
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)                                                                                                       
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)                                                                                              
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)                                                                                                         
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)                                                                                                              
Caused by: java.lang.IllegalStateException: Error encoding value: ValueInGlobalWindow{value=KV{5051752|38120|-710|1996|7|28, au.org.ala.clustering.HashKeyOccurrence@318a655e}, pane=PaneInfo.NO_FIRING}
        at org.apache.beam.runners.spark.coders.CoderHelpers.toByteArray(CoderHelpers.java:60)                                                                                      
        at org.apache.beam.runners.spark.translation.GroupNonMergingWindowsFunctions.lambda$groupByKeyAndWindow$c9b6f5c4$1(GroupNonMergingWindowsFunctions.java:87)                 
        at org.apache.beam.runners.spark.translation.GroupNonMergingWindowsFunctions.lambda$bringWindowToKey$0(GroupNonMergingWindowsFunctions.java:130)                            
        at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators$6.transform(Iterators.java:785)                                                               
        at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)                                                   
        at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)                                                                                               
        at scala.collection.Iterator$$anon$12.next(Iterator.scala:445)                                                                                                              
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:201)                                                                                      
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)                                                                                        
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)                                                                                               
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)                                                                                               
        at org.apache.spark.scheduler.Task.run(Task.scala:123)                                                                                                                      
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)                                                                                      
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)                                                                                                        
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)                                                                                                    
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)                                                                                          
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)                                                                                          
        at java.lang.Thread.run(Thread.java:750)                                                                                                                                    
Caused by: org.apache.beam.sdk.coders.CoderException: cannot encode a null String                                                                                                   
        at org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:74)                                                                                               
        at org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:68)                                                                                               
        at org.apache.beam.sdk.coders.StringUtf8Coder.encode(StringUtf8Coder.java:37)                                                                                               
        at org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:114)                                                                                          
        at org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:60)                                                                                           
        at org.apache.beam.sdk.coders.RowCoderGenerator$EncodeInstruction.encodeDelegate(RowCoderGenerator.java:337)                                                                
        at org.apache.beam.sdk.coders.Coder$ByteBuddy$5tgsQW7H.encode(Unknown Source)                                                                                               
        at org.apache.beam.sdk.coders.Coder$ByteBuddy$5tgsQW7H.encode(Unknown Source)                                                                                               
        at org.apache.beam.sdk.schemas.SchemaCoder.encode(SchemaCoder.java:124)                                                                                                     
        at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)                                                                                                                  
        at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:73)                                                                                                               
        at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:37)                                                                                                               
        at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:591)                                                                             
        at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:582)                                                                             
        at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:542)                                                                             
        at org.apache.beam.runners.spark.coders.CoderHelpers.toByteArray(CoderHelpers.java:58)                                                                                      
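The root cause at the bottom of the trace is Beam's StringUtf8Coder rejecting a null string inside the row being shuffled. In Beam, the usual remedy is either to default the field before grouping or to wrap the field's coder as NullableCoder.of(StringUtf8Coder.of()), which prefixes each value with a presence byte. The sketch below (plain JDK code, not the actual Beam classes) illustrates the difference between the strict coder that fails here and the presence-byte scheme a nullable wrapper uses:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class NullableEncodingDemo {
    // Mirrors StringUtf8Coder's behavior: a null value is rejected,
    // which is the "cannot encode a null String" failure in the trace.
    static byte[] encodeStrict(String s) {
        if (s == null) {
            throw new IllegalStateException("cannot encode a null String");
        }
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Mirrors the NullableCoder idea: a one-byte presence marker,
    // followed by the encoded value only when it is non-null.
    static byte[] encodeNullable(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (s == null) {
            out.write(0); // marker: value absent
        } else {
            out.write(1); // marker: value present
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    static String decodeNullable(byte[] data) {
        if (data[0] == 0) {
            return null;
        }
        return new String(data, 1, data.length - 1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeNullable(encodeNullable("1996|7|28"))); // round-trips
        System.out.println(decodeNullable(encodeNullable(null)));        // null survives
    }
}
```

Whether the fix belongs in the coder or in the upstream transform that emits the null field depends on how #987 builds the hash-key rows; this only shows why the strict coder fails mid-shuffle.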

Command:

sudo -u spark la-pipelines clustering all --cluster

cc @adam-collins.

Feel free to reopen if it is still an issue.