bigdatagenomics/deca

java.lang.UnsupportedOperationException: empty collection when running normalize_and_discover or CNV


When running cnv or normalize_and_discover with DECA on a Databricks cluster, with the following job parameters:

["--class","org.bdgenomics.deca.cli.DecaMain","--conf","spark.serializer=org.apache.spark.serializer.KryoSerializer","--conf","spark.kryo.registrator=org.bdgenomics.deca.serialization.DECAKryoRegistrator","--conf","spark.kryo.registrationRequired=true","--conf","spark.hadoop.fs.s3.impl=com.databricks.s3a.S3AFileSystem","--conf","spark.hadoop.fs.s3a.impl=com.databricks.s3a.S3AFileSystem","--conf","spark.hadoop.fs.s3n.impl=com.databricks.s3a.S3AFileSystem","--conf","spark.hadoop.fs.s3a.canned.acl=BucketOwnerFullControl","--conf","spark.hadoop.fs.s3a.acl.default=BucketOwnerFullControl","--conf","spark.hadoop.mapreduce.input.fileinputformat.split.minsize=536870912","s3a://data/jars/deca/deca-cli_2.11-0.2.1-SNAPSHOT.jar","normalize_and_discover","-I","s3a://data/test/bam-output/test_coverage.txt","-cnv_rate","0.0001","-max_sample_mean_RD","10","-max_sample_sd_RD","5","-max_target_length","10000","-max_target_mean_RD","10","-max_target_sd_RD_star","5","-mean_target_distance","50000","-mean_targets_cnv","100","-min_target_mean_RD","0","-save_zscores","s3a://data/test/bam-output/test.zscores","-o","s3a://data/test/bam-output/test.gff3","-multi_file"]

we get the following error:

    Command body threw exception: java.lang.UnsupportedOperationException: empty collection
    Exception in thread "main" java.lang.UnsupportedOperationException: empty collection
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$apply$35.apply(RDD.scala:1053)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$apply$35.apply(RDD.scala:1053)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1053)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:379)
        at org.apache.spark.rdd.RDD.reduce(RDD.scala:1033)
        at org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix.numRows(IndexedRowMatrix.scala:66)
        at org.bdgenomics.deca.coverage.ReadDepthMatrix.<init>(ReadDepthMatrix.scala:29)
        at org.bdgenomics.deca.Deca$$anonfun$readXHMMMatrix$1.apply(Deca.scala:77)
        at org.bdgenomics.deca.Deca$$anonfun$readXHMMMatrix$1.apply(Deca.scala:38)
        at scala.Option.fold(Option.scala:158)
        at org.apache.spark.rdd.Timer.time(Timer.scala:48)
        at org.bdgenomics.deca.Deca$.readXHMMMatrix(Deca.scala:38)
        at org.bdgenomics.deca.cli.NormalizingDiscoverer.run(NormalizingDiscoverer.scala:73)
        at org.bdgenomics.utils.cli.BDGSparkCommand$class.run(BDGCommand.scala:55)
        at org.bdgenomics.deca.cli.NormalizingDiscoverer.run(NormalizingDiscoverer.scala:69)
        at org.bdgenomics.deca.cli.DecaMain$$anonfun$run$3.apply(DecaMain.scala:71)
        at org.bdgenomics.deca.cli.DecaMain$$anonfun$run$3.apply(DecaMain.scala:70)
        at scala.Option.fold(Option.scala:158)
        at org.bdgenomics.deca.cli.DecaMain.run(DecaMain.scala:70)
        at org.bdgenomics.deca.cli.DecaMain$.main(DecaMain.scala:26)
        at org.bdgenomics.deca.cli.DecaMain.main(DecaMain.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
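For context on where this comes from (my reading of the stack trace, not of the DECA source): `IndexedRowMatrix.numRows()` does a `reduce` over its backing row RDD, and `RDD.reduce` throws `UnsupportedOperationException: empty collection` when that RDD has no elements, so it looks like `readXHMMMatrix` is producing zero rows from the input matrix. A minimal Scala sketch that reproduces the same exception on an empty matrix:

```scala
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
import org.apache.spark.sql.SparkSession

object EmptyMatrixRepro {
  def main(args: Array[String]): Unit = {
    // Minimal standalone Spark session; DECA sets its context up differently.
    val spark = SparkSession.builder()
      .appName("empty-matrix-repro")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // numRows() reduces over the indices of the backing RDD; on an empty RDD
    // that reduce fails with "empty collection", matching the trace above.
    val emptyMatrix = new IndexedRowMatrix(sc.emptyRDD[IndexedRow])
    try {
      emptyMatrix.numRows()
    } catch {
      case e: UnsupportedOperationException =>
        println(s"got: $e") // java.lang.UnsupportedOperationException: empty collection
    }

    spark.stop()
  }
}
```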

Running the coverage command by itself works fine, but using its output as the input to normalize_and_discover produces the error above. Please advise.
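In case it helps narrow things down, this is the kind of sanity check I'd run to confirm the coverage matrix is actually readable and non-empty from the cluster (just a sketch, assuming the same S3 path and credentials as the job above; not part of DECA):

```scala
import org.apache.spark.sql.SparkSession

object CoverageSanityCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("coverage-sanity-check")
      .getOrCreate()
    val sc = spark.sparkContext

    // Input path copied from the normalize_and_discover parameters above.
    // If the coverage step wrote a directory of part files rather than a
    // single text file, point textFile() at that directory instead.
    val coveragePath = "s3a://data/test/bam-output/test_coverage.txt"
    val lines = sc.textFile(coveragePath)

    // A count of 0 (or header-only output) would explain the empty row RDD
    // behind the "empty collection" exception.
    println(s"line count: ${lines.count()}")
    lines.take(3).foreach(println)

    spark.stop()
  }
}
```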