spotify/dbeam

Failed to run on Google Cloud Dataflow when built with Java 10

hilliao opened this issue · 4 comments

I can't figure out how to run DBeam on Google Cloud Dataflow. I dug through Scio's documentation and tried to run DBeam with the Dataflow runner from the sbt shell. I had to add the following line to build.sbt:
"org.apache.beam" % "beam-runners-google-cloud-dataflow-java" % beamVersion,
under libraryDependencies ++= Seq( but I still got these errors:

hil@macbook13i72017 ~/c/dbeam> sbt
[info] Loading settings from idea.sbt ...
[info] Loading global plugins from /Users/hil/.sbt/1.0/plugins
[info] Loading settings from plugins.sbt ...
[info] Loading project definition from /Users/hil/cbsi/dbeam/project
[info] Loading settings from version.sbt,build.sbt ...
[info] Set current project to dbeam-foss-parent (in build file:/Users/hil/cbsi/dbeam/)
[info] sbt server started at local:///Users/hil/.sbt/1.0/server/b6db3491d7efae758331/sock
sbt:dbeam-foss-parent> project dbeamCore
[info] Set current project to dbeam-core (in build file:/Users/hil/cbsi/dbeam/)
sbt:dbeam-core> runMain com.spotify.dbeam.JdbcAvroJob --project=i-ingest-poc --zone=us-west1-c --runner=DataflowRunner --connectionUrl=jdbc:mysql://localhost:3306/dbeamtest --table=pet --username=hil --password=password --output=gs://dbeam-test/tmp
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[info] Running (fork) com.spotify.dbeam.JdbcAvroJob --project=i-ingest-poc --zone=us-west1-c --runner=DataflowRunner --connectionUrl=jdbc:mysql://localhost:3306/dbeamtest --table=pet --username=hil --password=password --output=gs://dbeam-test/tmp
[error] Wed May 23 17:21:17 PDT 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[error] [main] INFO JdbcAvroConversions - Creating Avro schema based on the first read row from the database
[error] [main] INFO JdbcAvroConversions - Schema created successfully. Generated schema: {"type":"record","name":"pet","namespace":"dbeam_generated","doc":"Generate schema from JDBC ResultSet from 'pet' or the --sqlFile with jdbc:mysql://localhost:3306/dbeamtest","fields":[{"name":"name","type":["null","string"],"doc":"From sqlType 12 VARCHAR","default":null,"typeName":"VARCHAR","sqlCode":"12","columnName":"name"},{"name":"owner","type":["null","string"],"doc":"From sqlType 12 VARCHAR","default":null,"typeName":"VARCHAR","sqlCode":"12","columnName":"owner"},{"name":"species","type":["null","string"],"doc":"From sqlType 12 VARCHAR","default":null,"typeName":"VARCHAR","sqlCode":"12","columnName":"species"},{"name":"sex","type":["null","string"],"doc":"From sqlType 1 CHAR","default":null,"typeName":"CHAR","sqlCode":"1","columnName":"sex"},{"name":"birth","type":["null","long"],"doc":"From sqlType 91 DATE","default":null,"typeName":"DATE","sqlCode":"91","columnName":"birth"},{"name":"death","type":["null","long"],"doc":"From sqlType 91 DATE","default":null,"typeName":"DATE","sqlCode":"91","columnName":"death"}],"connectionUrl":"jdbc:mysql://localhost:3306/dbeamtest","tableName":"pet"}
[error] [main] INFO com.spotify.dbeam.JdbcAvroJob$ - Elapsed time to schema 0.585 seconds
[error] Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Current ClassLoader is 'jdk.internal.loader.ClassLoaders$AppClassLoader@4b9af9a9' only URLClassLoaders are supported
[error] at scala.Predef$.require(Predef.scala:277)
[error] at com.spotify.scio.runners.dataflow.DataflowContext$.detectClassPathResourcesToStage(DataflowContext.scala:58)
[error] at com.spotify.scio.runners.dataflow.DataflowContext$.getFilesToStage(DataflowContext.scala:49)
[error] at com.spotify.scio.runners.dataflow.DataflowContext$.prepareOptions(DataflowContext.scala:39)
[error] at com.spotify.scio.RunnerContext$.prepareOptions(ScioContext.scala:104)
[error] at com.spotify.scio.ScioContext.pipeline(ScioContext.scala:287)
[error] at com.spotify.scio.ScioContext$$anonfun$parallelize$1.apply(ScioContext.scala:857)
[error] at com.spotify.scio.ScioContext$$anonfun$parallelize$1.apply(ScioContext.scala:856)
[error] at com.spotify.scio.ScioContext.requireNotClosed(ScioContext.scala:419)
[error] at com.spotify.scio.ScioContext.parallelize(ScioContext.scala:856)
[error] at com.spotify.dbeam.JdbcAvroJob$.createSchema(JdbcAvroJob.scala:63)
[error] at com.spotify.dbeam.JdbcAvroJob$.prepareExport(JdbcAvroJob.scala:131)
[error] at com.spotify.dbeam.JdbcAvroJob$.runExport(JdbcAvroJob.scala:151)
[error] at com.spotify.dbeam.JdbcAvroJob$.main(JdbcAvroJob.scala:160)
[error] at com.spotify.dbeam.JdbcAvroJob.main(JdbcAvroJob.scala)
[error] java.lang.RuntimeException: Nonzero exit code returned from runner: 1
[error] at sbt.ForkRun.processExitCode$1(Run.scala:33)
[error] at sbt.ForkRun.run(Run.scala:42)
[error] at sbt.Defaults$.$anonfun$bgRunMainTask$6(Defaults.scala:1147)
[error] at sbt.Defaults$.$anonfun$bgRunMainTask$6$adapted(Defaults.scala:1142)
[error] at sbt.internal.BackgroundThreadPool.$anonfun$run$1(DefaultBackgroundJobService.scala:366)
[error] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
[error] at scala.util.Try$.apply(Try.scala:209)
[error] at sbt.internal.BackgroundThreadPool$BackgroundRunnable.run(DefaultBackgroundJobService.scala:289)
[error] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
[error] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[error] at java.base/java.lang.Thread.run(Thread.java:844)
[error] (Compile / runMain) Nonzero exit code returned from runner: 1
[error] Total time: 7 s, completed May 23, 2018, 5:21:18 PM

The error was: Current ClassLoader is 'jdk.internal.loader.ClassLoaders$AppClassLoader@4b9af9a9' only URLClassLoaders are supported
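
For context, the stack trace places the failing requirement in Scio's DataflowContext.detectClassPathResourcesToStage, which expects the current class loader to be a URLClassLoader. Since JDK 9 the application class loader is an internal type that no longer extends URLClassLoader, which matches the message above. Below is a minimal Java sketch of that kind of check. The class and method names are hypothetical and this is not Scio's actual code; it only illustrates the condition.

```java
import java.net.URLClassLoader;

public class ClassLoaderCheck {
    // True when the loader is a URLClassLoader, i.e. its classpath
    // entries can be listed through URLClassLoader.getURLs().
    public static boolean isUrlClassLoader(ClassLoader loader) {
        return loader instanceof URLClassLoader;
    }

    public static void main(String[] args) {
        ClassLoader appLoader = ClassLoader.getSystemClassLoader();
        System.out.println("java.specification.version = "
            + System.getProperty("java.specification.version"));
        System.out.println("system class loader = " + appLoader);
        // Expected to differ between Java 8 and Java 9+ (see the error above).
        System.out.println("is URLClassLoader = " + isUrlClassLoader(appLoader));
    }
}
```

Running this under the same JDK that sbt uses shows whether the class loader requirement from the stack trace could be satisfied.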

Changing --runner to DirectRunner succeeded:

sbt:dbeam-core> runMain com.spotify.dbeam.JdbcAvroJob --project=i-ingest-poc --zone=us-west1-c --runner=DirectRunner --connectionUrl=jdbc:mysql://localhost:3306/dbeamtest --table=pet --username=hil --password=password --output=gs://dbeam-test/tmp
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[info] Running (fork) com.spotify.dbeam.JdbcAvroJob --project=i-ingest-poc --zone=us-west1-c --runner=DirectRunner --connectionUrl=jdbc:mysql://localhost:3306/dbeamtest --table=pet --username=hil --password=password --output=gs://dbeam-test/tmp
[error] Wed May 23 17:28:20 PDT 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[error] [main] INFO JdbcAvroConversions - Creating Avro schema based on the first read row from the database
[error] [main] INFO JdbcAvroConversions - Schema created successfully. Generated schema: {"type":"record","name":"pet","namespace":"dbeam_generated","doc":"Generate schema from JDBC ResultSet from 'pet' or the --sqlFile with jdbc:mysql://localhost:3306/dbeamtest","fields":[{"name":"name","type":["null","string"],"doc":"From sqlType 12 VARCHAR","default":null,"typeName":"VARCHAR","sqlCode":"12","columnName":"name"},{"name":"owner","type":["null","string"],"doc":"From sqlType 12 VARCHAR","default":null,"typeName":"VARCHAR","sqlCode":"12","columnName":"owner"},{"name":"species","type":["null","string"],"doc":"From sqlType 12 VARCHAR","default":null,"typeName":"VARCHAR","sqlCode":"12","columnName":"species"},{"name":"sex","type":["null","string"],"doc":"From sqlType 1 CHAR","default":null,"typeName":"CHAR","sqlCode":"1","columnName":"sex"},{"name":"birth","type":["null","long"],"doc":"From sqlType 91 DATE","default":null,"typeName":"DATE","sqlCode":"91","columnName":"birth"},{"name":"death","type":["null","long"],"doc":"From sqlType 91 DATE","default":null,"typeName":"DATE","sqlCode":"91","columnName":"death"}],"connectionUrl":"jdbc:mysql://localhost:3306/dbeamtest","tableName":"pet"}
[error] [main] INFO com.spotify.dbeam.JdbcAvroJob$ - Elapsed time to schema 0.726 seconds
[error] [main] INFO com.spotify.dbeam.JdbcAvroJob$ - Running queries: List(SELECT * FROM pet)
[error] WARNING: An illegal reflective access operation has occurred
[error] WARNING: Illegal reflective access by org.apache.beam.runners.direct.repackaged.com.google.protobuf.UnsafeUtil (file:/private/var/folders/q3/rf49by096192ckdl4j7dt56w0000gn/T/sbt_d9330199/target/b187293c/beam-runners-direct-java-2.4.0.jar) to field java.nio.Buffer.address
[error] WARNING: Please consider reporting this to the maintainers of org.apache.beam.runners.direct.repackaged.com.google.protobuf.UnsafeUtil
[error] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
[error] WARNING: All illegal access operations will be denied in a future release
[error] [direct-runner-worker] INFO org.apache.beam.sdk.io.WriteFiles - Opening writer 191b3dde-2394-4c6b-a022-5bab0f73dd00 for window org.apache.beam.sdk.transforms.windowing.GlobalWindow@fe7b6b0 pane PaneInfo{isFirst=true, isLast=true, timing=ON_TIME, index=0, onTimeIndex=0} destination null
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Preparing write...
[error] Wed May 23 17:28:23 PDT 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[error] Wed May 23 17:28:23 PDT 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Write prepared
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Starting write...
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Executing query (this can take a few minutes) ...
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Execute query took 0.01 seconds
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Read 1 rows, took 0.01 seconds
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Closing connection, flushing writer...
[error] [direct-runner-worker] INFO com.spotify.dbeam.JdbcAvroIO$JdbcAvroWriter - jdbcavroio : Write finished
[error] [direct-runner-worker] INFO org.apache.beam.sdk.io.FileBasedSink$Writer - Successfully wrote temporary file gs://dbeam-test/tmp/.temp-beam-2018-05-24_00-28-22-1/191b3dde-2394-4c6b-a022-5bab0f73dd00
[error] [direct-runner-worker] INFO org.apache.beam.sdk.io.WriteFiles - Finalizing 1 file results
[error] [direct-runner-worker] INFO org.apache.beam.sdk.io.FileBasedSink - Finalizing for destination null num shards 1.
[error] [direct-runner-worker] INFO org.apache.beam.sdk.io.FileBasedSink - Will copy temporary file FileResult{tempFilename=gs://dbeam-test/tmp/.temp-beam-2018-05-24_00-28-22-1/191b3dde-2394-4c6b-a022-5bab0f73dd00, shard=0, window=org.apache.beam.sdk.transforms.windowing.GlobalWindow@fe7b6b0, paneInfo=PaneInfo{isFirst=true, isLast=true, timing=ON_TIME, index=0, onTimeIndex=0}} to final location gs://dbeam-test/tmp/part-00000-of-00001.avro
[error] [direct-runner-worker] INFO org.apache.beam.sdk.io.FileBasedSink - Will remove known temporary file gs://dbeam-test/tmp/.temp-beam-2018-05-24_00-28-22-1/191b3dde-2394-4c6b-a022-5bab0f73dd00
[error] [main] INFO com.spotify.dbeam.JdbcAvroJob$ - Metrics Metrics(0.5.4,2.12.4,JdbcAvroJob,DONE,BeamMetrics(List(BeamMetric(com.spotify.scio.ScioMetrics,schemaElapsedTimeMs,MetricValue(726,Some(726))), BeamMetric(com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter,writeElapsedMs,MetricValue(7,Some(7))), BeamMetric(com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter,recordCount,MetricValue(1,Some(1))), BeamMetric(com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter,executeQueryElapsedMs,MetricValue(9,Some(9)))),List(),List(BeamMetric(com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter,msPerMillionRows,MetricValue(BeamGauge(7000000,2018-05-24T00:28:23.895Z),Some(BeamGauge(7000000,2018-05-24T00:28:23.895Z)))), BeamMetric(com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter,rowsPerMinute,MetricValue(BeamGauge(8571,2018-05-24T00:28:23.895Z),Some(BeamGauge(8571,2018-05-24T00:28:23.895Z)))))))
[error] [main] INFO com.spotify.dbeam.JdbcAvroJob$ - all counters and gauges Map(MetricName{namespace=com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter, name=rowsPerMinute} -> MetricValue(GaugeResult{value=8571, timestamp=2018-05-24T00:28:23.895Z},Some(GaugeResult{value=8571, timestamp=2018-05-24T00:28:23.895Z})), MetricName{namespace=com.spotify.scio.ScioMetrics, name=schemaElapsedTimeMs} -> MetricValue(726,Some(726)), MetricName{namespace=com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter, name=recordCount} -> MetricValue(1,Some(1)), MetricName{namespace=com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter, name=executeQueryElapsedMs} -> MetricValue(9,Some(9)), MetricName{namespace=com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter, name=writeElapsedMs} -> MetricValue(7,Some(7)), MetricName{namespace=com.spotify.dbeam.JdbcAvroIO.JdbcAvroWriter, name=msPerMillionRows} -> MetricValue(GaugeResult{value=7000000, timestamp=2018-05-24T00:28:23.895Z},Some(GaugeResult{value=7000000, timestamp=2018-05-24T00:28:23.895Z})))
[success] Total time: 12 s, completed May 23, 2018, 5:28:27 PM
sbt:dbeam-core>

I got the same error. Any suggestions?

As the title says, Java 10 is what causes the Current ClassLoader error. My solution was to downgrade to Java 8 on the build machine, which is usually your local Linux or macOS development machine. Google Cloud Dataflow supports only Java 8, and code built with Java 10 hits errors like this one. The steps I took were to uninstall Java 10 along with every other JDK and JRE, make sure no Java installation was left on the build machine, and then install the latest Java 8 release.
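
To confirm which Java version the build actually runs on after the downgrade, a small check of the java.specification.version system property can help. This is a sketch only: the class and method names are hypothetical, and it relies on that property reading 1.8 on Java 8 and 9, 10, 11 and so on for later releases.

```java
public class JavaVersionCheck {
    // Maps a java.specification.version string to its major version:
    // "1.8" -> 8 (legacy scheme), "10" -> 10 (scheme used since JDK 9).
    public static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        if (major == 8) {
            System.out.println("Running on Java 8.");
        } else {
            // In this thread, only Java 8 worked with --runner=DataflowRunner.
            System.out.println("Running on Java " + major + ", not Java 8.");
        }
    }
}
```

Run it with the same java binary that sbt picks up; a leftover newer JDK on the PATH would show up here.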

That seems to be an issue with the Google Dataflow SDK. Maybe open an issue here: https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/ ?

Closing this, as the Beam SDK does not yet support JDK 9/10/11. Once a Beam SDK version with JDK 11 support is available, DBeam will be upgraded to that version.

See the following for more details:
https://beam.apache.org/roadmap/java-sdk/
https://issues.apache.org/jira/browse/BEAM-2530