Bigtable from Dataproc: Dependency conflict even after shading the jars
shril opened this issue · 2 comments
I am trying to run a Spark Application to write and read data to Cloud Bigtable from Dataproc.
Initially, I got this exception: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument. I then learned from the Google documentation page "Manage Java and Scala dependencies for Apache Spark" that there are known dependency conflicts.
Following the instructions, I changed my build.sbt file to shade the jars:
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.google.common.**" -> "repackaged.com.google.common.@1").inAll,
  ShadeRule.rename("com.google.protobuf.**" -> "repackaged.com.google.protobuf.@1").inAll,
  ShadeRule.rename("io.grpc.**" -> "repackaged.io.grpc.@1").inAll
)
Then I got this error:
repackaged.io.grpc.ManagedChannelProvider$ProviderNotFoundException: No functional channel service provider found. Try adding a dependency on the grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact
at repackaged.io.grpc.ManagedChannelProvider.provider(ManagedChannelProvider.java:45)
at repackaged.io.grpc.ManagedChannelBuilder.forAddress(ManagedChannelBuilder.java:39)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:353)
at com.google.api.gax.grpc.ChannelPool.<init>(ChannelPool.java:107)
at com.google.api.gax.grpc.ChannelPool.create(ChannelPool.java:85)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:237)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:231)
at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:201)
at com.google.cloud.bigtable.data.v2.stub.EnhancedBigtableStub.create(EnhancedBigtableStub.java:175)
at com.google.cloud.bigtable.data.v2.BigtableDataClient.create(BigtableDataClient.java:165)
at com.groupon.crm.BigtableClient$.getDataClient(BigtableClient.scala:59)
... 44 elided
Following that, I added the grpc-netty dependency to my build.sbt file:
libraryDependencies += "io.grpc" % "grpc-netty" % "1.49.2"
I am still getting the same error.
Environment details
Dataproc details -
"software_config": {
"image_version": "1.5-debian10",
"properties": {
"dataproc:dataproc.logging.stackdriver.job.driver.enable": "true",
"dataproc:dataproc.logging.stackdriver.enable": "true",
"dataproc:jobs.file-backed-output.enable": "true",
"dataproc:dataproc.logging.stackdriver.job.yarn.container.enable": "true",
"capacity-scheduler:yarn.scheduler.capacity.resource-calculator" : "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
"hive:hive.server2.materializedviews.cache.at.startup": "false",
"spark:spark.jars":"XXXX"
},
"optional_components": ["ZEPPELIN","ANACONDA","JUPYTER"]
}
Spark Job details -
val sparkVersion = "2.4.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % sparkVersion % "provided"
libraryDependencies += "com.google.cloud" % "google-cloud-bigtable" % "2.23.1"
libraryDependencies += "com.google.auth" % "google-auth-library-oauth2-http" % "1.17.0"
libraryDependencies += "io.grpc" % "grpc-netty" % "1.49.2"
I can provide additional details if required.
Thanks!
You need to make sure that the service files are updated when repackaging grpc. In Maven you would use something like the ServicesResourceTransformer:
https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer
I'm not sure what the equivalent is for sbt, but the resulting shaded jar needs to have META-INF/services files that are correctly updated with the repackaged class names.
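For sbt-assembly, one thing worth checking is the merge strategy, since a catch-all rule that discards everything under META-INF also drops the gRPC service files. The following is a minimal sketch, not a confirmed fix for this issue. It assumes the sbt-assembly plugin is in use and that the build has a custom assemblyMergeStrategy (the issue does not show one). Whether the shade rules also rewrite the class names inside the service files may depend on the plugin version, so the assembled jar should still be inspected.

```scala
// build.sbt fragment (sketch): keep and merge service files instead of discarding them.
assembly / assemblyMergeStrategy := {
  // Concatenate service files from all jars, dropping duplicate lines.
  case PathList("META-INF", "services", _*) => MergeStrategy.filterDistinctLines
  // Assumption: the remaining META-INF entries (manifests, signatures) are not needed.
  case PathList("META-INF", _*)             => MergeStrategy.discard
  // Assumption: for any other conflict, taking the first copy is acceptable.
  case _                                    => MergeStrategy.first
}
```

The order of the cases matters: the services rule has to come before the broader META-INF rule, otherwise the service files are discarded first.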
@igorbernstein2 I did the following and it worked perfectly well for me.
In src/main/resources I added a META-INF/services folder.
In the services folder I added 2 files, namely
io.grpc.LoadBalancerProvider
io.grpc.NameResolverProvider
The contents of the two files are as follows -
io.grpc.LoadBalancerProvider:
io.grpc.internal.PickFirstLoadBalancerProvider
io.grpc.util.SecretRoundRobinLoadBalancerProvider$Provider
io.grpc.util.OutlierDetectionLoadBalancerProvider
and io.grpc.NameResolverProvider:
io.grpc.internal.DnsNameResolverProvider
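To confirm which service files actually end up in the assembled jar, a small check like the one below may help. It is only a sketch: the object name ListServiceFiles and the default jar path are hypothetical, and it uses nothing beyond the JDK and the Scala standard library.

```scala
import java.util.jar.JarFile
import scala.collection.JavaConverters._
import scala.io.Source

// Prints every META-INF/services entry in a jar together with its contents.
object ListServiceFiles {
  def main(args: Array[String]): Unit = {
    // Hypothetical default path; pass the real assembly jar as the first argument.
    val jarPath = args.headOption.getOrElse("target/scala-2.12/app-assembly.jar")
    val jar = new JarFile(jarPath)
    try {
      jar.entries().asScala
        .filter(e => !e.isDirectory && e.getName.startsWith("META-INF/services/"))
        .foreach { e =>
          println(e.getName)
          val in = jar.getInputStream(e)
          // Each line of a service file names one provider implementation class.
          try Source.fromInputStream(in, "UTF-8").getLines().foreach(l => println(s"  $l"))
          finally in.close()
        }
    } finally jar.close()
  }
}
```

With the shade rules from the original post, the file names and the class names listed inside them would be expected to carry the repackaged. prefix if they are to match the relocated classes.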