googleapis/java-bigtable

Bigtable from Dataproc: Dependency conflict even after shading the jars

shril opened this issue · 2 comments

shril commented

I am trying to run a Spark Application to write and read data to Cloud Bigtable from Dataproc.

Initially, I got this exception: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument. I then learned from the Google documentation page "Manage Java and Scala dependencies for Apache Spark" that this is caused by a dependency conflict.

Following the instructions, I changed my build.sbt file to shade the jars -

assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.google.common.**" -> "repackaged.com.google.common.@1").inAll,
  ShadeRule.rename("com.google.protobuf.**" -> "repackaged.com.google.protobuf.@1").inAll,
  ShadeRule.rename("io.grpc.**" -> "repackaged.io.grpc.@1").inAll
)

Then I got this error:

repackaged.io.grpc.ManagedChannelProvider$ProviderNotFoundException: No functional channel service provider found. Try adding a dependency on the grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact
  at repackaged.io.grpc.ManagedChannelProvider.provider(ManagedChannelProvider.java:45)
  at repackaged.io.grpc.ManagedChannelBuilder.forAddress(ManagedChannelBuilder.java:39)
  at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:353)
  at com.google.api.gax.grpc.ChannelPool.<init>(ChannelPool.java:107)
  at com.google.api.gax.grpc.ChannelPool.create(ChannelPool.java:85)
  at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:237)
  at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:231)
  at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:201)
  at com.google.cloud.bigtable.data.v2.stub.EnhancedBigtableStub.create(EnhancedBigtableStub.java:175)
  at com.google.cloud.bigtable.data.v2.BigtableDataClient.create(BigtableDataClient.java:165)
  at com.groupon.crm.BigtableClient$.getDataClient(BigtableClient.scala:59)
  ... 44 elided

Following that, I added the grpc-netty dependency to my build.sbt file.

libraryDependencies += "io.grpc" % "grpc-netty" % "1.49.2"

I am still getting the same error.

Environment details

Dataproc details -

"software_config": {
      "image_version": "1.5-debian10",
      "properties": {
        "dataproc:dataproc.logging.stackdriver.job.driver.enable": "true",
        "dataproc:dataproc.logging.stackdriver.enable": "true",
        "dataproc:jobs.file-backed-output.enable": "true",
        "dataproc:dataproc.logging.stackdriver.job.yarn.container.enable": "true",
        "capacity-scheduler:yarn.scheduler.capacity.resource-calculator" : "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
        "hive:hive.server2.materializedviews.cache.at.startup": "false",
        "spark:spark.jars":"XXXX"
      },
      "optional_components": ["ZEPPELIN","ANACONDA","JUPYTER"]
    }

Spark Job details -

val sparkVersion = "2.4.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % sparkVersion % "provided"
libraryDependencies += "com.google.cloud" % "google-cloud-bigtable" % "2.23.1"
libraryDependencies += "com.google.auth" % "google-auth-library-oauth2-http" % "1.17.0"
libraryDependencies += "io.grpc" % "grpc-netty" % "1.49.2"

I can provide additional details if required.
Thanks!

igorbernstein2 commented

You need to make sure that you are updating the service files when repackaging grpc. In Maven you would use something like:
https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer

However, I'm not sure what the equivalent is for sbt. Either way, the resulting shaded jar needs to contain META-INF/services files that are correctly updated with the repackaged class names.
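
For sbt, a rough counterpart to Maven's ServicesResourceTransformer might be an sbt-assembly merge strategy that merges the META-INF/services files instead of discarding them. The following is only a sketch, assuming the sbt-assembly plugin is already in use (as the assemblyShadeRules setting above suggests). Whether the shade rules also rewrite the class names inside those files depends on the plugin version, so it is worth inspecting the assembled jar to confirm.

```scala
// build.sbt (sketch; assumes sbt-assembly is enabled in project/plugins.sbt)
assembly / assemblyMergeStrategy := {
  // Keep every provider entry from all jars that ship the same service file,
  // dropping duplicate lines, rather than discarding or picking only one copy.
  case PathList("META-INF", "services", _*) => MergeStrategy.filterDistinctLines
  // Delegate everything else to the strategy that was already configured.
  case other =>
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    oldStrategy(other)
}
```

A custom strategy that discards everything under META-INF would drop these service files, which is one way to end up with the ProviderNotFoundException shown above.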

shril commented

@igorbernstein2 I did the following and it worked perfectly well for me.

In src/main/resources, I added a META-INF/services folder.
In the services folder, I added two files:

  • io.grpc.LoadBalancerProvider
  • io.grpc.NameResolverProvider

The contents of the two files are as follows:

io.grpc.LoadBalancerProvider

io.grpc.internal.PickFirstLoadBalancerProvider
io.grpc.util.SecretRoundRobinLoadBalancerProvider$Provider
io.grpc.util.OutlierDetectionLoadBalancerProvider

and io.grpc.NameResolverProvider

io.grpc.internal.DnsNameResolverProvider
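
To check which service files actually end up on the classpath, a small diagnostic along these lines may help. It is a minimal sketch, not part of the fix itself: the object name ListServiceFiles is hypothetical, and the "repackaged." prefix is taken from the shade rules earlier in this thread. It prints every copy of a given META-INF/services file that the class loader can see, together with the provider class names listed in it.

```scala
import scala.collection.JavaConverters._
import scala.io.Source

// Hypothetical helper for inspecting META-INF/services entries at runtime.
object ListServiceFiles {

  /** Returns (resource URL, provider class names) for every copy of the
    * service file for `serviceName` visible to the context class loader. */
  def serviceEntries(serviceName: String): Seq[(String, Seq[String])] = {
    val loader = Thread.currentThread().getContextClassLoader
    loader.getResources(s"META-INF/services/$serviceName").asScala.toList.map { url =>
      val src = Source.fromInputStream(url.openStream(), "UTF-8")
      try {
        // Skip blank lines and comments, as java.util.ServiceLoader does.
        val providers = src.getLines().map(_.trim)
          .filter(line => line.nonEmpty && !line.startsWith("#"))
          .toList
        (url.toString, providers)
      } finally src.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Check both the original and the shaded service name.
    val names = Seq(
      "io.grpc.ManagedChannelProvider",
      "repackaged.io.grpc.ManagedChannelProvider"
    )
    for (name <- names) {
      val entries = serviceEntries(name)
      if (entries.isEmpty) println(s"$name: no service file found")
      for ((url, providers) <- entries)
        println(s"$name <- $url: ${providers.mkString(", ")}")
    }
  }
}
```

Running it with the assembled jar on the classpath shows whether the service file names and the class names inside them match the packages the shaded code actually looks up.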