streamthoughts/kafka-connect-file-pulse

NoSuchMethodError thrown when using GcsFileSystemListing

Closed this issue · 4 comments

Describe the bug
When we use the GcsFileSystemListing listing class, a NoSuchMethodError is thrown.
Here is the error log:

Caused by: java.lang.NoSuchMethodError: 'com.google.common.collect.ImmutableMap com.google.common.collect.ImmutableMap$Builder.buildOrThrow()'
	at com.google.cloud.storage.UnifiedOpts$Opts.getRpcOptions(UnifiedOpts.java:2157)
	at com.google.cloud.storage.StorageImpl.list(StorageImpl.java:383)

To Reproduce
Steps to reproduce the behavior:

  • Run docker-compose up
  • Deploy the following connector:
{
  "connector.class": "io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector",
  "filters": "ParseCSVLine",
  "filters.ParseCSVLine.extract.column.name": "headers",
  "filters.ParseCSVLine.trim.column": "true",
  "filters.ParseCSVLine.separator": ";",
  "filters.ParseCSVLine.type": "io.streamthoughts.kafka.connect.filepulse.filter.CSVFilter",
  "fs.cleanup.policy.class": "io.streamthoughts.kafka.connect.filepulse.fs.clean.LogCleanupPolicy",
  "fs.cleanup.policy.triggered.on": "COMMITTED",
  "fs.listing.class": "io.streamthoughts.kafka.connect.filepulse.fs.GcsFileSystemListing",
  "fs.listing.interval.ms": "10000",
  "gcs.bucket.name": "my-bucket",
  "gcs.credentials.json": "{\"type\": \"authorized_user\", \"audience\": \"//iam.googleapis.com/PROVIDER_ID\", \"auth_url\": \"https://auth.cloud.google/authorize\", \"token_url\": \"https://sts.googleapis.com/v1/oauthtoken\",\"refresh_token\": \"refresh-token\", \"token_info_url\": \"https://sts.googleapis.com/v1/introspect\", \"client_id\": \"fake-client-id\", \"client_secret\": \"fake-client-secret\"}",
  "offset.policy.class": "io.streamthoughts.kafka.connect.filepulse.offset.DefaultSourceOffsetPolicy",
  "skip.headers": "1",
  "topic": "connect-file-pulse-quickstart-gcp-csv-topic",
  "tasks.reader.class": "io.streamthoughts.kafka.connect.filepulse.fs.reader.GcsRowFileInputReader",
  "tasks.file.status.storage.class": "io.streamthoughts.kafka.connect.filepulse.state.KafkaFileObjectStateBackingStore",
  "tasks.file.status.storage.bootstrap.servers": "kafka:29092",
  "tasks.file.status.storage.topic": "connect-file-pulse-status",
  "tasks.file.status.storage.topic.partitions": 10,
  "tasks.file.status.storage.topic.replication.factor": 1,
  "tasks.max": 1
}
  • Look at the logs of the connect-file-pulse container
  • You should see the error in the logs
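The steps do not say how the connector was deployed. One common way is the Kafka Connect REST API: `PUT /connectors/{name}/config` creates the connector, or updates its configuration if it already exists, and accepts a flat config map like the JSON above. The sketch below is only one possible approach. It assumes the Connect worker's REST port is reachable at `http://localhost:8083`, uses a hypothetical connector name `gcs-csv-source`, and expects the JSON to be saved in a hypothetical file `connector-config.json`.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public final class DeployConnector {

    /** Builds PUT {connectUrl}/connectors/{name}/config with the given JSON config map as the body. */
    static HttpRequest buildRequest(String connectUrl, String connectorName, String configJson) {
        return HttpRequest.newBuilder()
                .uri(URI.create(connectUrl + "/connectors/" + connectorName + "/config"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(configJson))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the connector JSON shown above is stored in this (hypothetical) file.
        String configJson = Files.readString(Path.of(args.length > 0 ? args[0] : "connector-config.json"));
        // Assumptions: REST API on localhost:8083 and a hypothetical connector name.
        HttpRequest request = buildRequest("http://localhost:8083", "gcs-csv-source", configJson);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Using PUT rather than `POST /connectors` makes redeploying after a config change idempotent.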

Expected behavior
The connector should run without throwing a NoSuchMethodError.

Screenshots
No screenshots to provide

Additional context
It seems to be caused by a Guava version conflict.
The plugin archive (release v2.13.0 or 2.14.0-early-access) bundles guava-30.1.1-jre.jar, which does not contain the ImmutableMap.Builder.buildOrThrow() method (added in Guava 31.0), whereas google-cloud-storage 2.27.0 depends on com.google.guava:guava:jar:32.1.2-jre.
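To confirm which Guava the connector actually sees, a minimal diagnostic sketch can look up `ImmutableMap$Builder` by reflection, report the JAR it was loaded from, and check whether `buildOrThrow()` exists. The class name `GuavaCompatCheck` is hypothetical. Running it with the plugin's bundled JARs on the classpath only approximates Kafka Connect's per-plugin class loader, so treat the result as a hint rather than proof.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical diagnostic: does the visible Guava provide ImmutableMap.Builder.buildOrThrow()? */
public final class GuavaCompatCheck {

    private static final String BUILDER_CLASS = "com.google.common.collect.ImmutableMap$Builder";

    /** True if the class can be loaded and has a public no-arg method with this name. */
    static boolean hasPublicNoArgMethod(ClassLoader loader, String className, String methodName) {
        try {
            Class<?> clazz = Class.forName(className, false, loader);
            clazz.getMethod(methodName); // throws NoSuchMethodException when absent
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    /** Location (typically a JAR URL) the class was loaded from, or null if unknown or not found. */
    static String locationOf(ClassLoader loader, String className) {
        try {
            CodeSource source = Class.forName(className, false, loader)
                    .getProtectionDomain().getCodeSource();
            if (source == null) {
                return null;
            }
            URL url = source.getLocation();
            return url == null ? null : url.toString();
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = GuavaCompatCheck.class.getClassLoader();
        System.out.println("Guava loaded from: " + locationOf(loader, BUILDER_CLASS));
        System.out.println("buildOrThrow() available: " + hasPublicNoArgMethod(loader, BUILDER_CLASS, "buildOrThrow"));
    }
}
```

For example, compile it with `javac GuavaCompatCheck.java` and run `java -cp "path/to/plugin/lib/*:." GuavaCompatCheck`, where `path/to/plugin/lib` is a placeholder for the directory holding the plugin's JARs. With guava-30.1.1-jre on the classpath it reports that `buildOrThrow()` is unavailable.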

I've rebuilt the library locally, and the problem seems to be fixed after the "feat(plugin): add parquet file reader" commit.

Here is a brief excerpt of the Maven dependency tree:

  • Before the fixing commit:
[INFO] Building Kafka Connect Source File Pulse Google Cloud Storage FS 2.14.0-SNAPSHOT [11/14]
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ kafka-connect-filepulse-google-cloud-storage-fs ---
[INFO] io.streamthoughts:kafka-connect-filepulse-google-cloud-storage-fs:jar:2.14.0-SNAPSHOT
[INFO] +- com.google.cloud:google-cloud-storage:jar:2.27.0:compile
[INFO] |  \- com.google.guava:guava:jar:32.1.2-jre:compile
[INFO] \- com.google.cloud:google-cloud-nio:jar:0.127.6:test
[INFO]    \- (com.google.guava:guava:jar:32.1.2-jre:test - omitted for duplicate)
[INFO] Building Kafka Connect Source File Pulse Plugin 2.14.0-SNAPSHOT  [14/14]
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ kafka-connect-filepulse-plugin ---
[INFO] io.streamthoughts:kafka-connect-filepulse-plugin:jar:2.14.0-SNAPSHOT
[INFO] +- io.confluent:kafka-schema-registry-client:jar:7.3.2:test
[INFO] |  \- (com.google.guava:guava:jar:30.1.1-jre:compile - scope updated from test; omitted for duplicate)
[INFO] \- io.streamthoughts:kafka-connect-filepulse-google-cloud-storage-fs:jar:2.14.0-SNAPSHOT:compile
[INFO]    \- com.google.cloud:google-cloud-storage:jar:2.27.0:compile
[INFO]       \- com.google.guava:guava:jar:30.1.1-jre:compile
  • With the fixing commit:
[INFO] Building Kafka Connect Source File Pulse Google Cloud Storage FS 2.14.0-SNAPSHOT [11/14]
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ kafka-connect-filepulse-google-cloud-storage-fs ---
[INFO] io.streamthoughts:kafka-connect-filepulse-google-cloud-storage-fs:jar:2.14.0-SNAPSHOT
[INFO] +- com.google.cloud:google-cloud-storage:jar:2.27.0:compile
[INFO] |  \- com.google.guava:guava:jar:32.1.2-jre:compile
[INFO] +- io.streamthoughts:kafka-connect-filepulse-commons-fs:jar:2.14.0-SNAPSHOT:compile
[INFO] |  \- org.apache.hadoop:hadoop-common:jar:3.3.6:compile
[INFO] |     \- (com.google.guava:guava:jar:32.1.2-jre:compile - version managed from 27.0-jre; omitted for duplicate)
[INFO] \- com.google.cloud:google-cloud-nio:jar:0.127.6:test
[INFO]    \- (com.google.guava:guava:jar:32.1.2-jre:test - version managed from 27.0-jre; omitted for duplicate)
[INFO] Building Kafka Connect Source File Pulse Plugin 2.14.0-SNAPSHOT  [14/14]
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ kafka-connect-filepulse-plugin ---
[INFO] io.streamthoughts:kafka-connect-filepulse-plugin:jar:2.14.0-SNAPSHOT
[INFO] +- io.confluent:kafka-schema-registry-client:jar:7.3.2:test
[INFO] |  \- (com.google.guava:guava:jar:30.1.1-jre:compile - scope updated from test; omitted for duplicate)
[INFO] +- io.streamthoughts:kafka-connect-filepulse-local-fs:jar:2.14.0-SNAPSHOT:compile
[INFO] |  \- io.streamthoughts:kafka-connect-filepulse-commons-fs:jar:2.14.0-SNAPSHOT:compile
[INFO] |     \- org.apache.hadoop:hadoop-common:jar:3.3.6:compile
[INFO] |        \- (com.google.guava:guava:jar:30.1.1-jre:compile - omitted for conflict with 32.1.2-jre)
[INFO] \- io.streamthoughts:kafka-connect-filepulse-google-cloud-storage-fs:jar:2.14.0-SNAPSHOT:compile
[INFO]    \- com.google.cloud:google-cloud-storage:jar:2.27.0:compile
[INFO]       \- com.google.guava:guava:jar:32.1.2-jre:compile
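In each module's verbose tree, the Guava entry that is not marked as omitted is the one Maven resolved. For the plugin module before the fix, that is guava 30.1.1-jre, which matches the JAR found in the release archive. A rough sketch for pulling those entries out of longer `dependency:tree` output follows. The class and record names are hypothetical, and the "omitted for" check is a heuristic based on the output format shown above, not a Maven API.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists Guava entries found in `mvn dependency:tree` output. */
public final class GuavaTreeScan {

    // Matches "com.google.guava:guava:jar:<version>:<scope>" anywhere in a tree line.
    private static final Pattern GUAVA =
            Pattern.compile("com\\.google\\.guava:guava:jar:([^:\\s]+):(\\w+)");

    /** One Guava occurrence; selected is false when the line says the entry was omitted. */
    record GuavaEntry(String version, String scope, boolean selected) {}

    static List<GuavaEntry> scan(List<String> lines) {
        List<GuavaEntry> entries = new ArrayList<>();
        for (String line : lines) {
            Matcher m = GUAVA.matcher(line);
            if (m.find()) {
                // Heuristic: verbose output annotates losing entries with "omitted for duplicate/conflict".
                boolean selected = !line.contains("omitted for");
                entries.add(new GuavaEntry(m.group(1), m.group(2), selected));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8)).lines().toList();
        for (GuavaEntry e : scan(lines)) {
            System.out.printf("%-14s %-8s %s%n", e.version(), e.scope(), e.selected() ? "resolved" : "omitted");
        }
    }
}
```

Piping the tree output into it (for example with Java's single-file launcher, `java GuavaTreeScan.java`) prints one line per Guava occurrence, so a module that resolves 30.1.1-jre stands out quickly.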


Would it be possible for you to release a new version of the connector?
Thanks.

Hi @tsironneau, a new release is available.

Hi @fhussonnois, thanks for the quick answer.
I'm trying to test the new image with Docker Compose, but I can't pull it from Docker Hub.
I think this is because the image was published under the jikkou tag instead of kafka-connect-file-pulse here:

tags: streamthoughts/jikkou:${{ env.DOCKER_TAG }}

Is this intended?

No, it wasn't intended... that's what happens when you work late at night on open source ^^. Everything should work now: I've published a bugfix release, v2.14.1.

Ok, thanks for the update.
I've noticed that version 2.13.0 is still referenced in docker-compose.yml; here is a pull request to update it to 2.14.1: #627.
Otherwise, everything looks OK to me, so we can close the issue 👍. Thanks.