strimzi/test-container

Generated cluster id for kraft mode is sometimes invalid

Closed this issue · 10 comments

When running a container in kraft mode, I sometimes (approximately once in 100 runs) see the container fail with status 1 and the following error in the logs:

2024-07-04T15:24:09.5338779Z usage: kafka-storage format [-h] --config CONFIG --cluster-id CLUSTER_ID
2024-07-04T15:24:09.5339351Z                      [--add-scram ADD_SCRAM] [--ignore-formatted]
2024-07-04T15:24:09.5339892Z                      [--release-version RELEASE_VERSION]
2024-07-04T15:24:09.5340442Z kafka-storage: error: argument --cluster-id/-t: expected one argument
2024-07-04T15:24:09.5341500Z [2024-07-04 15:19:11,430] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
2024-07-04T15:24:09.5343120Z [2024-07-04 15:19:11,654] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
2024-07-04T15:24:09.5344263Z [2024-07-04 15:19:11,722] ERROR Exiting Kafka due to fatal exception (kafka.Kafka$)
2024-07-04T15:24:09.5344897Z java.lang.RuntimeException: No readable meta.properties files found.
2024-07-04T15:24:09.5345714Z 	at org.apache.kafka.metadata.properties.MetaPropertiesEnsemble.verify(MetaPropertiesEnsemble.java:493)
2024-07-04T15:24:09.5346567Z 	at kafka.server.KafkaRaftServer$.initializeLogDirs(KafkaRaftServer.scala:152)
2024-07-04T15:24:09.5347197Z 	at kafka.server.KafkaRaftServer.<init>(KafkaRaftServer.scala:60)
2024-07-04T15:24:09.5347787Z 	at kafka.Kafka$.buildServer(Kafka.scala:82)
2024-07-04T15:24:09.5348162Z 	at kafka.Kafka$.main(Kafka.scala:90)
2024-07-04T15:24:09.5348486Z 	at kafka.Kafka.main(Kafka.scala)
2024-07-04T15:24:09.5348701Z 

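My investigation showed that this can happen when the generated cluster ID [1] looks like this: -Mk3vxQVTc-iWEuxW2zREA. The custom UUID generation code sometimes produces a value that starts with a hyphen, which kafka-storage appears to parse as an option rather than as the value of --cluster-id (hence "expected one argument").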

I suggest the following:

  1. Add set -euv to the generated bash script (/testcontainers_start.sh), so that it fails early and users get more information for debugging.
  2. Fix the custom UUID generation code, or replace it with a call to the existing Kafka tool ("bin/kafka-storage.sh random-uuid").

[1] https://github.com/strimzi/test-container/blob/main/src/main/java/io/strimzi/test/container/StrimziKafkaContainer.java#L280
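A minimal sketch of what the second suggestion could look like if the ID keeps being generated in Java. It assumes the ID is a random UUID encoded as URL-safe Base64 without padding (which matches the 22-character example above) and simply retries when the result starts with a hyphen. The class and method names are hypothetical, and this is not necessarily how the actual fix was implemented.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper, not part of strimzi/test-container.
public class ClusterIdSketch {

    // Encode the 16 bytes of a UUID as URL-safe Base64 without padding (22 characters).
    public static String encode(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    // Generate IDs until one does not start with '-', so that a command-line
    // parser cannot mistake the value for an option.
    public static String randomClusterId() {
        String id;
        do {
            id = encode(UUID.randomUUID());
        } while (id.startsWith("-"));
        return id;
    }

    public static void main(String[] args) {
        System.out.println(randomClusterId());
    }
}
```

With this encoding, roughly one ID in 64 starts with a hyphen (the first Base64 character takes one of 64 values), so the retry loop almost always finishes on the first or second attempt.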

@see-quick Could this be the same as here: strimzi/strimzi-kafka-operator#9301?

Yeah, it seems like that. Let me fix it in the code.

I have created a PR [1]

[1] - #73

@see-quick is there a planned date for 0.107.0 release?

@scholzj do you have any plans for 0.107 release? We are considering if we should wait for the release or disable the test for now.

@see-quick Can we do a new release to fix this?

Hi, I am gonna do an RC today.

@see-quick thank you, our tests work with the pre-release 0.107.0-rc1. When can we expect the release?

@gtroitsk He is off for the rest of this week. So the GA will likely happen next week.

Hi @gtroitsk, yesterday I released version 0.107.0 of the Strimzi test container, which contains the fix.