acryldata/datahub-helm

"org.apache.kafka.common.errors.UnknownTopicOrPartitionException" during install

agapebondservant opened this issue · 7 comments

Describe the bug
Installing the DataHub Helm chart now fails with an "org.apache.kafka.common.errors.UnknownTopicOrPartitionException" error. (This just started breaking; the same install worked before.)

To Reproduce
Steps to reproduce the behavior:

  1. Install prerequisites: helm install prerequisites datahub/datahub-prerequisites
  2. Install datahub: helm install datahub datahub/datahub
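
For reference, a minimal end-to-end install sequence. This is a sketch that assumes the DataHub Helm repository has not been added yet; the repo URL is the one documented for datahub-helm:

# Add the DataHub Helm repository and refresh the index
helm repo add datahub https://helm.datahubproject.io/
helm repo update
# Install the prerequisites (Kafka, Elasticsearch, MySQL, etc.), then DataHub itself
helm install prerequisites datahub/datahub-prerequisites
helm install datahub datahub/datahub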

Expected behavior
DataHub should deploy without issues.

Additional context
Stacktrace:

kubectl logs datahub-kafka-setup-job-mmttq
[main] INFO org.apache.kafka.clients.admin.AdminClientConfig - AdminClientConfig values: 
	bootstrap.servers = [prerequisites-kafka:9092]
	client.dns.lookup = use_all_dns_ips
	client.id = 
	connections.max.idle.ms = 300000
	default.api.timeout.ms = 60000
	metadata.max.age.ms = 300000
	metric.reporters = []
	metrics.num.samples = 2
	metrics.recording.level = INFO
	metrics.sample.window.ms = 30000
	receive.buffer.bytes = 65536
	reconnect.backoff.max.ms = 1000
	reconnect.backoff.ms = 50
	request.timeout.ms = 30000
	retries = 2147483647
	retry.backoff.ms = 100
	sasl.client.callback.handler.class = null
	sasl.jaas.config = null
	sasl.kerberos.kinit.cmd = /usr/bin/kinit
	sasl.kerberos.min.time.before.relogin = 60000
	sasl.kerberos.service.name = null
	sasl.kerberos.ticket.renew.jitter = 0.05
	sasl.kerberos.ticket.renew.window.factor = 0.8
	sasl.login.callback.handler.class = null
	sasl.login.class = null
	sasl.login.refresh.buffer.seconds = 300
	sasl.login.refresh.min.period.seconds = 60
	sasl.login.refresh.window.factor = 0.8
	sasl.login.refresh.window.jitter = 0.05
	sasl.mechanism = GSSAPI
	security.protocol = PLAINTEXT
	security.providers = null
	send.buffer.bytes = 131072
	socket.connection.setup.timeout.max.ms = 127000
	socket.connection.setup.timeout.ms = 10000
	ssl.cipher.suites = null
	ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
	ssl.endpoint.identification.algorithm = https
	ssl.engine.factory.class = null
	ssl.key.password = null
	ssl.keymanager.algorithm = SunX509
	ssl.keystore.certificate.chain = null
	ssl.keystore.key = null
	ssl.keystore.location = null
	ssl.keystore.password = null
	ssl.keystore.type = JKS
	ssl.protocol = TLSv1.3
	ssl.provider = null
	ssl.secure.random.implementation = null
	ssl.trustmanager.algorithm = PKIX
	ssl.truststore.certificates = null
	ssl.truststore.location = null
	ssl.truststore.password = null
	ssl.truststore.type = JKS

[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version: 6.1.4-ccs
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId: c9124241a6ff43bc
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1693392754008
/tmp/fifo-LOmw
will start 1
will start 2
will start 3
will start 4
worker 1 started
worker 2 started
worker 3 started
worker 4 started
sending MetadataAuditEvent_v4 --topic MetadataAuditEvent_v4
sending MetadataChangeEvent_v4 --topic MetadataChangeEvent_v4
sending FailedMetadataChangeEvent_v4 --topic FailedMetadataChangeEvent_v4
sending MetadataChangeLog_Versioned_v1 --topic MetadataChangeLog_Versioned_v1
sending MetadataChangeLog_Timeseries_v1 --config retention.ms=7776000000 --topic MetadataChangeLog_Timeseries_v1
sending MetadataChangeProposal_v1 --topic MetadataChangeProposal_v1
sending FailedMetadataChangeProposal_v1 --topic FailedMetadataChangeProposal_v1
sending PlatformEvent_v1 --topic PlatformEvent_v1
sending DataHubUpgradeHistory_v1 config retention.ms=-1 --topic DataHubUpgradeHistory_v1
sending DataHubUsageEvent_v1 --topic DataHubUsageEvent_v1
1 got work_id=MetadataAuditEvent_v4 topic_args=--topic MetadataAuditEvent_v4
2 got work_id=MetadataChangeEvent_v4 topic_args=--topic MetadataChangeEvent_v4
4 got work_id=MetadataChangeLog_Versioned_v1 topic_args=--topic MetadataChangeLog_Versioned_v1
3 got work_id=FailedMetadataChangeEvent_v4 topic_args=--topic FailedMetadataChangeEvent_v4
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic FailedMetadataChangeEvent_v4.
3 got work_id=MetadataChangeLog_Timeseries_v1 topic_args=--config retention.ms=7776000000 --topic MetadataChangeLog_Timeseries_v1
Created topic MetadataChangeEvent_v4.
Created topic MetadataAuditEvent_v4.
Created topic MetadataChangeLog_Versioned_v1.
2 got work_id=MetadataChangeProposal_v1 topic_args=--topic MetadataChangeProposal_v1
1 got work_id=FailedMetadataChangeProposal_v1 topic_args=--topic FailedMetadataChangeProposal_v1
4 got work_id=PlatformEvent_v1 topic_args=--topic PlatformEvent_v1
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic MetadataChangeLog_Timeseries_v1.
Created topic PlatformEvent_v1.
3 got work_id=DataHubUpgradeHistory_v1 topic_args=config retention.ms=-1 --topic DataHubUpgradeHistory_v1
Created topic FailedMetadataChangeProposal_v1.
Created topic MetadataChangeProposal_v1.
4 got work_id=DataHubUsageEvent_v1 topic_args=--topic DataHubUsageEvent_v1
1 done working
2 done working
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic DataHubUpgradeHistory_v1.
3 done working
Created topic DataHubUsageEvent_v1.
4 done working
Topic Creation Complete.
Error while executing config command with args '--command-config /tmp/connection.properties --bootstrap-server prerequisites-kafka:9092 --entity-type topics --entity-name _schemas --alter --add-config cleanup.policy=compact'
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: 
	at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
	at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
	at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
	at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
	at kafka.admin.ConfigCommand$.getResourceConfig(ConfigCommand.scala:552)
	at kafka.admin.ConfigCommand$.alterConfig(ConfigCommand.scala:322)
	at kafka.admin.ConfigCommand$.processCommand(ConfigCommand.scala:302)
	at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:97)
	at kafka.admin.ConfigCommand.main(ConfigCommand.scala)
Caused by: org.apache.kafka.common.errors.UnknownTopicOrPartitionException:
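
Note that all of the topics created by the setup job itself succeed; the failure is the final kafka-configs call against _schemas, the internal topic that a Confluent-compatible schema registry creates on first start. If no such registry has started against this broker, the topic does not exist and the --alter fails. A diagnostic sketch, assuming the Bitnami Kafka image and default names from the prerequisites chart (pod and script names may differ in your cluster):

# List topics on the prerequisites broker and look for _schemas
kubectl exec -it prerequisites-kafka-0 -- kafka-topics.sh \
  --bootstrap-server prerequisites-kafka:9092 --list

# If it is missing, it can be created by hand with the compaction
# policy the setup job was trying to apply
kubectl exec -it prerequisites-kafka-0 -- kafka-topics.sh \
  --bootstrap-server prerequisites-kafka:9092 \
  --create --topic _schemas --partitions 1 --replication-factor 1 \
  --config cleanup.policy=compact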

I upgraded to the latest version of DataHub to see if it would resolve the issue, but now I get:

2023-08-31 11:49:55,452 [main] ERROR c.l.d.u.s.e.steps.DataHubStartupStep:40 - DataHubStartupStep failed.
org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
Caused by: java.io.IOException: No schema registered under subject!
at io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient.getLatestVersion(MockSchemaRegistryClient.java:261)
at io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient.getLatestSchemaMetadata(MockSchemaRegistryClient.java:310)
at io.confluent.kafka.serializers.AbstractKafkaSchemaSerDe.lookupLatestVersion(AbstractKafkaSchemaSerDe.java:181)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:77)
at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:59)
at org.apache.kafka.common.serialization.Serializer.serialize(Serializer.java:62)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:902)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:862)
at com.linkedin.metadata.dao.producer.KafkaEventProducer.produceDataHubUpgradeHistoryEvent(KafkaEventProducer.java:171)
at com.linkedin.datahub.upgrade.system.elasticsearch.steps.DataHubStartupStep.lambda$executable$0(DataHubStartupStep.java:37)
at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeStepInternal(DefaultUpgradeManager.java:110)
at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:68)
at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:42)
at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.execute(DefaultUpgradeManager.java:33)
at com.linkedin.datahub.upgrade.UpgradeCli.run(UpgradeCli.java:80)
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:768)
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:752)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:314)
at org.springframework.boot.builder.SpringApplicationBuilder.run(SpringApplicationBuilder.java:164)
at com.linkedin.datahub.upgrade.UpgradeCliApplication.main(UpgradeCliApplication.java:23)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:65)
2023-08-31 11:49:55,455 [main] ERROR c.l.d.u.s.e.steps.DataHubStartupStep:40 - DataHubStartupStep failed.
org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
Caused by: java.io.IOException: No schema registered under subject!
	... (stack trace identical to the first occurrence above)

I get something similar. I notice that the DataHub schema registry isn't updating anything.

There is a PR to change the default schema registry back to cp-schema-registry rather than INTERNAL. I went ahead and made that change in my values.yaml to work around this problem; a sketch of the override follows.
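
A sketch of that override using --set flags instead of editing values.yaml. The value paths and the registry service name below are assumptions based on the chart's defaults, so verify them against your chart version:

# Hypothetical: point DataHub at the Confluent-compatible registry from
# the prerequisites chart instead of the INTERNAL one
helm upgrade --install datahub datahub/datahub \
  --set global.kafka.schemaregistry.type=KAFKA \
  --set global.kafka.schemaregistry.url=http://prerequisites-cp-schema-registry:8081

# Sanity check from inside the cluster: /subjects is part of the
# Confluent Schema Registry REST API
curl http://prerequisites-cp-schema-registry:8081/subjects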

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

The "org.apache.kafka.common.errors.UnknownTopicOrPartitionException" error typically occurs when a client references a topic or partition that does not exist, often because it is working from stale metadata.

  • Ensure that the external dependencies (Kafka, MySQL, Elasticsearch, Neo4j) are deployed and running before deploying DataHub.

  • After installation, run kubectl get pods to check whether all DataHub pods are running.

  • Inspect the logs of individual pods for any specific error messages (see the commands sketched below).
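
A minimal version of those checks; the release name, namespace, and resource names here are illustrative, so substitute your own:

# Confirm every pod is Running or Completed
kubectl get pods -n datahub

# The setup jobs run before GMS comes up, so check them first
kubectl logs job/datahub-kafka-setup-job -n datahub
kubectl logs job/datahub-elasticsearch-setup-job -n datahub

# Then follow the GMS log for startup errors
kubectl logs -f deployment/datahub-datahub-gms -n datahub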


This issue was closed because it has been inactive for 30 days since being marked as stale.