acryldata/datahub-helm

datahub-kafka-setup-job could not connect to the external Kafka

GorSarg opened this issue · 2 comments

We have external Kafka, ZooKeeper, and MySQL clusters that we want to connect the DataHub services to. We enabled authentication on the Kafka cluster:

  interBrokerProtocol: plaintext
  sasl:
    mechanisms: plain,scram-sha-256,scram-sha-512

The datahub-kafka-setup-job was unable to connect to Kafka, failing with these errors:

org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient
	at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:553)
	at org.apache.kafka.clients.admin.Admin.create(Admin.java:144)
	at org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:49)
	at io.confluent.admin.utils.ClusterStatus.isKafkaReady(ClusterStatus.java:136)
	at io.confluent.admin.utils.cli.KafkaReadyCommand.main(KafkaReadyCommand.java:149)
Caused by: java.lang.IllegalArgumentException: Login module not specified in JAAS config

1 got work_id=MetadataChangeProposal_v1 topic_args=--partitions 1 --topic MetadataChangeProposal_v1
4 got work_id=FailedMetadataChangeProposal_v1 topic_args=--partitions 1 --topic FailedMetadataChangeProposal_v1
Exception in thread "main" org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient
	at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:551)
	at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:488)
	at org.apache.kafka.clients.admin.Admin.create(Admin.java:134)
	at kafka.admin.TopicCommand$TopicService$.createAdminClient(TopicCommand.scala:205)
	at kafka.admin.TopicCommand$TopicService$.apply(TopicCommand.scala:209)
	at kafka.admin.TopicCommand$.main(TopicCommand.scala:50)
	at kafka.admin.TopicCommand.main(TopicCommand.scala)
Caused by: java.lang.IllegalArgumentException: Login module not specified in JAAS config

My values.yaml file is:

# Copy this chart and change configuration as needed.
datahub-gms:
  enabled: true
  image:
    repository: linkedin/datahub-gms
    # tag: "v0.10.0 # defaults to .global.datahub.version
  resources:
    limits:
      memory: 2Gi
    requests:
      cpu: 100m
      memory: 1Gi

datahub-frontend:
  enabled: true
  image:
    repository: linkedin/datahub-frontend-react
    # tag: "v0.10.0" # # defaults to .global.datahub.version
  resources:
    limits:
      memory: 1400Mi
    requests:
      cpu: 100m
      memory: 512Mi
  # Set up ingress to expose react front-end
  ingress:
    enabled: false
  defaultUserCredentials:
  #  randomAdminPassword: true
  #  # You can also set specific passwords for default users
    manualValues: |
     datahub:manualPassword
     initialViewer:manualPassword

acryl-datahub-actions:
  enabled: true
  image:
    repository: acryldata/datahub-actions
    tag: "v0.0.11"
  resources:
    limits:
      memory: 512Mi
    requests:
      cpu: 300m
      memory: 256Mi

datahub-mae-consumer:
  image:
    repository: linkedin/datahub-mae-consumer
    # tag: "v0.10.0" # defaults to .global.datahub.version
  resources:
    limits:
      memory: 1536Mi
    requests:
      cpu: 100m
      memory: 256Mi

datahub-mce-consumer:
  image:
    repository: linkedin/datahub-mce-consumer
    # tag: "v0.10.0" # defaults to .global.datahub.version
  resources:
    limits:
      memory: 1536Mi
    requests:
      cpu: 100m
      memory: 256Mi

datahub-ingestion-cron:
  enabled: false
  image:
    repository: acryldata/datahub-ingestion
    # tag: "v0.10.0" # defaults to .global.datahub.version

elasticsearchSetupJob:
  enabled: true
  ....

kafkaSetupJob:
  enabled: true
  image:
    repository: linkedin/datahub-kafka-setup
    # tag: "v0.10.0" # defaults to .global.datahub.version
  resources:
    limits:
      cpu: 500m
      memory: 1024Mi
    requests:
      cpu: 300m
      memory: 768Mi
  extraInitContainers: []
  podSecurityContext:
    fsGroup: 1000
  securityContext:
    runAsUser: 1000
  annotations:
    # This is what defines this resource as a hook. Without this line, the
    # job is considered part of the release.
    helm.sh/hook: pre-install,pre-upgrade
    helm.sh/hook-weight: "-5"
    helm.sh/hook-delete-policy: before-hook-creation
  podAnnotations: {}
  # Add extra sidecar containers to job pod
  extraSidecars: []
  extraVolumeMounts:
    - name: config-volume
      mountPath: /opt/kafka/test
  extraVolumes:
    - name: config-volume
      configMap:
        name: kafka-client


mysqlSetupJob:
  enabled: true
  ...
postgresqlSetupJob:
  enabled: false
  ....

## No code data migration
datahubUpgrade:
  enabled: true
  image:
    repository: acryldata/datahub-upgrade
    # tag: "v0.10.0"  # defaults to .global.datahub.version
  batchSize: 1000
  batchDelayMs: 100
  noCodeDataMigration:
    sqlDbType: "MYSQL"
    # sqlDbType: "POSTGRES"
  podSecurityContext: {}
    # fsGroup: 1000
  securityContext: {}
    # runAsUser: 1000
  annotations:
    # This is what defines this resource as a hook. Without this line, the
    # job is considered part of the release.
    helm.sh/hook: post-install,post-upgrade
    helm.sh/hook-weight: "-2"
    helm.sh/hook-delete-policy: before-hook-creation
  podAnnotations: {}
  # Add extra sidecar containers to job pod
  extraSidecars: []
  cleanupJob:
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 300m
        memory: 256Mi
    # Add extra sidecar containers to job pod
    extraSidecars: []
  restoreIndices:
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 300m
        memory: 256Mi
    # Add extra sidecar containers to job pod
    extraSidecars: []
  extraInitContainers: []

## Runs system update processes
## Includes: Elasticsearch Indices Creation/Reindex (See global.elasticsearch.index for additional configuration)
datahubSystemUpdate:
  image:
    repository: acryldata/datahub-upgrade
    # tag:
  podSecurityContext: {}
    # fsGroup: 1000
  securityContext: {}
    # runAsUser: 1000
  annotations:
    # This is what defines this resource as a hook. Without this line, the
    # job is considered part of the release.
    helm.sh/hook: pre-install,pre-upgrade
    helm.sh/hook-weight: "-4"
    helm.sh/hook-delete-policy: before-hook-creation
  podAnnotations: {}
  resources:
    limits:
      cpu: 500m
      memory: 512Mi
    requests:
      cpu: 300m
      memory: 256Mi
  # Add extra sidecar containers to job pod
  extraSidecars: []
    # - name: my-image-name
    #   image: my-image
    #   imagePullPolicy: Always
  extraInitContainers: []

global:
  strict_mode: true
  graph_service_impl: elasticsearch
  datahub_analytics_enabled: true
  datahub_standalone_consumers_enabled: false

  elasticsearch:
    host: "elasticsearch-master"
    port: "9200"
    skipcheck: "false"
    insecure: "false"
    useSSL: "false"
    index:
      enableMappingsReindex: true
      enableSettingsReindex: true
      upgrade:
        cloneIndices: true
        allowDocCountMismatch: false

    ## Search related configuration
    search:
      ## Maximum terms in aggregations
      maxTermBucketSize: 20

      ## Configuration around exact matching for search
      exactMatch:
        ## if false will only apply weights, if true will exclude non-exact
        exclusive: false
        ## include prefix exact matches
        withPrefix: true
        ## boost multiplier when exact with case
        exactFactor: 2.0
        ## boost multiplier when exact prefix
        prefixFactor: 1.6
        ## stacked boost multiplier when case mismatch
        caseSensitivityFactor: 0.7
        ## enable exact match on structured search
        enableStructured: true

      ## Configuration for graph service dao
      graph:
        ## graph dao timeout seconds
        timeoutSeconds: 50
        ## graph dao batch size
        batchSize: 1000
        ## graph dao max result size
        maxResult: 10000

      custom:
        enabled: false
        # See documentation: https://datahubproject.io/docs/how/search/#customizing-search
        config:
          # Notes:
          #
          # First match wins
          #
          # queryRegex = Java regex syntax
          #
          # functionScores - See the following for function score syntax
          # https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-function-score-query.html

          queryConfigurations:
            # Select *
            - queryRegex: '[*]|'
              simpleQuery: false
              prefixMatchQuery: false
              exactMatchQuery: false
              boolQuery:
                must_not:
                  term:
                    deprecated:
                      value: true
              functionScore:
                functions:
                  - filter:
                      term:
                        materialized:
                          value: true
                    weight: 0.8
                score_mode: multiply
                boost_mode: multiply

            # Criteria for exact-match only
            # Contains quoted or contains underscore then use exact match query
            - queryRegex: >-
                ["'].+["']|\S+_\S+
              simpleQuery: false
              prefixMatchQuery: true
              exactMatchQuery: true
              functionScore:
                functions:
                  - filter:
                      term:
                        materialized:
                          value: true
                    weight: 0.8
                  - filter:
                      term:
                        deprecated:
                          value: true
                    weight: 0
                score_mode: multiply
                boost_mode: multiply
            # default
            - queryRegex: .*
              simpleQuery: true
              prefixMatchQuery: true
              exactMatchQuery: true
              boolQuery:
                must_not:
                  term:
                    deprecated:
                      value: true
              functionScore:
                functions:
                  - filter:
                      term:
                        materialized:
                          value: true
                    weight: 0.8
                score_mode: multiply
                boost_mode: multiply

  kafka:
    bootstrap:
      server: "kafka-0.kafka-headless.kafka.svc.cluster.local:9092,kafka-1.kafka-headless.kafka.svc.cluster.local:9092,kafka-2.kafka-headless.kafka.svc.cluster.local:9092"
    zookeeper:
      server: "zookeeper-0.zookeeper-headless.kafka.svc.cluster.local, zookeeper-1.zookeeper-headless.kafka.svc.cluster.local, zookeeper-2.zookeeper-headless.kafka.svc.cluster.local"
    # This section defines the names for the kafka topics that DataHub depends on, at a global level. Do not override this config
    # at a sub-chart level.
    topics:
      metadata_change_event_name: "MetadataChangeEvent_v4"
      failed_metadata_change_event_name: "FailedMetadataChangeEvent_v4"
      metadata_audit_event_name: "MetadataAuditEvent_v4"
      datahub_usage_event_name: "DataHubUsageEvent_v1"
      metadata_change_proposal_topic_name: "MetadataChangeProposal_v1"
      failed_metadata_change_proposal_topic_name: "FailedMetadataChangeProposal_v1"
      metadata_change_log_versioned_topic_name: "MetadataChangeLog_Versioned_v1"
      metadata_change_log_timeseries_topic_name: "MetadataChangeLog_Timeseries_v1"
      platform_event_topic_name: "PlatformEvent_v1"
      datahub_upgrade_history_topic_name: "DataHubUpgradeHistory_v1"


  neo4j:
    host: "prerequisites-neo4j-community:7474"
    uri: "bolt://prerequisites-neo4j-community"
    username: "neo4j"
    password:
      secretRef: neo4j-secrets
      secretKey: neo4j-password
    # --------------OR----------------
    # value: password

  sql:
    datasource:
      host: "prerequisites-mysql:3306"
      hostForMysqlClient: "prerequisites-mysql"
      port: "3306"
      url: "jdbc:mysql://prerequisites-mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8&enabledTLSProtocols=TLSv1.2"
      driver: "com.mysql.cj.jdbc.Driver"
      username: "root"
      password:
        secretRef: mysql-secrets
        secretKey: mysql-root-password

  datahub:
    version: v0.10.5
    gms:
      port: "8080"
      nodePort: "30001"

    monitoring:
      enablePrometheus: true

    mae_consumer:
      port: "9091"
      nodePort: "30002"

    appVersion: "1.0"
    systemUpdate:
      ## The following options control settings for the datahub-upgrade job, which
      ## manages ES indices and other update-related work
      enabled: true

    encryptionKey:
      secretRef: "datahub-encryption-secrets"
      secretKey: "encryption_key_secret"
      # Set to false if you'd like to provide your own secret.
      provisionSecret:
        enabled: true
        autoGenerate: true
        annotations: {}

    managed_ingestion:
      enabled: true
      defaultCliVersion: "0.10.5.4"

    metadata_service_authentication:
      enabled: false
      systemClientId: "__datahub_system"
      systemClientSecret:
        secretRef: "datahub-auth-secrets"
        secretKey: "system_client_secret"
      tokenService:
        signingKey:
          secretRef: "datahub-auth-secrets"
          secretKey: "token_service_signing_key"
        salt:
          secretRef: "datahub-auth-secrets"
          secretKey: "token_service_salt"
      # Set to false if you'd like to provide your own auth secrets
      provisionSecrets:
        enabled: true
        autoGenerate: true
        annotations: {}


    ## Enables always emitting a MCL even when no changes are detected. Used for Time Based Lineage when no changes occur.
    alwaysEmitChangeLog: false

    ## Enables diff mode for graph writes, uses a different code path that produces a diff from previous to next to write relationships instead of wholesale deleting edges and reading
    enableGraphDiffMode: true

    ## Values specific to the unified search and browse feature.
    search_and_browse:
      show_search_v2: false  # If on, show the new search filters experience as of v0.10.5
      show_browse_v2: false  # If on, show the new browse experience as of v0.10.5
      backfill_browse_v2: false  # If on, run the backfill upgrade job that generates default browse paths for relevant entities


  springKafkaConfigurationOverrides:
    security.protocol: SASL_PLAINTEXT
    sasl.mechanism: SCRAM-SHA-256

I tried to pass sasl.jaas.config like this:

    security.protocol: SASL_PLAINTEXT
    sasl.mechanism: SCRAM-SHA-256
    sasl.jaas.config: "/opt/kafka/test/kafka_jaas.conf"

I also tried to pass it as an environment variable (by changing the existing templates):

...
        env:
            - name: KAFKA_OPTS
              value: "-Djava.security.auth.login.config=/opt/kafka/test/kafka_jaas.conf"
              .... 

but every time I hit the same error as above.
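For reference, this is how I understand the pieces are supposed to fit together in the rendered kafka-setup job pod (the container name here is my assumption; the paths and volume names come from my values above):

containers:
  - name: kafka-setup-job  # assumed name
    env:
      - name: KAFKA_OPTS
        value: "-Djava.security.auth.login.config=/opt/kafka/test/kafka_jaas.conf"
    volumeMounts:
      - name: config-volume
        mountPath: /opt/kafka/test  # must match the path given in KAFKA_OPTS
volumes:
  - name: config-volume
    configMap:
      name: kafka-client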

I added the file /opt/kafka/test/kafka_jaas.conf manually. Its content is:

KafkaClient {
  org.apache.kafka.common.security.scram.ScramLoginModule required
  username="admin"
  password="xxxxxxx";
};
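With that file in place, I would expect a manual topic listing from inside the setup container to work (I'm assuming the Confluent-style kafka-topics binary that the setup image appears to ship, given the io.confluent classes in the stack trace; with vanilla Apache Kafka it would be kafka-topics.sh):

export KAFKA_OPTS="-Djava.security.auth.login.config=/opt/kafka/test/kafka_jaas.conf"
kafka-topics --bootstrap-server kafka-0.kafka-headless.kafka.svc.cluster.local:9092 \
  --command-config /tmp/connection.properties --list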

The content of /tmp/connection.properties is:

bootstrap.servers=kafka-0.kafka-headless.kafka.svc.cluster.local:9092,kafka-1.kafka-headless.kafka.svc.cluster.local:9092,kafka-2.kafka-headless.kafka.svc.cluster.local:9092
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-256
#sasl.jaas.config=/opt/kafka/test/kafka_jaas.conf (present when the property .Values....springKafkaConfigurationOverrides.sasl.jaas.config exists)
sasl.kerberos.service.name=
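If the inline form is indeed what the client wants, I would expect the working version of this file to carry the JAAS entry directly (password redacted):

bootstrap.servers=kafka-0.kafka-headless.kafka.svc.cluster.local:9092,kafka-1.kafka-headless.kafka.svc.cluster.local:9092,kafka-2.kafka-headless.kafka.svc.cluster.local:9092
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-256
# the JAAS entry itself, not a path to a file
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="admin" password="xxxxxxx";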

Could you help me with this? What am I doing wrong? How can I connect to Kafka with a username and password?

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

This issue was closed because it has been inactive for 30 days since being marked as stale.