signalfx/splunk-otel-collector

Question on operators router and parsing metadata out of log.file.path

hectoralicea opened this issue · 3 comments

I'm having trouble getting parsing to work with a custom OTel collector config. The log.file.path attribute should be in one of these two formats:

  1. /splunk-otel/app-api-starter-project-template/app-api-starter-project-template-96bfdf8866-9jz7m/app-api-starter-project-template.log
  2. /splunk-otel/app-api-starter-project-template/app-api-starter-project-template.log

One with and one without the pod name.

We are doing it this way so that we only index one application log file per set of directories, rather than picking up a large number of Kubernetes logs that we will never review but would still have to store.

At the bottom is the full otel config.

We are noticing that regardless of which file path format (1 or 2) a record has, it always goes to the default route, and the catchall attribute in Splunk ends up holding the value of log.file.path, which is always the first format above (e.g. /splunk-otel/app-api-starter-project-template/app-api-starter-project-template-96bfdf8866-9jz7m/app-api-starter-project-template.log).

      - id: catchall
        type: move
        from: attributes["log.file.path"]
        to: attributes["catchall"]

Why is it not going to the parse-deep-filepath route, given that the regex should match?

We want to be able to pull out the application name, the pod name, and the namespace, which are all reflected in the full log.file.path.
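For reference, the two route regexes from the config below can be checked against the two sample paths above. This is a standalone Python check of the regex patterns only; it does not exercise the collector's expr engine, so a pass here only shows the patterns themselves are not the problem:

```python
import re

# Route regexes copied verbatim from the router operator below
deep = re.compile(r"^/splunk-otel/[^/]+/[^/]+/app-[^/]+[.]log$")
shallow = re.compile(r"^/splunk-otel/[^/]+/app-[^/]+[.]log$")

deep_path = ("/splunk-otel/app-api-starter-project-template/"
             "app-api-starter-project-template-96bfdf8866-9jz7m/"
             "app-api-starter-project-template.log")
shallow_path = ("/splunk-otel/app-api-starter-project-template/"
                "app-api-starter-project-template.log")

# Each path matches exactly one route's regex
print(bool(deep.match(deep_path)))        # True
print(bool(shallow.match(shallow_path)))  # True
print(bool(deep.match(shallow_path)))     # False
print(bool(shallow.match(deep_path)))     # False
```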

receivers:
    filelog/mule-logs-volume:
      include: 
      - /splunk-otel/*/app*.log
      - /splunk-otel/*/*/app*.log
      start_at: beginning
      include_file_path: true
      include_file_name: true
      resource: 
        com.splunk.sourcetype: mule-logs
        k8s.cluster.name: {{ k8s_cluster_instance_name }}
        deployment.environment: {{ aws_environment_name }}
        splunk_server: {{ splunk_host }}
      operators:
      - type: router
        id: get-format
        routes:
          - output: parse-deep-filepath
            expr: 'log.file.path matches "^/splunk-otel/[^/]+/[^/]+/app-[^/]+[.]log$"'
          - output: parse-shallow-filepath
            expr: 'log.file.path matches "^/splunk-otel/[^/]+/app-[^/]+[.]log$"'
          - output: nil-filepath
            expr: 'log.file.path matches "^<nil>$"'
        default: catchall
      # Extract metadata from file path
      - id: parse-deep-filepath
        type: regex_parser
        regex: '^/splunk-otel/(?P<namespace>[^/]+)/(?P<pod_name>[^/]+)/(?P<application>[^/]+)[.]log$'
        parse_from: attributes["log.file.path"]
      - id: parse-shallow-filepath
        type: regex_parser
        regex: '^/splunk-otel/(?P<namespace>[^/]+)/(?P<application>[^/]+)[.]log$'
        parse_from: attributes["log.file.path"]
      - id: nil-filepath
        type: move
        from: attributes["log.file.path"]
        to: attributes["nil_filepath"]
      - id: catchall
        type: move
        from: attributes["log.file.path"]
        to: attributes["catchall"]

exporters:
    splunk_hec/logs:
        # Splunk HTTP Event Collector token.
        token: "{{ splunk_token }}"
        # URL to a Splunk instance to send data to.
        endpoint: "{{ splunk_full_endpoint }}"
        # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
        source: "output"
        # Splunk index, optional name of the Splunk index targeted.
        index: "{{ splunk_index_name }}"
        # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
        #max_connections: 20
        # Whether to disable gzip compression over HTTP. Defaults to false.
        disable_compression: false
        # HTTP timeout when sending data. Defaults to 10s.
        timeout: 900s
        tls:
          # Whether to skip checking the certificate of the HEC endpoint when sending data over HTTPS. Defaults to false.
          # For this demo, we use a self-signed certificate on the Splunk docker instance, so this flag is set to true.
          insecure_skip_verify: true

processors:
    batch:

extensions:
    health_check:
      endpoint: 0.0.0.0:8080
    pprof:
      endpoint: :1888
    zpages:
      endpoint: :55679
    file_storage/checkpoint:
      directory: /output/
      timeout: 10s
      compaction:
        on_start: true
        directory: /output/
        max_transaction_size: 65_536

service:
    extensions: [pprof, zpages, health_check, file_storage/checkpoint]
    pipelines:
      logs:
        receivers: [filelog/mule-logs-volume]
        processors: [batch]
        exporters: [splunk_hec/logs]
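As a side check, the named groups in the parse-deep-filepath regex do extract the three fields we want from the deep sample path. A quick Python sketch of what the regex_parser's capture groups would yield (again, checking only the pattern, not the operator itself):

```python
import re

# Named-group regex copied from the parse-deep-filepath operator
pattern = re.compile(
    r"^/splunk-otel/(?P<namespace>[^/]+)/(?P<pod_name>[^/]+)/(?P<application>[^/]+)[.]log$"
)

path = ("/splunk-otel/app-api-starter-project-template/"
        "app-api-starter-project-template-96bfdf8866-9jz7m/"
        "app-api-starter-project-template.log")

m = pattern.match(path)
print(m.groupdict())
# {'namespace': 'app-api-starter-project-template',
#  'pod_name': 'app-api-starter-project-template-96bfdf8866-9jz7m',
#  'application': 'app-api-starter-project-template'}
```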

Please open a support case. Issues are no longer open.

As crazy as it sounds, our organization has not purchased Splunk support.

Understood. Please reach out to me using the email in this zerobin and we can sort this out:
https://zerobin.org/?48425c91cfd81ed7#HJnYVNF3aNBp9BP8zqsfmwpmpygsXqD3xdfbZ4ekmKHf

Thanks!