signalfx/splunk-otel-collector

Question on custom splunk-otel-collector config: how to pull portions out of log.file.path and use them to populate source and sourcetype

hectoralicea opened this issue · 2 comments

We are using a custom Splunk OTel config; see the YAML template below.

In our example, in the Splunk UI we see that the attribute
log.file.path
is being set to
/splunk-otel/api-name/api-pod-name.log

We want to configure the Splunk OTel config YAML below to pull the directory and file names out of log.file.path and use them to set the source and sourcetype. Specifically, given that log.file.path is set to /splunk-otel/api-name/api-pod-name.log, we want:
com.splunk.source: api-name
com.splunk.sourcetype: api-pod-name

How can we set this section?

      resource: 
        com.splunk.source: /splunk-otel
        host.name: 'EXPR(env("K8S_NODE_NAME"))'
        com.splunk.sourcetype: otel-pvc-log
receivers:
    filelog:
      include: [ /output/file.log ]
      storage: file_storage/checkpoint
    filelog/mule-logs-volume:
      include: [/splunk-otel/*/*.log]
      start_at: beginning
      include_file_path: true
      include_file_name: true
      resource: 
        com.splunk.source: /splunk-otel
        host.name: 'EXPR(env("K8S_NODE_NAME"))'
        com.splunk.sourcetype: otel-pvc-log
exporters:
    splunk_hec/logs:
        # Splunk HTTP Event Collector token.
        token: "{{ splunk_token }}"
        # URL to a Splunk instance to send data to.
        endpoint: "{{ splunk_full_endpoint }}"
        # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
        source: "output"
        # Splunk index, optional name of the Splunk index targeted.
        index: "{{ splunk_index_name }}"
        # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
        #max_connections: 20
        # Whether to disable gzip compression over HTTP. Defaults to false.
        disable_compression: false
        # HTTP timeout when sending data. Defaults to 10s.
        timeout: 900s
        tls:
          # Whether to skip checking the certificate of the HEC endpoint when sending data over HTTPS. Defaults to false.
          # For this demo, we use a self-signed certificate on the Splunk docker instance, so this flag is set to true.
          insecure_skip_verify: true

processors:
    batch:

extensions:
    health_check:
      endpoint: 0.0.0.0:8080
    pprof:
      endpoint: :1888
    zpages:
      endpoint: :55679
    file_storage/checkpoint:
      directory: /output/
      timeout: 10s
      compaction:
        on_start: true
        directory: /output/
        max_transaction_size: 65_536

service:
    extensions: [pprof, zpages, health_check, file_storage/checkpoint]
    pipelines:
      logs:
        receivers: [filelog/mule-logs-volume]
        processors: [batch]
        exporters: [splunk_hec/logs]

@hectoralicea I think this can be done with a regex_parser operator, specifying the desired attributes as capture groups from the log.file.path attribute: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/regex_parser.md#example-configurations. Please open a support ticket for further assistance: https://splunk.my.site.com/customer/s/need-help/create-case.
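As a rough illustration of that suggestion, a minimal sketch of a regex_parser with named capture groups, assuming the path layout from the example above (the group names `source` and `sourcetype` are arbitrary choices, not required names):

```yaml
operators:
  # Each (?P<name>...) group becomes an attribute named <name>
  - type: regex_parser
    id: extract_metadata_from_filepath
    regex: '^/splunk-otel/(?P<source>[^/]+)/(?P<sourcetype>[^/]+)\.log$'
    parse_from: attributes["log.file.path"]
```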

Thanks @rmfitzpatrick

I tried the new config below and got this error:

2024-01-11T22:42:12.013Z	info	service@v0.91.0/telemetry.go:203	Serving Prometheus metrics	{"address": ":8888", "level": "Basic"}
Error: failed to build pipelines: failed to create "filelog/mule-logs-volume" receiver for data type "logs": no named capture groups in regex pattern
2024/01/11 22:42:12 main.go:89: application run finished with error: failed to build pipelines: failed to create "filelog/mule-logs-volume" receiver for data type "logs": no named capture groups in regex pattern

Is the syntax in the operators section below correct?

receivers:
    filelog:
      include: [ /output/file.log ]
      storage: file_storage/checkpoint
    filelog/mule-logs-volume:
      include: [/splunk-otel/*/*.log]
      start_at: beginning
      include_file_path: true
      include_file_name: true
      resource: 
        com.splunk.source: /splunk-otel
        host.name: 'EXPR(env("K8S_NODE_NAME"))'
        com.splunk.sourcetype: otel-pvc-log
      operators:
      # Extract metadata from file path
      - type: regex_parser
        id: extract_metadata_from_filepath
        regex: '^.*$'
        parse_from: attributes["log.file.path"]
exporters:
    splunk_hec/logs:
        # Splunk HTTP Event Collector token.
        token: "{{ splunk_token }}"
        # URL to a Splunk instance to send data to.
        endpoint: "{{ splunk_full_endpoint }}"
        # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
        source: "output"
        # Splunk index, optional name of the Splunk index targeted.
        index: "{{ splunk_index_name }}"
        # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
        #max_connections: 20
        # Whether to disable gzip compression over HTTP. Defaults to false.
        disable_compression: false
        # HTTP timeout when sending data. Defaults to 10s.
        timeout: 900s
        tls:
          # Whether to skip checking the certificate of the HEC endpoint when sending data over HTTPS. Defaults to false.
          # For this demo, we use a self-signed certificate on the Splunk docker instance, so this flag is set to true.
          insecure_skip_verify: true

processors:
    batch:

extensions:
    health_check:
      endpoint: 0.0.0.0:8080
    pprof:
      endpoint: :1888
    zpages:
      endpoint: :55679
    file_storage/checkpoint:
      directory: /output/
      timeout: 10s
      compaction:
        on_start: true
        directory: /output/
        max_transaction_size: 65_536

service:
    extensions: [pprof, zpages, health_check, file_storage/checkpoint]
    pipelines:
      logs:
        receivers: [filelog/mule-logs-volume]
        processors: [batch]
        exporters: [splunk_hec/logs]
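The "no named capture groups in regex pattern" error is raised because `^.*$` contains no `(?P<name>...)` groups; the regex_parser requires at least one. A corrected sketch of the operators section (the group names and the follow-up `move` operators, which promote the parsed attributes onto the resource, are one possible approach, not the only one):

```yaml
operators:
  # Extract metadata from the file path,
  # e.g. /splunk-otel/api-name/api-pod-name.log
  - type: regex_parser
    id: extract_metadata_from_filepath
    regex: '^/splunk-otel/(?P<source>[^/]+)/(?P<sourcetype>[^/]+)\.log$'
    parse_from: attributes["log.file.path"]
  # Promote the parsed attributes to resource attributes so the
  # splunk_hec exporter maps them to Splunk source/sourcetype
  - type: move
    from: attributes.source
    to: resource["com.splunk.source"]
  - type: move
    from: attributes.sourcetype
    to: resource["com.splunk.sourcetype"]
```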