Question on custom splunk-otel-collector config: how to pull out portions of log.file.name and use them to populate sourcetype
hectoralicea opened this issue · 2 comments
We are using a custom Splunk OTel config; see the YAML template below.
In our example, in the Splunk UI we see the field
log.file.path
being set to
/splunk-otel/api-name/api-pod-name.log
We want to configure the Splunk OTel config YAML below to pull the directory and file names out of log.file.path and set them as the source and sourcetype. Specifically, given that log.file.path in the example is /splunk-otel/api-name/api-pod-name.log, we want:
com.splunk.source: api-name
com.splunk.sourcetype: api-pod-name
How can we set this in the section below?
resource:
  com.splunk.source: /splunk-otel
  host.name: 'EXPR(env("K8S_NODE_NAME"))'
  com.splunk.sourcetype: otel-pvc-log
receivers:
  filelog:
    include: [ /output/file.log ]
    storage: file_storage/checkpoint
  filelog/mule-logs-volume:
    include: [ /splunk-otel/*/*.log ]
    start_at: beginning
    include_file_path: true
    include_file_name: true
    resource:
      com.splunk.source: /splunk-otel
      host.name: 'EXPR(env("K8S_NODE_NAME"))'
      com.splunk.sourcetype: otel-pvc-log
exporters:
  splunk_hec/logs:
    # Splunk HTTP Event Collector token.
    token: "{{ splunk_token }}"
    # URL to a Splunk instance to send data to.
    endpoint: "{{ splunk_full_endpoint }}"
    # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
    source: "output"
    # Splunk index, optional name of the Splunk index targeted.
    index: "{{ splunk_index_name }}"
    # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
    #max_connections: 20
    # Whether to disable gzip compression over HTTP. Defaults to false.
    disable_compression: false
    # HTTP timeout when sending data. Defaults to 10s.
    timeout: 900s
    tls:
      # Whether to skip checking the certificate of the HEC endpoint when sending data over HTTPS. Defaults to false.
      # For this demo, we use a self-signed certificate on the Splunk docker instance, so this flag is set to true.
      insecure_skip_verify: true
processors:
  batch:
extensions:
  health_check:
    endpoint: 0.0.0.0:8080
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679
  file_storage/checkpoint:
    directory: /output/
    timeout: 10s
    compaction:
      on_start: true
      directory: /output/
      max_transaction_size: 65_536
service:
  extensions: [pprof, zpages, health_check, file_storage/checkpoint]
  pipelines:
    logs:
      receivers: [filelog/mule-logs-volume]
      processors: [batch]
      exporters: [splunk_hec/logs]
@hectoralicea I think this can be done with a regex_parser operator, specifying the desired attributes as named capture groups over the log.file.path attribute: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/regex_parser.md#example-configurations. Please open a support ticket for further assistance: https://splunk.my.site.com/customer/s/need-help/create-case.
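For illustration, a minimal sketch of such an operator under the filelog receiver, assuming the /splunk-otel/<api-name>/<api-pod-name>.log layout from the example above (the regex and the intermediate group names source and sourcetype are assumptions, not verified against your data):

operators:
  # Hypothetical pattern: capture the directory segment and the file base
  # name of /splunk-otel/<api-name>/<api-pod-name>.log as named groups.
  - type: regex_parser
    id: extract_metadata_from_filepath
    regex: '^/splunk-otel/(?P<source>[^/]+)/(?P<sourcetype>[^/]+)\.log$'
    parse_from: attributes["log.file.path"]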
Thanks @rmfitzpatrick
I tried the new config below and got this error:
2024-01-11T22:42:12.013Z info service@v0.91.0/telemetry.go:203 Serving Prometheus metrics {"address": ":8888", "level": "Basic"}
Error: failed to build pipelines: failed to create "filelog/mule-logs-volume" receiver for data type "logs": no named capture groups in regex pattern
2024/01/11 22:42:12 main.go:89: application run finished with error: failed to build pipelines: failed to create "filelog/mule-logs-volume" receiver for data type "logs": no named capture groups in regex pattern
Is the syntax in the operators section below correct?
receivers:
  filelog:
    include: [ /output/file.log ]
    storage: file_storage/checkpoint
  filelog/mule-logs-volume:
    include: [ /splunk-otel/*/*.log ]
    start_at: beginning
    include_file_path: true
    include_file_name: true
    resource:
      com.splunk.source: /splunk-otel
      host.name: 'EXPR(env("K8S_NODE_NAME"))'
      com.splunk.sourcetype: otel-pvc-log
    operators:
      # Extract metadata from file path
      - type: regex_parser
        id: extract_metadata_from_filepath
        regex: '^.*$'
        parse_from: attributes["log.file.path"]
exporters:
  splunk_hec/logs:
    # Splunk HTTP Event Collector token.
    token: "{{ splunk_token }}"
    # URL to a Splunk instance to send data to.
    endpoint: "{{ splunk_full_endpoint }}"
    # Optional Splunk source: https://docs.splunk.com/Splexicon:Source
    source: "output"
    # Splunk index, optional name of the Splunk index targeted.
    index: "{{ splunk_index_name }}"
    # Maximum HTTP connections to use simultaneously when sending data. Defaults to 100.
    #max_connections: 20
    # Whether to disable gzip compression over HTTP. Defaults to false.
    disable_compression: false
    # HTTP timeout when sending data. Defaults to 10s.
    timeout: 900s
    tls:
      # Whether to skip checking the certificate of the HEC endpoint when sending data over HTTPS. Defaults to false.
      # For this demo, we use a self-signed certificate on the Splunk docker instance, so this flag is set to true.
      insecure_skip_verify: true
processors:
  batch:
extensions:
  health_check:
    endpoint: 0.0.0.0:8080
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679
  file_storage/checkpoint:
    directory: /output/
    timeout: 10s
    compaction:
      on_start: true
      directory: /output/
      max_transaction_size: 65_536
service:
  extensions: [pprof, zpages, health_check, file_storage/checkpoint]
  pipelines:
    logs:
      receivers: [filelog/mule-logs-volume]
      processors: [batch]
      exporters: [splunk_hec/logs]
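The error message names the actual problem: the pattern '^.*$' contains no named capture groups, and regex_parser requires at least one Go-style (?P<name>...) group so it knows which attributes to create. A sketch of a corrected operators section, again assuming the /splunk-otel/<api-name>/<api-pod-name>.log layout from the original example (the group names and the follow-up move operators are illustrative; com.splunk.source and com.splunk.sourcetype are the attribute keys the splunk_hec exporter maps to Splunk source/sourcetype metadata by default):

operators:
  # Extract metadata from the file path via named capture groups.
  - type: regex_parser
    id: extract_metadata_from_filepath
    regex: '^/splunk-otel/(?P<source>[^/]+)/(?P<sourcetype>[^/]+)\.log$'
    parse_from: attributes["log.file.path"]
  # Group names cannot contain dots, so rename the captured attributes
  # to the keys the HEC exporter reads source and sourcetype from.
  - type: move
    from: attributes.source
    to: attributes["com.splunk.source"]
  - type: move
    from: attributes.sourcetype
    to: attributes["com.splunk.sourcetype"]

Note that the receiver's resource block also sets com.splunk.source and com.splunk.sourcetype statically; depending on how the exporter resolves record versus resource attributes, those static entries may need to be removed so the per-record values take effect.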