SplunkTroubleshooting

A repository of important troubleshooting searches for Splunk

Splunk Problem Classification

[Image 1: Splunk problem classification]

What Splunk Logs About Itself

[Image 2: what Splunk logs about itself]

Useful Pipeline Searches with metrics.log

How much time is Splunk spending within each pipeline?
index=_internal source=*metrics.log* group=pipeline | timechart sum(cpu_seconds) by name

How much time is Splunk spending within each processor?
index=_internal source=*metrics.log* group=pipeline | timechart sum(cpu_seconds) by processor
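
When only a copy of metrics.log is at hand (for example, from a diag), the two pipeline searches above can be roughly reproduced offline. The Python below is a minimal sketch that assumes the comma-separated `key=value` layout shown in its sample lines. `cpu_seconds_by` is a hypothetical helper, and it totals over the whole input rather than per time bucket as `timechart` does.

```python
import re
from collections import defaultdict

# key=value pairs in a metrics.log line (assumed comma-separated layout).
KV = re.compile(r"(\w+)=([^,\s]+)")


def cpu_seconds_by(lines, key):
    """Sum cpu_seconds of group=pipeline lines, grouped by `key` ("name" or "processor")."""
    totals = defaultdict(float)
    for line in lines:
        fields = dict(KV.findall(line))
        if fields.get("group") != "pipeline" or key not in fields:
            continue
        totals[fields[key]] += float(fields.get("cpu_seconds", 0))
    return dict(totals)


# Hypothetical sample lines, for illustration only.
sample = [
    "04-22-2020 10:00:01.123 +0000 INFO  Metrics - group=pipeline, name=parsing, processor=utf8, cpu_seconds=0.25",
    "04-22-2020 10:00:01.123 +0000 INFO  Metrics - group=pipeline, name=indexerpipe, processor=indexer, cpu_seconds=1.5",
    "04-22-2020 10:00:31.456 +0000 INFO  Metrics - group=pipeline, name=parsing, processor=utf8, cpu_seconds=0.5",
]
print(cpu_seconds_by(sample, "name"))       # {'parsing': 0.75, 'indexerpipe': 1.5}
print(cpu_seconds_by(sample, "processor"))  # {'utf8': 0.75, 'indexer': 1.5}
```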

What is the 95th percentile of measured queue size?
index=_internal source=*metrics.log* group=queue | timechart perc95(current_size) by name

What is the maximum number of entries used in each queue? (Queues hold at most 1000 entries, except for the forwarding queues.)
index=_internal source=*metrics.log* group=queue | timechart max(current_size) by name
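
Here is a similar offline sketch for the two queue searches. It again assumes the `group=queue` line layout shown in the sample. It uses a nearest-rank percentile, which may not match Splunk's `perc95` exactly on small inputs, and the helper names are hypothetical.

```python
import math
import re
from collections import defaultdict

KV = re.compile(r"(\w+)=([^,\s]+)")


def nearest_rank(values, pct):
    """Nearest-rank percentile (Splunk's perc95 may interpolate differently)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def queue_size_stats(lines):
    """Per queue name: (95th percentile, maximum) of current_size from group=queue lines."""
    sizes = defaultdict(list)
    for line in lines:
        fields = dict(KV.findall(line))
        if fields.get("group") == "queue" and "current_size" in fields:
            sizes[fields.get("name", "unknown")].append(int(fields["current_size"]))
    return {name: (nearest_rank(vals, 95), max(vals)) for name, vals in sizes.items()}


# Hypothetical sample: parsingqueue sits at 0..19 entries, indexqueue fills up once.
sample = [f"INFO  Metrics - group=queue, name=parsingqueue, max_size=1000, current_size={n}" for n in range(20)]
sample += [
    "INFO  Metrics - group=queue, name=indexqueue, max_size=1000, current_size=5",
    "INFO  Metrics - group=queue, name=indexqueue, max_size=1000, current_size=1000",
]
print(queue_size_stats(sample))  # {'parsingqueue': (18, 19), 'indexqueue': (1000, 1000)}
```

A queue whose maximum sits at its limit, like `indexqueue` in the sample, usually means that something downstream of it is not keeping up.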

Troubleshooting Forwarding

Which forwarders are sending data to Splunk and how much?
index=_internal sourcetype=splunkd host=<indexer> metrics tcpin_connections | timechart span=5m max(tcp_KBps) by sourceIp
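
This minimal Python sketch shows what `timechart span=5m max(tcp_KBps) by sourceIp` computes over `group=tcpin_connections` lines. The timestamp prefix format and the trimmed sample lines are assumptions; real lines carry more fields.

```python
import re
from collections import defaultdict
from datetime import datetime

KV = re.compile(r"(\w+)=([^,\s]+)")


def max_kbps_by_source(lines):
    """Max tcp_KBps per (5-minute bucket, sourceIp) over group=tcpin_connections lines."""
    peaks = defaultdict(float)
    for line in lines:
        fields = dict(KV.findall(line))
        if fields.get("group") != "tcpin_connections":
            continue
        # Assumed timestamp prefix, e.g. "04-22-2020 10:03:30.000 +0000".
        stamp = datetime.strptime(" ".join(line.split()[:3]), "%m-%d-%Y %H:%M:%S.%f %z")
        # Floor to the start of the 5-minute bucket, like span=5m.
        start = stamp.replace(minute=stamp.minute - stamp.minute % 5, second=0, microsecond=0)
        key = (start.strftime("%Y-%m-%d %H:%M"), fields["sourceIp"])
        peaks[key] = max(peaks[key], float(fields["tcp_KBps"]))
    return dict(peaks)


# Hypothetical, trimmed metrics.log lines from an indexer.
sample = [
    "04-22-2020 10:01:00.000 +0000 INFO  Metrics - group=tcpin_connections, sourceIp=10.0.0.5, tcp_KBps=12.5",
    "04-22-2020 10:03:30.000 +0000 INFO  Metrics - group=tcpin_connections, sourceIp=10.0.0.5, tcp_KBps=20.0",
    "04-22-2020 10:06:00.000 +0000 INFO  Metrics - group=tcpin_connections, sourceIp=10.0.0.7, tcp_KBps=3.25",
]
print(max_kbps_by_source(sample))
# {('2020-04-22 10:00', '10.0.0.5'): 20.0, ('2020-04-22 10:05', '10.0.0.7'): 3.25}
```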

Where is the forwarder trying to send data to?
index=_internal host=<uf> sourcetype=splunkd StatusMgr destHost
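
The same destination list can be pulled offline from a forwarder's splunkd.log. This sketch assumes that `StatusMgr` lines carry comma-separated `destHost=` and `destPort=` pairs, as in the hypothetical sample.

```python
import re

KV = re.compile(r"(\w+)=([^,\s]+)")


def forwarder_destinations(lines):
    """Distinct (destHost, destPort) pairs mentioned by StatusMgr lines, sorted."""
    seen = set()
    for line in lines:
        if "StatusMgr" not in line:
            continue
        fields = dict(KV.findall(line))
        if "destHost" in fields:
            seen.add((fields["destHost"], fields.get("destPort", "?")))
    return sorted(seen)


# Hypothetical, trimmed splunkd.log lines from a universal forwarder.
sample = [
    "04-22-2020 10:00:00.000 +0000 INFO  StatusMgr - destHost=idx01.example.com, destIp=10.0.0.10, destPort=9997",
    "04-22-2020 10:00:00.100 +0000 INFO  StatusMgr - destHost=idx01.example.com, destIp=10.0.0.10, destPort=9997",
    "04-22-2020 10:00:30.000 +0000 INFO  StatusMgr - destHost=idx02.example.com, destIp=10.0.0.11, destPort=9997",
]
print(forwarder_destinations(sample))
# [('idx01.example.com', '9997'), ('idx02.example.com', '9997')]
```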

Which output (tcpout) queues are set up?
index=_internal host=<uf> source=*metrics.log* group=queue tcpout | stats count by name

Distributed Search Job Overview

[Image 3: distributed search job overview]

Search Problem Categories

[Image 4: search problem categories]

Troubleshooting User Searches

Are searches running too long? (search count, 95th percentile, and median run time per user)
index="_audit" action="search" (id=* OR search_id=*) | eval user=if(user=="n/a",null(),user) | stats max(total_run_time) as total_run_time first(user) as user by search_id | stats count perc95(total_run_time) median(total_run_time) by user
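
The search above runs two `stats` passes. The first reduces each search to its largest `total_run_time` and a user, ignoring `n/a`. The second summarizes per user. The Python below is a minimal sketch of that two-stage logic. It assumes simplified `Audit:[...]` lines without the quoted search string, whose embedded `key=value` text this naive parser would misread. It only looks at `search_id`, not the older `id` field, and it picks the first user in input order, whereas Splunk's `first()` follows search result order.

```python
import math
import re
import statistics
from collections import defaultdict

# key=value pairs inside an Audit:[...] line; values stop at a comma, space, or "]".
KV = re.compile(r"(\w+)=([^,\s\]]+)")


def run_time_by_user(lines):
    """Per user: (number of searches, 95th percentile, median) of per-search total_run_time."""
    run_time = {}  # search_id -> largest total_run_time seen
    owner = {}     # search_id -> first real user seen ("n/a" counts as missing)
    for line in lines:
        fields = dict(KV.findall(line))
        if fields.get("action") != "search" or "search_id" not in fields:
            continue
        sid = fields["search_id"]
        if "total_run_time" in fields:
            run_time[sid] = max(run_time.get(sid, 0.0), float(fields["total_run_time"]))
        if fields.get("user", "n/a") != "n/a":
            owner.setdefault(sid, fields["user"])
    per_user = defaultdict(list)
    for sid, seconds in run_time.items():
        if sid in owner:  # searches with no known user are dropped, as with "by user"
            per_user[owner[sid]].append(seconds)
    result = {}
    for user, times in per_user.items():
        ordered = sorted(times)
        p95 = ordered[max(1, math.ceil(95 * len(ordered) / 100)) - 1]  # nearest rank
        result[user] = (len(times), p95, statistics.median(times))
    return result


# Hypothetical, simplified audit lines.
sample = [
    "Audit:[timestamp=04-22-2020 10:00:00.000, user=alice, action=search, info=granted, search_id='1']",
    "Audit:[timestamp=04-22-2020 10:00:05.000, user=n/a, action=search, info=completed, search_id='1', total_run_time=4.0]",
    "Audit:[timestamp=04-22-2020 10:01:00.000, user=alice, action=search, info=completed, search_id='2', total_run_time=1.0]",
    "Audit:[timestamp=04-22-2020 10:02:00.000, user=bob, action=search, info=completed, search_id='3', total_run_time=2.5]",
]
print(run_time_by_user(sample))  # {'alice': (2, 4.0, 2.5), 'bob': (1, 2.5, 2.5)}
```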

How much time are the indexers spending responding to queries from the search head (SH)?
index=_internal source=*remote_searches.log* server=<sh> | stats max(elapsedTime) by search_id host
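
Here is a sketch of the `stats max(elapsedTime) by search_id host` step. Because `host` is Splunk metadata rather than part of the log text, each record is assumed to be a `(host, line)` pair. The trimmed sample lines are hypothetical.

```python
import re
from collections import defaultdict

KV = re.compile(r"(\w+)=([^,\s]+)")


def max_elapsed(records, server):
    """Max elapsedTime per (search_id, indexer host) for searches dispatched by `server`.

    Each record is a (host, raw_line) pair: host is Splunk metadata, not part of the line.
    """
    worst = defaultdict(float)
    for host, line in records:
        fields = dict(KV.findall(line))
        if fields.get("server") != server or "elapsedTime" not in fields:
            continue
        key = (fields.get("search_id", "?"), host)
        worst[key] = max(worst[key], float(fields["elapsedTime"]))
    return dict(worst)


# Hypothetical, trimmed remote_searches.log lines collected from two indexers.
records = [
    ("idx01", "search_id=remote_sh01_1588.1, server=sh01, elapsedTime=1.2"),
    ("idx02", "search_id=remote_sh01_1588.1, server=sh01, elapsedTime=7.5"),
    ("idx01", "search_id=remote_sh02_1588.9, server=sh02, elapsedTime=30.0"),
]
print(max_elapsed(records, "sh01"))
# {('remote_sh01_1588.1', 'idx01'): 1.2, ('remote_sh01_1588.1', 'idx02'): 7.5}
```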

Which splunkd responses take more than 100 ms?
index=_internal sourcetype=splunkd_access host=<sh> user=<user> | rex "(?<spent>\d+)ms" | where spent > 100
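
This is an offline equivalent of the filter, written as a minimal sketch. Unlike the `rex` above, it anchors the match at the end of the line, where splunkd_access is assumed to record the response time. The sample lines are hypothetical.

```python
import re

# Response time assumed to be the last field of the line, e.g. "... 250ms".
SPENT = re.compile(r"(\d+)ms$")


def slow_requests(lines, threshold_ms=100):
    """Return (milliseconds, line) for each access-log line slower than threshold_ms."""
    slow = []
    for line in lines:
        match = SPENT.search(line.rstrip())
        if match and int(match.group(1)) > threshold_ms:
            slow.append((int(match.group(1)), line))
    return slow


# Hypothetical splunkd_access lines.
sample = [
    '127.0.0.1 - admin [22/Apr/2020:10:00:00.000 +0000] "GET /services/search/jobs HTTP/1.1" 200 5120 - - - 250ms',
    '127.0.0.1 - admin [22/Apr/2020:10:00:01.000 +0000] "GET /services/server/info HTTP/1.1" 200 800 - - - 3ms',
]
for spent, line in slow_requests(sample):
    print(spent, line.split('"')[1])  # 250 GET /services/search/jobs HTTP/1.1
```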

What is the size of the search artifacts?
index=_internal sourcetype=splunkd_access method=GET jobs | stats sum(bytes) by uri
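
Here is a sketch of summing response bytes per URI. It assumes the access-log layout shown in the sample: a quoted request line, then the status code and byte count. The `"jobs" in line` test mirrors the bare `jobs` keyword in the search.

```python
import re
from collections import defaultdict

# Request line, status, and byte count of an access-log entry (assumed layout).
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<uri>\S+) HTTP/[\d.]+" (?P<status>\d{3}) (?P<bytes>\d+)')


def bytes_by_uri(lines):
    """Sum response bytes per URI for GET requests whose line mentions "jobs"."""
    totals = defaultdict(int)
    for line in lines:
        match = REQUEST.search(line)
        if match and match["method"] == "GET" and "jobs" in line:
            totals[match["uri"]] += int(match["bytes"])
    return dict(totals)


# Hypothetical splunkd_access lines.
sample = [
    '127.0.0.1 - admin [22/Apr/2020:10:00:00.000 +0000] "GET /services/search/jobs/1588.1/results HTTP/1.1" 200 5120 - - - 40ms',
    '127.0.0.1 - admin [22/Apr/2020:10:00:05.000 +0000] "GET /services/search/jobs/1588.1/results HTTP/1.1" 200 2048 - - - 12ms',
    '127.0.0.1 - admin [22/Apr/2020:10:00:09.000 +0000] "GET /services/search/jobs/1588.2/events HTTP/1.1" 200 100 - - - 5ms',
    '127.0.0.1 - admin [22/Apr/2020:10:00:10.000 +0000] "POST /services/search/jobs HTTP/1.1" 201 300 - - - 90ms',
]
print(bytes_by_uri(sample))
# {'/services/search/jobs/1588.1/results': 7168, '/services/search/jobs/1588.2/events': 100}
```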

Troubleshooting No Results Problems

The following checks can help troubleshoot searches that return no results:

[Image 5: checks for no-results problems]