tbana is a suite of connectors enabling several "big data" frameworks to acccess Splunk collected data.
With tbana, users of Hadoop, Cascading, Spark, and Storm can utilize data already collected in Splunk such as system log files, and other sources of "machine data," rather than setting up completely new and parallel data integest systems.
tbana is designed for the non-Splunk system to perform the analytics, in contrast to other connectors that enable Splunk to analyze data on non-Splunk systems. In Hadoop, for instance, this is done by providing an InputFormat that Hadoop MapReduce jobs can use. In effect, this opens up the data for any higher-layer system that can utilize a Hadoop InputFormat such as Hive, Pig, etc.
Currently, tbana supports the following:
- Apache Hadoop
- Cascading
- Apache Spark
- Twitter Storm
On the todo list are the following:
- Scalding
- Cascalog
- Spark Streaming
- Hive
- Pig
Some of these "should work" but are untested, or may need a thin wrapper.
Data from Splunk is retrieved in the following ways:
- Via a live search request using Splunk's REST API
- Via data from Splunk persisted in HDFS via Shuttl
Search requests to Splunk can be performed in the following ways:
- synchronous (via "export" mode)
- asynchronous (via "job" mode)
When pulling live data from Splunk to MapReduce jobs, the InputFormat class has an option to split the data by indexer. Normally, a client to Splunk will access the cluster via the Search Head. The Search Head is responsible for distributing searches to all the Indexers, and doing aggregating/reduce functions on the data before returning to the client. In this case, the search head is bypassed in order to avoid a potential bottleneck in the search head. Keep in mind that searches directly against the indexers may differ than from the search head, depending on how you've configured Splunk. However, if you know that searches run directly against the individual Indexers are sufficient for your needs, you can use the Indexer Split scheme to do parallel transfers.
Visit the wiki for build instructions and developer manual