foospidy/HoneyPy

Splunk logger

deadbits opened this issue · 6 comments

Adding an option for sending HoneyPy logs to a Splunk instance would be fantastic :)

Splunk handles JSON well by default, so I imagine a modification of the file logger would do the trick.

Here's an example of sending JSON to Splunk using their HTTP Event Collector (basically an endpoint that accepts input data): https://www.garysieling.com/blog/send-json-data-splunk-cloud-python. The only additional data that would need to be added is the Splunk host, index, source, and sourcetype, which could be specified in the config file.
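
For illustration, a rough sketch of what that POST could look like with requests (the host, token, and field values below are placeholders, not anything HoneyPy-specific):

# Sketch of sending a JSON event to a Splunk HTTP Event Collector (HEC).
# The URL, token, and metadata values are placeholders.
import requests

HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # HEC token generated in Splunk

payload = {
    "index": "honeypots",      # destination index
    "source": "HoneyPy",       # event source
    "sourcetype": "hpevent",   # event sourcetype
    "event": {                 # the actual event data
        "service": "Echo",
        "src_ip": "203.0.113.7",
        "dest_port": 7,
    },
}

resp = requests.post(
    HEC_URL,
    json=payload,
    headers={"Authorization": "Splunk " + HEC_TOKEN},
    verify=False,  # trial instances often use self-signed certs; verify properly in production
)
resp.raise_for_status()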

Is what you are requesting specific to Splunk Cloud?

Is it safe to assume the approach in this logger is still valid for non-Cloud Splunk instances? https://github.com/foospidy/HoneyPy/blob/master/loggers/splunk/honeypy_splunk.py

So, yeah, your logger should still work for Cloud.

A few questions and suggestions on the logger, though :)

  1. It's a bit safer to use an API token instead of passing the username and password as auth (a quick sketch of this follows the first code block below).

  2. I don't see the logger assigning any Splunk index, source, or sourcetype, unless I'm missing something. Would it make sense to add {"index": "honeypots", "source": "HoneyPy", "sourcetype": "hpevent"} to your POST? Or let the user add the index and source info to the config and read it from there?

  3. Are these keys and values in the code block below part of the honeypot interaction (i.e., remote_host is the attacker/source IP and local_host is the target IP)?
    If so, it'd be a lot easier on the Splunk side to search the data, correlate against other datasets, etc., if the field names were mapped to Splunk's CIM model. Basically just changing remote_host to src_ip, local_host to dest_ip, and the same with the ports. Not a ton would need to be modified to meet CIM.

data = {
    ...
    'protocol': protocol,
    'event': event,
    'local_host': local_host,
    'local_port': local_port,
    'service': service,
    'remote_host': remote_host,
    'remote_port': remote_port,
    'data': data,
    ...
}
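
On point 1, assuming the current logger posts with requests and basic auth, the change could be roughly as small as this (all values below are placeholders, not the logger's actual code):

# Hypothetical swap from username/password basic auth to an HEC token.
import requests

data = {'service': 'Echo', 'src_ip': '203.0.113.7'}  # example event payload
token = '00000000-0000-0000-0000-000000000000'       # HEC token; could be read from the config

# Before (credentials passed as basic auth):
# requests.post(url, json=data, auth=(username, password), verify=False)

# After (token in the Authorization header):
requests.post(
    'https://splunk.example.com:8088/services/collector/event',
    json=data,
    headers={'Authorization': 'Splunk ' + token},
    verify=False,  # adjust TLS verification as appropriate
)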

Combining the index/source thing and CIM, I believe you could just do something like this and it'd be good to go:

data = {
    'index': config.get('splunk', 'index'),
    'source': config.get('splunk', 'source'),
    'sourcetype': config.get('splunk', 'sourcetype'),
    ...
    'event': event,
    'dest_ip': local_host,
    'dest_port': local_port,
    'service': service,
    'src_ip': remote_host,
    'src_port': remote_port,
    'data': data,
    ...
}
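
And the config side of that could just be a [splunk] section that the logger reads at startup. As a sketch (the section name, option names, and file path here are assumptions, not HoneyPy's actual config layout):

# Sketch of reading hypothetical Splunk settings from an INI-style config, e.g.:
#   [splunk]
#   token = 00000000-0000-0000-0000-000000000000
#   index = honeypots
#   source = HoneyPy
#   sourcetype = hpevent
from ConfigParser import ConfigParser  # configparser on Python 3

config = ConfigParser()
config.read('etc/honeypy.cfg')  # config path is an assumption

splunk_token = config.get('splunk', 'token')
splunk_index = config.get('splunk', 'index')
splunk_source = config.get('splunk', 'source')
splunk_sourcetype = config.get('splunk', 'sourcetype')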

Since I wrote the logger, I'll try my best to answer! I want to add that when I wrote this, I had little experience with Splunk. I wrote this while running the trial version of Splunk, and quickly forgot about it (sorry).

  1. I agree. I think allowing both (but recommending the API token) would be best.

  2. Absolutely. I am not sure what the most recommended way to do this is, or what people do "in the real world" when working with Splunk, but defining some sane defaults and allowing for customization would be smart.

  3. The key-value names are copied directly from the Elasticsearch logger, and no actual thought was put into this from my side! I had never heard of CIM, but I totally agree that it's a good idea to use that instead.

Awesome job on the PR! You've made my dreams come true, and even used an HEC :)

Thanks for your work on this!