/ConfluentCyberDemo

Analyze Zeek IDS data with ksqlDB running on Confluent Platform via Docker on your laptop. Or spin up an arbitrary number of AWS hosts, each running Confluent Platform and ksqlDB for use in an instructor-led workshop.

Cyber Data with Confluent

The purpose of this project is to demonstrate using Confluent to handle cyber data. It is based on (and forked from) Bert Hayes's cp-zeek repo (https://github.com/berthayes/cp-zeek). The demonstration shows the ingestion of Zeek and syslog data, followed by processing with KSQL and the Sigma stream processor (https://github.com/confluentinc/kafka-sigma).

Running on localhost

git clone https://github.com/wlaforest/ConfluentCyberDemo
cd ConfluentCyberDemo
docker-compose up -d

Wait about 5 minutes or so for everything to start up, then point your web browser to http://localhost:9021 for Confluent Control Center and http://localhost:8080 for the Sigma Rule UI.
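Rather than waiting a fixed five minutes, you could poll the two web UIs until they respond. The following is a minimal sketch, not part of this repo: the function names `is_up` and `wait_for_services` are hypothetical, and the retry counts are arbitrary defaults. Only the two URLs come from the instructions above.

```python
import time
import urllib.error
import urllib.request

# URLs from the instructions above; change the host when running remotely.
SERVICES = {
    "Confluent Control Center": "http://localhost:9021",
    "Sigma Rule UI": "http://localhost:8080",
}


def is_up(url, timeout=3.0):
    """Return True if anything answers HTTP at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is at least listening.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


def wait_for_services(services, probe=is_up, attempts=60, delay=10.0,
                      sleep=time.sleep):
    """Poll until every service responds or the attempts run out.

    Returns the set of service names that never responded.
    """
    pending = dict(services)
    for _ in range(attempts):
        pending = {name: url for name, url in pending.items()
                   if not probe(url)}
        if not pending:
            break
        sleep(delay)
    return set(pending)
```

Calling `wait_for_services(SERVICES)` returns an empty set once both UIs answer; any names left in the returned set are services that did not come up in time.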

Running on an external host

To run this environment on a system that is not your laptop/workstation, edit the docker-compose.yml file.

Look for this line:

CONTROL_CENTER_KSQL_KSQLDB1_ADVERTISED_URL: "http://localhost:8088"

And change it to something like this:

CONTROL_CENTER_KSQL_KSQLDB1_ADVERTISED_URL: "http://yourhost.yourdomain.com:8088"
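If you set up many hosts, the edit above can be scripted. This is a rough sketch that assumes the setting appears on a single line in docker-compose.yml, as shown above; the helper name `set_advertised_url` is hypothetical and not part of this repo.

```python
KEY = "CONTROL_CENTER_KSQL_KSQLDB1_ADVERTISED_URL"


def set_advertised_url(compose_text, host, port=8088):
    """Return compose_text with the ksqlDB advertised URL pointing at host.

    Only lines that start with KEY (after indentation) are rewritten;
    indentation and all other lines are left untouched.
    """
    lines = compose_text.splitlines(keepends=True)
    changed = False
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        if stripped.startswith(KEY + ":"):
            indent = line[: len(line) - len(stripped)]
            newline = "\n" if line.endswith("\n") else ""
            lines[i] = f'{indent}{KEY}: "http://{host}:{port}"{newline}'
            changed = True
    if not changed:
        raise ValueError(f"{KEY} not found in compose file")
    return "".join(lines)


# Example usage (reads and rewrites the file in place):
#
#   from pathlib import Path
#   path = Path("docker-compose.yml")
#   path.write_text(set_advertised_url(path.read_text(),
#                                      "yourhost.yourdomain.com"))
```

Working on the text line by line, instead of parsing and re-dumping the YAML, keeps comments and formatting in the file as they were.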

Then start up docker as above with:

docker-compose up -d

Wait about 5 minutes or so for everything to start up, then point your web browser to http://yourhost.yourdomain.com:9021 for Confluent Control Center and http://yourhost.yourdomain.com:8080 for the Sigma Rule UI. Before walking through the demo, pre-create the compacted sigma-rules topic by running bin/prepare-topics.sh.

Demo Script

  • Explain the setup

    • Dockerized confluent environment with tcpreplay replaying a PCAP (packet capture)
    • Zeek sensor (which has native Kafka support) sending network data to Confluent.
    • This was built by Bert Hayes to demonstrate Confluent for handling network data.
    • Key point: while Bert was making the capture, he had a data exfiltration bot sending data via regular old DNS queries.
    • DNS is what lets you type a human-readable name into your browser and find out the corresponding network address.
  • Let's examine the data streaming in.

    • Mention that we will be focusing on Zeek DNS data.
    • Go to localhost:9021 (or remote host URL)
    • Click on Cluster->Topics->DNS->Messages
    • Observe the messages spooling, then click the pause button and switch to card view
    • Look at a specific record by expanding it, then scroll through the fields
    • Point out there are different kinds of DNS queries, but for the purposes of this demonstration we will focus on the query field
  • Let's examine and publish a Sigma rule

    • Go to localhost:8080 for the Sigma Rule UI and click on the Sigma Rules tab
    • Select the "Upload Sigma Rule" button and select the zeek_dns_exfiltration.yaml file in the rules directory. (may need to drag a corner of the browser to nudge the editor if it is not fully displayed)
    • Suppose we are looking for indicators of data exfiltration using DNS lookups. This means that someone has a bot or trojan inside our network and wants to send data out to a collecting server. One innocuous-looking way to do this is to make legitimate DNS queries from the trojan but encode the data in the DNS query. How can we detect likely examples of this?
    • Show the Sigma rule that finds DNS queries with unusually long names. The rule applies a regular expression that treats a query field longer than 180 characters as suspicious. Most host names should be much shorter; if one is longer than that, the host name could contain data.
    • Publish the rule to confluent by selecting the "Publish Sigma Rule" button
  • Let's evaluate the DNS and detection data

    • Select the DNS tab
    • Let's see if the Sigma processor is detecting anything with that rule. The graph shows the raw DNS query counts being received in gray. Any detections will be shown in red. The tables below show the raw data (show the DNS Topic Data and look at the query field).
    • You can also look in Control Center (C3) under Topics->DNS-detection->Messages.
    • Nothing is showing up. So if there is data exfiltration via DNS queries, they aren't silly enough to use long query names.
    • Optional: go back to the Sigma Rule tab and select the rule from the pull down menu. Change the query characters from 180 to 30 and publish the rule again. This will trip a good amount of detections which will be shown in the DNS chart and tables in the DNS tab. Change it back to 180 to reduce the number of detections.
  • Perhaps they are using a more modest query length but just making lots of queries. We are going to publish another rule that looks for this

    • Note that the sigma processor is still running...
    • Select the "Upload Sigma Rule" button and select the zeek_dns_exfiltration_time.yaml
    • Note that the query length threshold is much more modest, though still a little on the long side. That alone is not enough, because we would get lots of noise, so we only want to match if we see more than 10 of these in a 10s window
    • We will publish this new rule and the sigma processor will automatically pick it up.
  • Let's check out DNS-detection now

    • We are now seeing matches.
    • Let's examine the match. You can see the rules that triggered that match, and the original source data is embedded inside of it.
    • A future option will allow you to choose whether the source event is serialized or embedded
    • Switch to the console to see the data fully
    • You can see the pattern clearly in the matched DNS lookups
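To make the two detection ideas from the demo script concrete, here is a small sketch in plain Python. It is not the Sigma stream processor's implementation and does not reproduce the YAML rule files. The exact regular expression, the sliding-window logic, and the names `is_long_query` and `WindowedCounter` are assumptions made for illustration; only the 180-character threshold and the "more than 10 in a 10s window" condition come from the script above.

```python
import re
from collections import defaultdict, deque

# Assumed form of the first rule: the query field has 180 or more characters.
LONG_QUERY = re.compile(r"^.{180,}$")


def is_long_query(record, pattern=LONG_QUERY):
    """Return True if the record's query field matches the length pattern."""
    return bool(pattern.match(record.get("query", "")))


class WindowedCounter:
    """Flag a key once more than `threshold` events fall inside the window.

    This uses a sliding window per key, which is an assumption; the real
    processor may window events differently.
    """

    def __init__(self, threshold=10, window_s=10.0):
        self.threshold = threshold
        self.window_s = window_s
        self.events = defaultdict(deque)

    def observe(self, key, ts):
        """Record an event at time ts (seconds); return True on detection."""
        q = self.events[key]
        q.append(ts)
        # Drop events that have aged out of the window.
        while q and ts - q[0] >= self.window_s:
            q.popleft()
        return len(q) > self.threshold
```

In use, each DNS record whose query is moderately long would be passed to `observe`, keyed by something like the originating host, so that a single long query is ignored but a burst of them from one source is flagged. Lowering the pattern to `r"^.{30,}$"` mirrors the optional step of changing 180 to 30 to trip more detections.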