This Spark code identifies potential IPs from which a DDoS attack originates within a minute of the attack starting. The code explores both the legacy Spark DStreams API and the newer Structured Streaming API.
A sample Apache access log line looks like this:

155.156.168.116 - - [25/May/2015:23:11:15 +0000] "GET / HTTP/1.0" 200 3557 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; acc=baadshah; acc=none; freenet DSL 1.1; (none))"

For more information, please read the Apache log format documentation.
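The attacker IP is the first field of each log line. A minimal sketch of pulling it out with a regular expression, based on the Apache Combined Log Format (the pattern below is illustrative and not taken from the project code):

```python
import re

# Combined Log Format: IP, identity, user, [timestamp], "request", status, size, ...
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def extract_ip(line):
    """Return the client IP from an Apache access log line, or None if it doesn't parse."""
    m = LOG_PATTERN.match(line)
    return m.group("ip") if m else None

sample = ('155.156.168.116 - - [25/May/2015:23:11:15 +0000] '
          '"GET / HTTP/1.0" 200 3557 "-" "Mozilla/4.0 (compatible)"')
print(extract_ip(sample))  # → 155.156.168.116
```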
Spark Streaming (Python API)
Apache Flume (Python API)
Python 2.7 or above
HDP 2.5
The goal is to get a better understanding of Spark Streaming's windowing functions.
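The windowing semantics (a window length plus a slide interval) can be illustrated without Spark. A plain-Python sketch of counting hits per IP over a sliding window, mimicking what reduceByKeyAndWindow computes on a DStream (the 60-second window and 10-second slide are illustrative assumptions, not the project's actual settings):

```python
from collections import Counter, deque

def sliding_counts(events, window=60, slide=10):
    """events: iterable of (timestamp_seconds, ip) pairs.
    Yields (window_end, Counter of hits per IP) every `slide` seconds,
    counting only events within the trailing `window` seconds."""
    events = sorted(events)
    if not events:
        return
    end = events[0][0] + slide
    buf = deque()
    i = 0
    last = events[-1][0]
    while end <= last + slide:
        # Admit events that fall before this window's end...
        while i < len(events) and events[i][0] < end:
            buf.append(events[i])
            i += 1
        # ...and evict events that have aged out of the window.
        while buf and buf[0][0] < end - window:
            buf.popleft()
        yield end, Counter(ip for _, ip in buf)
        end += slide

# One hit per second from one IP for 30s, plus a single hit from a second IP
hits = [(t, "10.0.0.1") for t in range(0, 30)] + [(25, "10.0.0.2")]
for end, counts in sliding_counts(hits):
    print(end, dict(counts))
```

An IP flooding the server shows up as a count far above the baseline in each successive window.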
The steps to test this code are as follows:
- Start Flume in HDP with the following command; make sure you change the directories based on your environment.
bin/flume-ng agent --conf conf --conf-file /home/maria_dev/flume/sparkstreamingflume.conf --name a1
- Execute the Spark job as shown below
spark-submit --packages org.apache.spark:spark-streaming-flume_2.11:2.0.0 SparkFlumeIpPh.py
- Start feeding log data into the Flume spool directory configured for your environment.
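The Spark job submitted above can be sketched as follows. This is a hypothetical outline of what SparkFlumeIpPh.py might do, not the actual file: the host, port, window settings, and threshold are placeholders, and it only runs against a live Flume Avro sink and Spark cluster, so it is shown as a non-runnable sketch.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.flume import FlumeUtils

sc = SparkContext(appName="FlumeDDoSDetector")
ssc = StreamingContext(sc, 1)  # 1-second micro-batches
ssc.checkpoint("/tmp/flume_ddos_checkpoint")  # required by the inverse-reduce window below

# Attach to the Flume Avro sink; host/port must match sparkstreamingflume.conf (placeholders here)
events = FlumeUtils.createStream(ssc, "localhost", 9999)

# Flume delivers (header, body) pairs; the body is the raw log line.
# Count requests per IP over a 60-second window, sliding every 10 seconds.
ip_counts = (events.map(lambda e: e[1])
                   .map(lambda line: (line.split(" ")[0], 1))
                   .reduceByKeyAndWindow(lambda a, b: a + b,   # add entering batches
                                         lambda a, b: a - b,   # subtract leaving batches
                                         60, 10))

# Flag IPs whose request rate looks like a flood (threshold is illustrative)
suspects = ip_counts.filter(lambda kv: kv[1] > 100)
suspects.pprint()

ssc.start()
ssc.awaitTermination()
```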
Implement more machine learning techniques to detect DDoS attackers.
In this experiment we stream the Apache log messages to a Kafka topic set up on an EC2 server and then have a Spark Structured Streaming job in Databricks listening to the Kafka topic. The idea is to explore how to use Databricks with data streaming to Kafka on an AWS EC2 server. The IPython notebook in the folder structured_streaming_ddos is hosted on Databricks and listens to the Kafka server in AWS.
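The notebook's logic can be sketched with the standard Structured Streaming Kafka source. This is a hedged outline, not the notebook itself: the broker address, topic name, and threshold are placeholders, and it requires a reachable Kafka broker, so it is not runnable as-is.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, regexp_extract, window

spark = SparkSession.builder.appName("KafkaDDoSDetector").getOrCreate()

# Subscribe to the Kafka topic on the EC2 broker (both values are placeholders)
raw = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "<ec2-host>:9092")
            .option("subscribe", "apache_logs")
            .load())

# Kafka values arrive as bytes; cast to string and extract the leading IP field
logs = raw.selectExpr("CAST(value AS STRING) AS line", "timestamp")
ips = logs.select(regexp_extract(col("line"), r"^(\S+)", 1).alias("ip"),
                  col("timestamp"))

# Count hits per IP in 1-minute event-time windows and keep heavy hitters
counts = (ips.groupBy(window(col("timestamp"), "1 minute"), col("ip"))
             .count()
             .where(col("count") > 100))  # illustrative threshold

query = (counts.writeStream
               .outputMode("complete")
               .format("console")
               .start())
query.awaitTermination()
```

On Databricks the console sink would typically be replaced by display() or a table sink; the windowed aggregation itself is the same.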