- Winning Solution of the BWI Data Analytics Hackathon 2020
- CloudDevOps Pipeline with Green-Blue Deployment for David's Udacity CloudDevOps Nanodegree Capstone Project
- (as the app is running on privately-owned, real Internet-connected infrastructure, IPs are blurred)
Screenshots: Monitoring Dashboard, Model Performance, Anomaly Training, Application of Models
- unfortunately only in German :/
- Live-updating webapp with a data pipeline fed from live running Zeek logs
- extensive and easily extendable monitoring dashboard
- application of Neural Net and Random Forest models, trained on labelled data, against live Zeek logs
- training of anomaly detection using IsolationForest can be triggered during runtime (see the sketch below)
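The runtime anomaly training follows the usual scikit-learn pattern. Below is a minimal, hypothetical sketch of that idea, assuming numeric features have already been extracted from live Zeek `conn.log` records into a pandas DataFrame; the feature names and parameters are illustrative assumptions, not the exact code in `app`.

```python
# Minimal sketch (not the app's exact code): fit an IsolationForest on a
# window of recent Zeek connection features and score fresh records.
import pandas as pd
from sklearn.ensemble import IsolationForest

# assumed, illustrative feature columns extracted from conn.log records
FEATURES = ["duration", "orig_bytes", "resp_bytes", "orig_pkts", "resp_pkts"]

def train_anomaly_model(recent_conns: pd.DataFrame) -> IsolationForest:
    """Fit an IsolationForest on the most recent connection records."""
    model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
    model.fit(recent_conns[FEATURES].fillna(0))
    return model

def score_live_records(model: IsolationForest, live_conns: pd.DataFrame) -> pd.DataFrame:
    """Label freshly parsed records: -1 = anomaly, 1 = normal."""
    scored = live_conns.copy()
    scored["anomaly"] = model.predict(scored[FEATURES].fillna(0))
    scored["anomaly_score"] = model.decision_function(scored[FEATURES].fillna(0))
    return scored
```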
`analysis` contains everything Michael did for:

- exploring the labelled data from the UNSW-NB15 datasets
- checking the performance of different models (mainly Random Forest and Neural Nets)
- training and optimizing the best model approaches using keras-tuner (see the sketch below)
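As an illustration of the keras-tuner step, a minimal sketch of a tuned feed-forward classifier on the engineered UNSW-NB15 features could look like the following; the architecture, search space, feature count, and the `x_train`/`y_train` names are assumptions, not the notebooks' exact setup.

```python
# Minimal sketch (not the notebooks' exact code): hyperparameter search with
# keras-tuner for a binary attack/normal classifier on UNSW-NB15 features.
import keras_tuner as kt
import tensorflow as tf

NUM_FEATURES = 42  # placeholder; the real engineered feature count may differ

def build_model(hp: kt.HyperParameters) -> tf.keras.Model:
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(NUM_FEATURES,)))
    for i in range(hp.Int("num_layers", 1, 3)):
        model.add(tf.keras.layers.Dense(
            hp.Int(f"units_{i}", 32, 256, step=32), activation="relu"))
    model.add(tf.keras.layers.Dropout(hp.Float("dropout", 0.0, 0.5, step=0.1)))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])),
        loss="binary_crossentropy",
        metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy",
                        max_trials=20, project_name="unsw_nb15_tuning")
# tuner.search(x_train, y_train, epochs=10, validation_split=0.2)
# best_model = tuner.get_best_models(num_models=1)[0]
```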
`app` contains everything David did for:

- creating the live-updating data pipeline from Zeek logs
- parsing them with a tinkered version of ParseZeekLogs to enable continuously feeding the logs into the pipeline
- using pygtail to continuously feed the growing log files into the pipeline (a simplified sketch follows below)
- creating the webapp using Plotly and Dash
- implementing live-trained anomaly detection using Isolation Forest from scikit-learn
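A simplified, hypothetical sketch of that continuous feed is shown below, assuming a default TSV-formatted `conn.log`; the log path is an assumption, and the real app uses the tinkered ParseZeekLogs instead of this hand-rolled parsing.

```python
# Minimal sketch (not the app's exact code): continuously tail a growing
# Zeek conn.log with pygtail and yield each record as a dict.
from pygtail import Pygtail

CONN_LOG = "/opt/zeek/logs/current/conn.log"  # assumed path to the live log

def iter_new_conn_records():
    """Yield only lines appended since the last read (pygtail keeps an offset file)."""
    fields = []
    for line in Pygtail(CONN_LOG):
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # remember the TSV column names from the Zeek header
            fields = line.split("\t")[1:]
            continue
        if not line or line.startswith("#"):
            continue  # skip remaining header/footer lines
        yield dict(zip(fields, line.split("\t")))

# Example: push every new record into the webapp's data store (hypothetical)
# for record in iter_new_conn_records():
#     data_store.append(record)
```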
Screenshots: CircleCI Branch CI/CD Pipeline, CircleCI Main CI/CD Pipeline
- Clone the repository:

  ```bash
  git clone https://github.com/herrfeder/AI_Cybersecurity_IDS_PoC.git
  ```
- Go into the deploy folder and run `run_compose.sh` to start the file-based or the Kafka-based stack:

  ```bash
  deploy/run_compose.sh kafka
  # OR
  deploy/run_compose.sh file
  ```
- The first run will take quite long because the Docker containers are built locally, and the Zeek compilation and Kafka plugin installation take a while.
- Go to http://127.0.0.1:8050/
- You need to build the previous Compose-based stack at least once and upload the resulting Docker containers using the `upload-docker.sh` script, or rely on my publicly built containers:
  - zeek_kafka https://hub.docker.com/repository/docker/herrfeder/zeek_kafka (already in the k8s configs)
  - broai https://hub.docker.com/repository/docker/herrfeder/broai (already in the k8s configs)
- You have to prepare and start minikube and run `run_kube_local.sh`:

  ```bash
  cd deploy
  ./run_kube_local.sh kafka
  # OR (you can run both as well)
  ./run_kube_local.sh file
  ```
- Now add the local Ingress rule to reach the broai endpoint:

  ```bash
  kubectl apply -f broai_kubernetes/ingress-local-service.yaml
  # Check the ingress service with kubectl get svc
  ```
- Now add `green.broai` and `blue.broai` with your minikube IP to your `/etc/hosts` and visit these domains.
- You need to build the previous Compose-based stack at least once and upload the resulting Docker containers using the `upload-docker.sh` script, or rely on my publicly built containers:
  - zeek_kafka https://hub.docker.com/repository/docker/herrfeder/zeek_kafka (already in the k8s configs)
  - broai https://hub.docker.com/repository/docker/herrfeder/broai (already in the k8s configs)
- Install `aws-cli` and deploy the network and cluster requirements with the provided AWS CloudFormation scripts:

  ```bash
  cd .circleci
  scripts/push_cloudformation_stack.sh broainetwork cloudformation/network.yaml <your individual id>
  scripts/push_cloudformation_stack.sh broaicluster cloudformation/cluster.yaml <your individual id>
  ```
- Get an access token to access your AWS EKS cluster with kubectl:

  ```bash
  cd deploy
  mkdir .kube
  aws eks --region us-west-2 update-kubeconfig --kubeconfig .kube/config-aws --name AWSK8SCluster
  ```
- Deploy the Kubernetes manifests:

  ```bash
  ./run_kube_aws.sh
  ```
- Go to http://127.0.0.1:8050/
- Wait for it to finish, check the resulting LoadBalancer hostnames with `kubectl --kubeconfig .kube/config-aws get svc`, and access them. :)
- replacing the file-based data pipeline with an Apache Kafka feed (DONE in scope of David's Udacity CloudDevOps Nanodegree Capstone Project; see the consumer sketch after this list)
- faster feeding into the webapp
- more elegant data management
- also enabling Random Forest and Neural Net training during runtime
- feeding predicted live data into the analysis workflow for automatic re-evaluation and re-training
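For the Kafka feed, a minimal consumer sketch (here with kafka-python) could look like the following; the topic name, broker address, and JSON message format are assumptions about how the Zeek Kafka plugin might be configured, not the deployed setup.

```python
# Minimal sketch (not the deployed code): consume Zeek records from Kafka
# instead of tailing log files; topic and broker address are assumptions.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "zeek_conn",                          # assumed topic carrying conn.log records
    bootstrap_servers="localhost:9092",   # assumed broker address
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="latest",
)

for message in consumer:
    record = message.value  # one Zeek connection record as a dict
    # hand the record to the same prediction/anomaly pipeline as the file-based feed
    print(record.get("id.orig_h"), record.get("id.resp_h"))
```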