Anomalia Machina - Massively Scalable Anomaly Detection with Apache Kafka, Cassandra and Kubernetes

This is the final example code for the demonstration Anomaly Detection pipeline for Instaclustr's Anomalia Machina Blog series:

Instructions

For the design and more detailed instructions see the blogs (above). Here are the basic steps.

To run the Anomaly Detection pipeline you need to have the following configured and running (all on AWS):

Instaclustr Kafka and Cassandra clusters (for Cassandra, no authentication)
Connect to the Cassandra cluster using cqlsh and create the Cassandra keyspace and table (the CQL is in CassandraClient.java)
Kafka auto topic creation turned on (so you need to run the producer before the consumer, see below)
Kubernetes running in the same region as the Kafka and Cassandra clusters (e.g. on AWS, use EKS)
Edit KafkaProperties.java with the Instaclustr Kafka cluster credentials
Edit AnomaliaProperties.java with the Instaclustr Provisioning API credentials
Either configure the Kafka and Cassandra cluster firewalls to allow access from Kubernetes (using public IPs; this assumes you know the IPs of the Kubernetes worker nodes), or set up VPC peering between the Kubernetes cluster and the Instaclustr clusters (using private IPs)
A local Docker and Kubernetes installation (on a Mac, I used the Docker community edition, which comes with Kubernetes)
A Docker Hub account (edit the xxx.sh files with the account name)
An IDE with the code loaded
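
As a rough illustration of the kind of settings KafkaProperties.java has to carry, here is a minimal sketch using only java.util.Properties. The class name KafkaClientConfigSketch, the method build, and the example address and credentials are hypothetical and not taken from the repository. The property keys are standard Kafka client configuration names, but the values for security.protocol and sasl.mechanism are assumptions; check the connection details of your own cluster before copying them.

```java
import java.util.Properties;

// Hypothetical helper, not part of the repository: it only illustrates the
// client settings a SASL-secured Kafka cluster typically needs.
final class KafkaClientConfigSketch {

    // Builds client properties from the cluster address and credentials.
    static Properties build(String bootstrapServers, String username, String password) {
        Properties props = new Properties();
        // Broker address: a public IP with firewall rules, or a private IP with VPC peering.
        props.put("bootstrap.servers", bootstrapServers);
        // Assumed values; use whatever your cluster's connection details specify.
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.mechanism", "SCRAM-SHA-256");
        // JAAS login line carrying the cluster username and password.
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                        + "username=\"" + username + "\" password=\"" + password + "\";");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder address and credentials for illustration only.
        Properties props = build("1.2.3.4:9092", "example-user", "example-password");
        // Print the configured keys only, so the password is not echoed.
        for (String key : props.stringPropertyNames()) {
            System.out.println(key);
        }
    }
}
```

The same Properties object could be passed to a Kafka producer or consumer constructor once the Kafka client library is on the classpath.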

To deploy and run the application:

Generate two executable jar files, one called consumer.jar from AnomaliaMainConsumer.jar, and one called producer.jar from AnomaliaMainProducer.jar
Start 1 or more Kubernetes worker nodes in AWS (using auto scaling groups)
Deploy Prometheus using the deploy_prometheus.sh script
Deploy the producer using the deploy_producer.sh script
Deploy the consumer using the deploy_consumer.sh script
Look at the Prometheus metrics in a browser (you'll need to copy the public IP address of one of the Kubernetes worker nodes from the AWS console into your browser), e.g. 1.2.3.4:30123
The producer load and consumers can be scaled by increasing the number of Kubernetes worker nodes and increasing the number of pods for producers and consumers. Some tuning of the parameters in AnomaliaProperties.java will be required to ensure optimal throughput.
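
One thing to keep in mind when adding consumer pods: within a single Kafka consumer group, each topic partition is read by at most one consumer, so consumers beyond the partition count sit idle. The sketch below works through that arithmetic. The class ConsumerScalingSketch and its parameters are hypothetical, and the number of consumers per pod is an assumption rather than something taken from AnomaliaProperties.java.

```java
// Hypothetical helper, not part of the repository: estimates how many
// consumers in one consumer group can actually receive records.
final class ConsumerScalingSketch {

    // Active consumers are capped by the number of topic partitions.
    static int activeConsumers(int consumerPods, int consumersPerPod, int topicPartitions) {
        return Math.min(consumerPods * consumersPerPod, topicPartitions);
    }

    // Consumers that would get no partition assigned.
    static int idleConsumers(int consumerPods, int consumersPerPod, int topicPartitions) {
        int total = consumerPods * consumersPerPod;
        return total - activeConsumers(consumerPods, consumersPerPod, topicPartitions);
    }

    public static void main(String[] args) {
        // Example: 10 pods with 1 consumer each, reading a topic with 6 partitions.
        System.out.println("active: " + activeConsumers(10, 1, 6)); // active: 6
        System.out.println("idle: " + idleConsumers(10, 1, 6));     // idle: 4
    }
}
```

If the idle count is above zero, adding more consumer pods will not raise throughput until the topic has more partitions.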

Note that the Prometheus instrumentation is present and used in the final Kubernetes production environment. However, the OpenTracing/Jaeger tracing instrumentation is present but unused in the Kubernetes environment (you would have to run a Jaeger Operator to use it).

Instaclustr Open Source Project Status: SAMPLE

For further information see: https://www.instaclustr.com/support/documentation/announcements/instaclustr-open-source-project-status/
