- Java
- Ansible
- Docker
- Apache Zookeeper
- Apache Kafka
- Apache Storm
- AWS
  - EC2
  - RDS
  - ElastiCache (Redis)
  - S3
- LookupBolt (sketches of the three bolts follow this list)
  - Extracts the CustID from the tuple
  - Queries the Redis cluster for the SSN mapped to that CustID
  - Passes the SSN along with the Balance to both RDSInserterBolt and S3WriterBolt
- RDSInserterBolt
  - Upserts each tuple into the RDS PostgreSQL Balances table
- S3WriterBolt
  - Accumulates the received tuples until either a specified count or a specified time delta is reached (whichever happens first)
  - Generates a file from the accumulated batch and writes it to S3
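
A minimal sketch of the LookupBolt, assuming the Jedis client, a Redis keyspace of CustID → SSN, and input fields named `CustID` and `Balance` (the endpoint and field names are placeholders, not necessarily what this repo uses):

```java
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

import redis.clients.jedis.Jedis;

public class LookupBolt extends BaseRichBolt {
    private OutputCollector collector;
    private transient Jedis jedis; // not serializable, so created in prepare()

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // Placeholder ElastiCache endpoint; in practice this would come from config
        this.jedis = new Jedis("my-redis-cluster.example.amazonaws.com", 6379);
    }

    @Override
    public void execute(Tuple tuple) {
        String custId = tuple.getStringByField("CustID");
        String balance = tuple.getStringByField("Balance");
        // Look up the SSN for this CustID in Redis
        String ssn = jedis.get(custId);
        // Emit anchored to the input tuple so Storm can track it downstream
        collector.emit(tuple, new Values(ssn, balance));
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("SSN", "Balance"));
    }
}
```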
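The per-tuple upsert in RDSInserterBolt maps naturally onto PostgreSQL's `INSERT ... ON CONFLICT`. A sketch assuming a `balances` table keyed by `ssn` and a JDBC connection opened in `prepare()` (endpoint, credentials, and column names are illustrative):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class RDSInserterBolt extends BaseRichBolt {
    // Hypothetical upsert keyed on ssn: insert, or overwrite the balance if the row exists
    private static final String UPSERT_SQL =
        "INSERT INTO balances (ssn, balance) VALUES (?, ?) "
        + "ON CONFLICT (ssn) DO UPDATE SET balance = EXCLUDED.balance";

    private OutputCollector collector;
    private transient Connection conn;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        try {
            // Placeholder RDS endpoint; in practice this would come from config
            conn = DriverManager.getConnection(
                "jdbc:postgresql://my-rds.example.amazonaws.com:5432/pipeline", "user", "password");
        } catch (SQLException e) {
            throw new RuntimeException("Could not connect to RDS", e);
        }
    }

    @Override
    public void execute(Tuple tuple) {
        try (PreparedStatement ps = conn.prepareStatement(UPSERT_SQL)) {
            ps.setString(1, tuple.getStringByField("SSN"));
            ps.setBigDecimal(2, new java.math.BigDecimal(tuple.getStringByField("Balance")));
            ps.executeUpdate();
            collector.ack(tuple);
        } catch (SQLException e) {
            collector.fail(tuple); // let Storm replay the tuple
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal bolt: no output stream
    }
}
```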
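The count/time batching in S3WriterBolt is still on the TODO list; one common way to implement it is with Storm tick tuples for the time-based trigger. A sketch assuming the AWS SDK v1 S3 client and a placeholder bucket name:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.utils.TupleUtils;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class S3WriterBolt extends BaseRichBolt {
    private static final int COUNT_THRESHOLD = 1000; // flush after this many tuples...
    private static final int FLUSH_SECONDS = 60;     // ...or after this many seconds

    private OutputCollector collector;
    private transient AmazonS3 s3;
    private List<String> buffer;

    @Override
    public Map<String, Object> getComponentConfiguration() {
        // Ask Storm to send this bolt a tick tuple every FLUSH_SECONDS
        Map<String, Object> conf = new HashMap<>();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, FLUSH_SECONDS);
        return conf;
    }

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.s3 = AmazonS3ClientBuilder.defaultClient();
        this.buffer = new ArrayList<>();
    }

    @Override
    public void execute(Tuple tuple) {
        if (TupleUtils.isTick(tuple)) {
            flush(); // time-based trigger
            return;
        }
        buffer.add(tuple.getStringByField("SSN") + "," + tuple.getStringByField("Balance"));
        collector.ack(tuple); // simplified: a production bolt would ack only after a successful flush
        if (buffer.size() >= COUNT_THRESHOLD) {
            flush(); // count-based trigger
        }
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        String key = "batches/" + System.currentTimeMillis() + ".csv";
        // "my-pipeline-bucket" is a placeholder bucket name
        s3.putObject("my-pipeline-bucket", key, String.join("\n", buffer));
        buffer.clear();
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal bolt: no output stream
    }
}
```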
- `ansible-playbook infrastructure.yml`
  - Creates a dedicated VPC for this project
  - Creates two subnets inside that VPC, sets up an Internet Gateway, and defines all the necessary routes
  - Creates a SecurityGroup to be assigned to different resources throughout this project
  - Spins up EC2 instances and runs Zookeeper, Kafka, and Storm Docker containers on them
  - Creates an ElastiCache (Redis) cluster and populates it with lookup data
  - Creates an RDS PostgreSQL instance to be populated by the pipeline
  - Deploys the Storm topology
- KafkaStormProducer (a sketch follows this list)
  - Produces events to the Kafka topic consumed by the pipeline
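A minimal sketch of the producer side, assuming String-serialized `CustID,Balance` payloads; the broker address and topic name are placeholders (the first TODO item below is about making these parameter-driven):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaStormProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address; hard-coded here, parameter-driven per the TODO
        props.put("bootstrap.servers", "kafka-broker:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Each event carries a CustID and a Balance for the pipeline to enrich and persist
            producer.send(new ProducerRecord<>("balance-events", "42", "42,1375.50"));
        }
    }
}
```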
- Make KafkaStormProducer parameter-driven
- Split CustID_Balance into two separate files
- Create a config.yml.tplt and push it instead of the original Ansible config
- Implement time/count-based batching for S3WriterBolt
- Add the Storm logviewer to the supervisor Docker containers
- Implement production-level logging and exception handling
- If possible, always use Terraform for infrastructure creation. Ansible works well with already provisioned resources (e.g., spinning up a Kafka container on an existing EC2 instance), but Terraform is much more intuitive for creating the EC2 instance itself.
- The current configuration of this project uses AWS services that are beyond the Free Tier!