- Still under development.
- This document may not be entirely correct.
- Some "moving" parts are not fully described or not even listed yet.
Here we are going to simulate a system using Kafka and Spark to examine their capability to handle continuous data distribution, while at the same time testing how PouchDB and CouchDB handle auto-sync between the frontend and backend.
There are 11 components:
- Kafka Brokers
- Zookeeper
- CMAK as Kafka WebUI Manager
- CouchDB
- User Frontend Application
- Admin Frontend Application
- Kafka Consumer
- Backend API as Kafka Producer
- Spark Master
- Spark Worker
- Apache Zeppelin
Local tooling required:
- Node, Npm
- Make
- Pyenv, Pipenv
- Docker
Basically, run make setup to install all dependencies and create a Docker network:
$ make setup
Our CouchDB must be exposed to localhost, with the default authentication config being used (refer to ./env/couchdb).
Add CORS to CouchDB so the AdminApp can sync with it:
$ make add_cors_couch
Alternatively, when CouchDB runs as a Docker service, CORS can be enabled through the GUI at http://localhost:5984/_utils/#_config/nonode@nohost. Note that the CouchDB cluster config might need some manual adjustment.
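For reference, a rough Python sketch of what enabling CORS through CouchDB's node-level config API can look like is shown below. The admin credentials and exact config sections are assumptions (older CouchDB releases use httpd instead of chttpd for enable_cors), and the actual make recipe may do this differently.

```python
# Rough sketch of enabling CORS via CouchDB's node config API -- not the actual make recipe.
# Credentials and the node name are placeholders; use the values from ./env/couchdb.
import json
import requests

COUCH = "http://localhost:5984"
AUTH = ("admin", "password")          # placeholder credentials
NODE = "nonode@nohost"                # single-node default, as seen in the Fauxton URL above

cors_settings = {
    "chttpd/enable_cors": "true",     # older CouchDB releases use httpd/enable_cors instead
    "cors/origins": "*",              # tighten this to the AdminApp origin in real setups
    "cors/credentials": "true",
    "cors/methods": "GET, PUT, POST, HEAD, DELETE",
    "cors/headers": "accept, authorization, content-type, origin, referer",
}

for path, value in cors_settings.items():
    section, key = path.split("/")
    resp = requests.put(
        f"{COUCH}/_node/{NODE}/_config/{section}/{key}",
        auth=AUTH,
        data=json.dumps(value),       # the config API expects a JSON string body
    )
    resp.raise_for_status()
    print(path, "->", value)
```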
Use CMAK as the Kafka manager to add the cluster and then set up proper partition assignment by going to http://localhost:9000.
- Run the whole system. Use the following command to fire the system up, optionally scaling the services (e.g. 2 (k)afka brokers, 3 (s)park workers).
NOTE: Each Spark executor requires one CPU core, so depending on your machine spec you can have more or fewer of them.
$ make up-scale k=2 s=3
- Submit the Spark application
If Zookeeper and the brokers are ready, deploy the Spark app using:
$ make submit_job
Once the job has been successfully deployed, go to http://localhost:8080 and find the newly running application.
If you want to make any changes to the job, just modify the Scala code in ./spark_job and re-run the above command.
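The real job is the Scala code under ./spark_job; purely to illustrate the general shape of such a streaming job, here is a minimal PySpark Structured Streaming sketch. The broker address, topic name, and windowed aggregation are assumptions for illustration, not a description of the actual job, and running it requires the spark-sql-kafka connector package.

```python
# Illustrative PySpark sketch of a streaming job similar in spirit to ./spark_job
# (the real job is Scala). Broker address and topic name below are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("numbers-demo").getOrCreate()

# Read the raw stream of numbers published by the backend producer.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")  # assumed broker address
    .option("subscribe", "numbers")                   # assumed topic name
    .load()
)

# Kafka values arrive as bytes; cast them and aggregate over a short window.
numbers = raw.select(
    F.col("value").cast("string").cast("double").alias("n"),
    F.col("timestamp"),
)
stats = (
    numbers
    .withWatermark("timestamp", "1 minute")
    .groupBy(F.window("timestamp", "30 seconds"))
    .agg(F.count("n").alias("count"), F.avg("n").alias("avg"))
)

# Print the aggregates; the real job would eventually push results towards CouchDB.
query = stats.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```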
- Start the frontend applications. Start the UserApp and AdminApp in 2 separate terminals, using these 2 commands:
$ make fe_user
$ make fe_admin
- Send data to Kafka
Use the UserApp at http://localhost:3001 to continuously send data (a stream of numbers) to the Backend-Producer.
Alternatively, you can go to http://localhost:8000/docs and use Swagger to make API requests.
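For orientation, the Backend-Producer essentially accepts numbers over HTTP and publishes them to a Kafka topic. The sketch below only illustrates that idea; the route, topic name, and libraries (FastAPI, kafka-python) are assumptions and may not match the real implementation.

```python
# Hedged sketch of a backend producer: accept a number over HTTP, publish it to Kafka.
# The route, topic name, and libraries (FastAPI, kafka-python) are assumptions,
# not necessarily what the real Backend-Producer uses.
import json
from fastapi import FastAPI
from kafka import KafkaProducer

app = FastAPI()
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                      # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

@app.post("/numbers")                                        # hypothetical route
def publish_number(value: float):
    # Fire-and-forget publish; the consumer / Spark job picks it up downstream.
    producer.send("numbers", {"value": value})               # assumed topic name
    return {"status": "queued", "value": value}
```

With FastAPI, an interactive Swagger UI is served at /docs automatically, which matches the workflow described above.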
- Watch changes from the frontend AdminApp
If the admin app is already running, go to http://localhost:3002 and watch the changes as messages are streamed to CouchDB.
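If you also want to observe the incoming documents from a terminal rather than through the AdminApp's PouchDB sync, a small sketch like the following tails CouchDB's continuous _changes feed; the database name and credentials are placeholders to adjust for your setup.

```python
# Tail CouchDB's continuous _changes feed as a command-line alternative to the AdminApp.
# Database name and credentials are placeholders -- match them to your setup.
import json
import requests

url = "http://localhost:5984/numbers/_changes"   # "numbers" is a placeholder db name
params = {"feed": "continuous", "include_docs": "true", "since": "now", "heartbeat": "10000"}

with requests.get(url, params=params, auth=("admin", "password"), stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:                              # heartbeats arrive as blank lines
            continue
        change = json.loads(line)
        print(change.get("id"), "->", change.get("doc"))
```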
- Enjoy hacking on your own :)
Considering what to add to complete the architecture:
- Add CouchDB
- Add API Client App
- Add Admin Client App with PouchDB for db real-time tracking
- Developing Producer Backend API
- Add Spark to consumer, connect to Kafka for streaming
- Scale Consumer
- Scale Kafka Broker and Spark Worker
- Use gRPC for backend-producer
- Store calculated data from Spark to CouchDB
- Apache Beam?
- Apache Avro
- KSQL?
- Deploy everything with Kubernetes
- Stress-testing