Distributed System for large-scale data management
automation/ : Bash scripts to automate the deployment of the infrastructure and the application
config/ : Kubernetes and AWS configuration files
src/ : Source code of our application
- app/ : Source code of the web application and its associated Dockerfile
- aws/ : Source code of the AWS manager (performs actions on AWS)
- cassandra/ : Database creation code and its associated Dockerfile
- kafka/ : Source code of the Kafka producer that generates tweets, and the Dockerfile for Kafka
- utils/ : Configuration manager for the project's variables
In order to automatically run the solution, please follow these steps :
- Configure your AWS credentials :
Follow this guide to create at least the ~/.aws/credentials and ~/.aws/config files (the latter with at least the region set) : https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
- Install the packages (python3)
pip3 install -r requirements.txt
(You can use virtualenv to isolate this project from your other projects : https://virtualenv.pypa.io/en/latest/)
- Configure your deployment (optional)
You can configure the deployment by editing the config/instances.ini file
- Go to the automation folder
cd automation
- Launch the deployment script and wait while the infrastructure is created
./deploy.sh
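The credentials step above can be checked before launching the deployment. The following is only a minimal sketch, assuming the default profile and the standard file layout produced by the AWS CLI; the function name check_aws_files is hypothetical and is not part of this repository.

```python
import configparser
from pathlib import Path


def check_aws_files(home=None):
    """Return a list of problems found under <home>/.aws (empty list means OK)."""
    base = Path(home) if home else Path.home()
    aws_dir = base / ".aws"
    credentials = aws_dir / "credentials"
    config = aws_dir / "config"
    problems = []

    if not credentials.is_file():
        problems.append(f"missing {credentials}")

    if not config.is_file():
        problems.append(f"missing {config}")
    else:
        parser = configparser.ConfigParser()
        parser.read(config)
        # Assumption: the default profile is used, stored under [default].
        if not parser.has_option("default", "region"):
            problems.append(f"no region set in the [default] section of {config}")

    return problems


if __name__ == "__main__":
    found = check_aws_files()
    for problem in found:
        print(problem)
    if not found:
        print("AWS files look ready for deployment")
```

An empty list means both files exist and a region is set. Named profiles other than the default one are not covered by this sketch.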
Once the deployment has finished, follow these steps to access the cluster and the application :
- Go back to the root folder
cd ..
- Retrieve the master's public IP address
./manage.py read type get-master-public-ip
- Connect to the master over SSH
ssh -i ssh/Smackey ubuntu@MASTER_PUBLIC_IP
- You can use kubectl commands to explore the cluster, for example :
kubectl get pods
At startup, some pods will show the status 'Error' : the cluster just needs some time to reach a consistent state
- Retrieve a worker's public IP address
./manage.py read type get-workers-public-ip
- Open the address below in a browser
http://ONE_WORKER_PUBLIC_IP:32222
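Since some pods report 'Error' for a while after startup, it can be useful to list the pods that are not yet running before opening the web application. The sketch below is an illustration only: it assumes it is run on the master, where kubectl is available, and that the output has the default columns of `kubectl get pods --no-headers` (NAME, READY, STATUS, RESTARTS, AGE). The function name pods_not_running is hypothetical.

```python
import subprocess


def pods_not_running(kubectl_output):
    """Return {pod_name: status} for every pod whose STATUS is not 'Running'.

    Expects the text printed by `kubectl get pods --no-headers`.
    """
    pending = {}
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, status = fields[0], fields[2]
        if status != "Running":
            pending[name] = status
    return pending


if __name__ == "__main__":
    # Assumption: kubectl is on the PATH and already configured for the cluster.
    result = subprocess.run(
        ["kubectl", "get", "pods", "--no-headers"],
        capture_output=True,
        text=True,
        check=True,
    )
    waiting = pods_not_running(result.stdout)
    if waiting:
        for name, status in waiting.items():
            print(f"{name}: {status}")
    else:
        print("All pods are running")
```

Running it again after a short wait should show the list shrinking as the cluster settles. Note that any status other than 'Running' is reported, not only 'Error'.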
All contributions are welcome.
Please read CONTRIBUTING.md before contributing to this project.