Scalable distributed online forecasting system for missing error measurements

The report is in the docs folder, named REPORT_BDMA_PHASE_II.pdf

Folder structure

├── data
│   ├── data.conv.txt.gz
│   └── mote_locs.txt
├── docker
│   ├── hdfs
│   │   ├── docker-compose.yml
│   │   └── hadoop.env
│   ├── kafka
│   │   └── docker-compose.yml
│   ├── serve
│   │   ├── data-serving
│   │   │   ├── hbase-site.xml
│   │   │   ├── opentsdb.conf
│   │   │   └── rollup_config.json
│   │   ├── docker-compose.yml
│   │   ├── grafana-storage
│   │   └── opentsdb-docker
│   │       ├── docker-compose.yml
│   │       ├── Dockerfile
│   │       ├── files
│   │       │   ├── create_table.sh
│   │       │   ├── create_tsdb_tables.sh
│   │       │   ├── entrypoint.sh
│   │       │   ├── hbase-site.xml
│   │       │   ├── opentsdb.conf
│   │       │   ├── start_hbase.sh
│   │       │   └── start_opentsdb.sh
│   │       ├── LICENSE
│   │       └── README.md
│   └── spark
│       ├── docker-compose-aws.yml
│       ├── docker-compose-test.yml
│       ├── docker-compose.yml
│       └── stream-data-processing
│           ├── Dockerfile
│           ├── jupyter_notebook_config.py
│           ├── spark-env.sh
│           └── start-spark
├── docs
│   ├── Assignment.pdf
│   └── REPORT_BDMA_PHASE_II.pdf
├── notebooks
│   ├── fairscheduler-statedump.log
│   ├── KafkaSparkStreamingPersistence
│   │   ├── KafkaReceivePersistence.ipynb
│   │   └── KafkaSendPersistence.ipynb
│   ├── KafkaSparkStreamingRLS
│   │   ├── KafkaReceiveRLS.ipynb
│   │   └── KafkaSendRLS.ipynb
│   └── Work.ipynb
└── README.md

Data visualisation

Before running the data-visualisation scripts, create the data directory for them (mkdir scripts/data) and extract the data archive into it.
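As a minimal sketch of loading the extracted readings (the column layout below assumes the Intel Lab sensor-data format and is not taken from this repository's scripts; adjust names and the path to the actual file):

  # Minimal sketch: load the extracted sensor readings and plot one mote's temperature.
  # Column names are an assumption (Intel Lab layout); the path assumes the archive
  # was extracted into scripts/data.
  import pandas as pd
  import matplotlib.pyplot as plt

  cols = ["date", "time", "epoch", "moteid", "temperature", "humidity", "light", "voltage"]
  df = pd.read_csv("scripts/data/data.conv.txt", sep=r"\s+", names=cols)

  mote1 = df[df["moteid"] == 1]
  plt.plot(pd.to_datetime(mote1["date"] + " " + mote1["time"]), mote1["temperature"])
  plt.xlabel("time")
  plt.ylabel("temperature")
  plt.show()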

Architecture

Apache Kafka

  • 3 ZooKeeper nodes
  • 3 Kafka brokers
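A client connects by listing the three brokers as bootstrap servers. The sketch below uses the kafka-python library; the broker addresses, topic name, and ports are placeholders, not values taken from the Kafka docker-compose file:

  # Minimal sketch: publish one test message to the 3-broker cluster.
  # Broker addresses and topic name are placeholders; use the ones exposed
  # by docker/kafka/docker-compose.yml.
  from kafka import KafkaProducer

  producer = KafkaProducer(
      bootstrap_servers=["localhost:9092", "localhost:9093", "localhost:9094"]
  )
  producer.send("sensor-readings", b"test message")
  producer.flush()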

Apache Spark

  • 1 Master node
  • 2 Workers
    • Each with 4 cores and 4 GB of RAM
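From a notebook, a session can be sized to match this layout. The master URL below is an assumption (standard Spark master port); the actual address depends on the Spark docker-compose files:

  # Minimal sketch: create a SparkSession sized to the 2 workers (4 cores / 4 GB each).
  # The master URL is an assumption; adjust it to the actual cluster address.
  from pyspark.sql import SparkSession

  spark = (
      SparkSession.builder
      .master("spark://spark-master:7077")
      .appName("stream-data-processing")
      .config("spark.executor.cores", "4")
      .config("spark.executor.memory", "4g")
      .getOrCreate()
  )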

OpenTSDB

  • HDFS cluster with 1 name-node and 3 data-nodes
  • HBase
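Once the stack is up, data points can be written through OpenTSDB's HTTP /api/put endpoint. The metric name, tags, and host below are illustrative only; the port (4242) matches the serve stack described in the deployment section:

  # Minimal sketch: push a single data point to OpenTSDB over HTTP.
  # Metric and tag names are illustrative; replace the host with the serve stack's address.
  import time
  import requests

  point = {
      "metric": "sensor.temperature",
      "timestamp": int(time.time()),
      "value": 21.5,
      "tags": {"moteid": "1"},
  }
  resp = requests.post("http://localhost:4242/api/put", json=point)
  resp.raise_for_status()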

Grafana

  • 1 Dashboard
    note: Grafana's storage is not tracked in the repository.

How to deploy

  1. Deploy Kafka cluster service:
    cd docker/kafka
    docker-compose up -d
  2. Deploy HDFS cluster service:
    cd docker/hdfs
    docker-compose up -d
    note: not necessary when deploying locally for testing purposes.
  3. Deploy Spark cluster:
    cd docker/spark
    docker-compose -f docker-compose-aws.yml up -d # if deploying on a server
    docker-compose up -d # if deploying for testing with HDFS
    docker-compose -f docker-compose-test.yml up -d # if deploying for testing without HDFS
    note: After the Spark cluster is running, a Jupyter Notebook server is available at http://[IP]:8888 with password "secret". The notebooks folder is synchronized with a notebooks folder on the master node, so any work you want to keep should be placed in the notebooks folder.
  4. Deploy OpenTSDB + Grafana:
    cd docker/serve
    docker-compose up -d
    note: You can access the OpenTSDB web UI at http://[IP]:4242 and the Grafana dashboard at http://[IP]:3000. The username and password for the dashboard are both "admin".
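As a quick smoke test once both services are up, their HTTP status endpoints can be checked from Python (the host name below is a placeholder for the server's address):

  # Minimal sketch: confirm OpenTSDB and Grafana are reachable after "docker-compose up -d".
  import requests

  host = "localhost"  # replace with the server's IP when deployed remotely
  print(requests.get(f"http://{host}:4242/api/version").json())  # OpenTSDB build info
  print(requests.get(f"http://{host}:3000/api/health").json())   # Grafana health status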