Application that generates synthetic data for big data proofs of concept. It provides relational data (PostgreSQL), non-relational data (MongoDB), and streaming data via Kafka. The infrastructure runs entirely on containers.
- data-generator : Contains the script responsible for generating the synthetic data (a minimal generator sketch follows this list)
- data-publisher : Contains the script responsible for generating the data stream
- docker-compose.yml : Orchestration of the application
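As an illustration of what the data-generator script might produce, here is a minimal, hypothetical sketch; the `faker` library and the field names are assumptions, not the repository's actual schema.

```python
# Hypothetical sketch of a synthetic record generator.
# The `faker` library and the field names below are assumptions,
# not the repository's actual schema.
import random
import uuid

from faker import Faker

fake = Faker()

def make_record() -> dict:
    """Build one synthetic record."""
    return {
        "id": str(uuid.uuid4()),
        "name": fake.name(),
        "email": fake.email(),
        "created_at": fake.date_time_this_year().isoformat(),
        "amount": round(random.uniform(1.0, 1000.0), 2),
    }

if __name__ == "__main__":
    for _ in range(5):
        print(make_record())
```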
For the data-generator and data-publisher containers, you need to identify the local IP of your host machine and the Kafka port (a producer sketch is shown below). See: https://github.com/wurstmeister/kafka-docker/blob/master/README.md#advertised-hostname
- KAFKA_HOST: <kafka_advertised_host_name>:<kafka_advertised_port>
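A minimal publisher sketch, assuming the kafka-python client and a `KAFKA_HOST` value of the form `<kafka_advertised_host_name>:<kafka_advertised_port>`:

```python
# Minimal publisher sketch, assuming the kafka-python client and a KAFKA_HOST
# variable of the form <kafka_advertised_host_name>:<kafka_advertised_port>.
import json
import os

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=os.environ["KAFKA_HOST"],  # e.g. "192.168.0.10:9092"
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# "messages" matches the topic declared in KAFKA_CREATE_TOPICS below.
producer.send("messages", {"event": "example"})
producer.flush()
```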
In the Kafka configuration, you must provide the local IP of your host machine.
- KAFKA_ADVERTISED_HOST_NAME: <your-local-ip>
General variables: must be defined in a ".env" file at the project root (a sketch of how the scripts might read these values follows the list).
KAFKA_HOST=<kafka_advertised_host_name>:<kafka_advertised_port>
KAFKA_ADVERTISED_HOST_NAME=<your-local-ip>
KAFKA_ADVERTISED_PORT=9092
KAFKA_AUTO_CREATE_TOPICS_ENABLE=true
KAFKA_CREATE_TOPICS=messages:1:1
KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
MONGO_INITDB_ROOT_USERNAME=foo
MONGO_INITDB_ROOT_PASSWORD=foo
POSTGRES_PASSWORD=foo
POSTGRES_USER=foo
QUANTIY_RECORDS=1250000
QUANTIY_RECORDS_NOISE_PERCENT=30
PROVIDER=<PROVIDER_NAME>
KAFKA_RETENTION_MS=86400000
POSTGRES_DB_NAME=foo
MONGO_DB_NAME=foo
RUN_INSERTS_POSTGRES=True
RUN_INSERTS_MONGO=True
RUN_PUBLISHER=True
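An illustrative sketch of how the scripts might consume these values; the variable names match the .env above, but the parsing logic itself is an assumption.

```python
# Illustrative sketch of reading the .env values above; the variable names
# match the .env, but the parsing logic is an assumption.
import os

def env_flag(name: str, default: str = "False") -> bool:
    """Interpret True/False-style environment variables."""
    return os.getenv(name, default).strip().lower() in ("true", "1", "yes")

quantity = int(os.getenv("QUANTIY_RECORDS", "1250000"))
noise_percent = int(os.getenv("QUANTIY_RECORDS_NOISE_PERCENT", "30"))

run_postgres = env_flag("RUN_INSERTS_POSTGRES")
run_mongo = env_flag("RUN_INSERTS_MONGO")
run_publisher = env_flag("RUN_PUBLISHER")

if run_publisher:
    print(f"Publishing {quantity} records ({noise_percent}% noise) to Kafka")
```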
cd synthetic-data-generator
docker-compose up -d --build
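Once the stack is up, a quick way to confirm that the stream is flowing is to consume from the `messages` topic; this sketch assumes the kafka-python client.

```python
# Quick check that the stream is flowing, assuming the kafka-python client
# and the "messages" topic created via KAFKA_CREATE_TOPICS.
import os

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "messages",
    bootstrap_servers=os.environ["KAFKA_HOST"],
    auto_offset_reset="earliest",
    consumer_timeout_ms=10_000,  # stop iterating after 10s without messages
)

for message in consumer:
    print(message.value)
```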
- 5432 => Postgres
- 5433 => Postgres admin
- 27017 => MongoDB
- 9092 => Kafka
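To confirm the databases are reachable on these ports, a smoke test along these lines can help; the credentials and database names are the placeholders from the .env example, and psycopg2/pymongo are assumed client libraries rather than dependencies of this project.

```python
# Connectivity smoke test against the exposed ports. Credentials and database
# names are the placeholders from the .env example; psycopg2 and pymongo are
# assumed client libraries, not necessarily dependencies of this project.
import psycopg2
from pymongo import MongoClient

pg = psycopg2.connect(
    host="localhost", port=5432,
    dbname="foo", user="foo", password="foo",
)
with pg.cursor() as cur:
    cur.execute("SELECT 1;")
    print("Postgres OK:", cur.fetchone())
pg.close()

mongo = MongoClient("mongodb://foo:foo@localhost:27017/")
print("MongoDB OK:", mongo.admin.command("ping"))
mongo.close()
```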