Integration between the Grafana Postgres frontend cache and the data science pipeline.
This is intended to run as a Docker container, where each container listens to one topic.
Per the Twelve-Factor App methodology, configuration should be stored in the environment.
As a convenience for local execution, you may use settings.yaml and .secrets.yaml to provide configuration. The values in these files are used as defaults. Any production (or Docker) deployment should override these values using environment variables.
- GDI_TOPIC - EventHub topic name
- GDI_KEY - the EventHub access key
- GDI_NAMESPACE - the fully qualified namespace for the EventHub namespace
- GDI_SHARED_ACCESS_POLICY - The EventHub shared access policy name
- GDI_DB_HOST - The fully qualified host address for the integration database
- GDI_DB_PORT - The port for connecting to the integration database
- GDI_DB_DATABASE - The name of the database within the database server
- GDI_DB_USER - The user name to use when connecting to the database
- GDI_DB_PASSWORD - The password to use when connecting to the database
- GDI_DB_SCHEMA - The database schema
- GDI_CHECKPOINT_STORE_CONNECTION - The connection string for the Azure Blob Storage account in which to store checkpoints.
- GDI_CHECKPOINT_STORE_CONTAINER - The name of the Azure Blob Storage container in which to store checkpoints.
- GDI_CONSUMER_GROUP - The EventHub consumer group. Defaults to '$default'.
- GDI_BUFFER_SIZE - The number of messages to receive before writing to the database. Defaults to 1.
- GDI_LOG_LEVEL - Can be NOTSET, DEBUG, INFO, WARNING, ERROR, CRITICAL. Defaults to ERROR.
- GDI_MAX_BUFFER_TIME_IN_SEC - Maximum number of seconds between buffer flushes regardless of how many messages are in the buffer. Defaults to 20.
- GDI_MAX_TIME_TO_KEEP_DATA_IN_SEC - Maximum age, in seconds, of data kept in the database. Defaults to 7 days (604800 seconds).
- GDI_DATA_EVICT_INTERVAL_IN_SEC - How often, in seconds, to check for and evict aged-out data. Defaults to 2 hours (7200 seconds).
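As a rough illustration of how the optional variables above and their defaults fit together, the following sketch reads them from the environment using only the standard library. It is not the project's actual loading code (which is layered through dynaconf). The names BufferSettings, load_settings, and should_flush are hypothetical, and the decision to skip flushing an empty buffer is an assumption of this sketch.

```python
import os
from dataclasses import dataclass

# Defaults as documented in the variable list above, with durations in seconds.
DEFAULTS = {
    "GDI_CONSUMER_GROUP": "$default",
    "GDI_BUFFER_SIZE": "1",
    "GDI_LOG_LEVEL": "ERROR",
    "GDI_MAX_BUFFER_TIME_IN_SEC": "20",
    "GDI_MAX_TIME_TO_KEEP_DATA_IN_SEC": str(7 * 24 * 60 * 60),  # 7 days
    "GDI_DATA_EVICT_INTERVAL_IN_SEC": str(2 * 60 * 60),         # 2 hours
}


@dataclass(frozen=True)
class BufferSettings:
    """Hypothetical container for the optional GDI_* settings."""
    consumer_group: str
    buffer_size: int
    log_level: str
    max_buffer_time_sec: int
    max_keep_sec: int
    evict_interval_sec: int


def load_settings(env=None):
    """Read the optional settings from env (os.environ when env is None)."""
    env = os.environ if env is None else env

    def get(name):
        # Environment value wins; otherwise fall back to the documented default.
        return env.get(name, DEFAULTS[name])

    return BufferSettings(
        consumer_group=get("GDI_CONSUMER_GROUP"),
        buffer_size=int(get("GDI_BUFFER_SIZE")),
        log_level=get("GDI_LOG_LEVEL").upper(),
        max_buffer_time_sec=int(get("GDI_MAX_BUFFER_TIME_IN_SEC")),
        max_keep_sec=int(get("GDI_MAX_TIME_TO_KEEP_DATA_IN_SEC")),
        evict_interval_sec=int(get("GDI_DATA_EVICT_INTERVAL_IN_SEC")),
    )


def should_flush(buffered, elapsed_sec, settings):
    """Flush when the buffer is full or the maximum buffer time has passed.

    Assumption of this sketch: an empty buffer is never flushed.
    """
    if buffered == 0:
        return False
    return (buffered >= settings.buffer_size
            or elapsed_sec >= settings.max_buffer_time_sec)
```

With GDI_BUFFER_SIZE set to 50 and the default GDI_MAX_BUFFER_TIME_IN_SEC of 20, a buffer holding 10 messages would be written once 20 seconds have passed since the last flush, even though it is not full.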
If you intend to run the project locally, you will need to have:
- Python 3.8
- Pipenv
- Make (usually part of existing dev tools)
- To install Python, it's recommended that you use pyenv.
- Once Python 3.8 is installed, you need to install pipenv.
- If make is not installed, please install your platform's development tools or gcc.
- From the project directory, run:
pipenv install --dev
To run it you can use the Makefile. You will need to edit the environment variables for the run-local target.
make run-local
You could also simply run:
python main.py
Be sure you are in your Python virtualenv so that your libraries are on the path.
You will need to have docker and docker compose installed locally. See https://docs.docker.com/get-docker/ for more information.
To package code changes into the docker image, you must build it. You can do this by running:
make build
To use the image for installation in the cloud, you will need to push the docker image to an accessible repository. Currently, the Makefile is configured for hub.docker.com.
To use it, you will need to be logged into Docker Hub as a user with write access to the Chesapeake organization. If you are not logged in, try:
docker login
Once you are logged in via docker's cli tools you can run:
make push
The terraform files located in the terraform folder are used to deploy this integration into the datasci cloud.
To use the terraform, you will first need to build and push any changes into the docker image. From there, you'll need to reference this terraform module from your main terraform script. For example:
module "grafana-integration" {
  source = "github.com/chesapeaketechnology/grafana-dataintegration/terraform"
  resource_group_name = var.resource_group_name
  system_name = var.cluster_name
  virtual_network_name = var.virtual_network_name
  location = var.location
  environment = var.environment
  default_tags = var.default_tags
  network_profile_id = var.network_profile_id
  db_host = module.datasci-data.server_fqdn
  db_name = "grafana"
  db_password = module.datasci-data.administrator_password
  db_user = "${module.datasci-data.administrator_login}@${module.datasci-data.server_name}"
  eventhub_namespace = var.eventhub_namespace
  eventhub_keys = var.eventhub_keys
  eventhub_shared_access_policies = var.eventhub_shared_access_policies
  topics = var.topics
  consul_server = var.consul_server
}
0.3.0 - 2021-05-12
- Increased the version number character limit from 10 to 15.
- Added a SQL trigger to capture device_ids to another table.
0.2.8 - 2020-11-02
- Resolved issue with the data eviction being run on every incoming message.
- Changed the source_id field for UMTS so that it does not include the LAC.
0.2.7 - 2020-10-23
- Added the source_id field
0.2.4 - 2020-08-20
- Resolved issue with db connection contention
0.2.3 - 2020-08-20
- Added support for data eviction
0.2.2 - 2020-08-19
- Resolved issue with identifying precision of unix time
0.2.1 - 2020-08-18
- Added support for max buffer time
0.2.0 - 2020-08-12
- Migrated to using a generic storage schema based on Postgres JSONB support
- Removed database migrations as they are no longer needed with the generic schema
- Removed message versioning as it is no longer needed with the generic schema
- Moved to using dynaconf for a layered configuration.
0.1.0 - 2020-08-12
- Initial build of integration code based on message type
- Added support for database migrations
- Added support for LteRecord Message type
- Added support for Message versioning
- Les Stroud - lstroud