bootcamp-rails

Automation for a streaming bootcamp based on an external continuous source of train movements in the UK

Primary language: HCL. License: Apache License 2.0.

This is a Terraform script that will:

  • Create an environment in Confluent Cloud
  • Create a (currently Essentials) schema registry
  • Create a standard cluster
  • Add service users and API Keys
  • Add topics
  • Upload a connector plugin that regularly (usually once a day) downloads, decompresses and uploads a file with reference data
  • Upload additional reference data (referenced from the data directory)
    • Cancellation reasons
    • TOC codes (Train Operating Companies)
    • UK Rail locations
  • Configure two connectors
    • HttpCompressedSource for the locations and schedule data
    • ActiveMQSource for the train movement updates
  • Create a KSQL cluster

Git LFS

To clone this repository, you need Git LFS (Large File Storage) installed on your platform.

The easiest way to check if you have Git LFS available is to run

git lfs

If git knows this command, everything is set up correctly.

We need Git LFS because the JAR files included in this project exceed the GitHub limit of 100 MB. Without Git LFS, the files in the lib directory are checked out as small text pointer files that describe each JAR file instead of the JAR files themselves.
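To confirm that a clone really contains the JAR files, one option is to look for leftover pointer files. The following is a minimal sketch, not part of this repository: it relies on the fact that a Git LFS pointer file starts with the line `version https://git-lfs.github.com/spec/v1`, and it assumes the JAR files live under `lib` with a `.jar` extension. The function names are made up for illustration.

```python
from pathlib import Path

# First line of every Git LFS pointer file, as defined by the pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file looks like an LFS pointer instead of real content."""
    with path.open("rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_unfetched_jars(lib_dir: Path) -> list[Path]:
    """List the JAR files under lib_dir that are still LFS pointer stubs."""
    return sorted(p for p in lib_dir.rglob("*.jar") if is_lfs_pointer(p))


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    stubs = find_unfetched_jars(Path("lib"))
    if stubs:
        print("Still LFS pointers (try `git lfs pull`):")
        for stub in stubs:
            print(f"  {stub}")
    else:
        print("No LFS pointer stubs found under lib.")
```

If any stubs are reported, installing Git LFS and running `git lfs pull` in the clone should replace them with the real files.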

Preparation

Add a terraform.tfvars file with target locations and credentials. I tend to add a file called .envrc (using direnv) that contains my actual credentials (such as API Key and Secret) so that I can use these from the command line as well.

Before running Terraform, define the following variables in your shell, or set the equivalent values in your terraform.tfvars file:

export TF_VAR_confluent_api_key="XXXX"
export TF_VAR_confluent_api_secret="XXXX"
export TF_VAR_nrod_username="XXXX"
export TF_VAR_nrod_password="XXXX"

The nrod_username and nrod_password values require an account at https://publicdatafeeds.networkrail.co.uk/; register there if you do not have one yet.
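A quick pre-flight check can catch a forgotten export before Terraform prompts for it. This is a minimal sketch under the assumption that the credentials are supplied through the shell environment; if they are set in terraform.tfvars instead, the variables will legitimately be absent here. The function name is hypothetical.

```python
import os

# The four variables listed above, in their TF_VAR_ environment form.
REQUIRED_VARS = (
    "TF_VAR_confluent_api_key",
    "TF_VAR_confluent_api_secret",
    "TF_VAR_nrod_username",
    "TF_VAR_nrod_password",
)


def missing_variables(env=None) -> list[str]:
    """Return the required variables that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Not set in this shell: " + ", ".join(missing))
    else:
        print("All four TF_VAR_ variables are set.")
```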

Hints:

Regular expression to convert the CANX (cancellation reason) codes into pipe-delimited lines:

Find: ([A-Z0-9][A-Z0-9]):{"canx_reason":"(.*)","canx_abbrev":"(.*)"}

Replace: \1|\2|\3
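The same find-and-replace can be scripted. The sketch below is an illustration, not the tooling used for this repository: it assumes the two capture groups are meant to match whole field values (written here as non-greedy `(.*?)`), escapes the braces for Python's `re` module, and uses a made-up sample record.

```python
import re

# Two-character code, followed by a JSON-like object with reason and abbreviation.
CANX_PATTERN = re.compile(
    r'([A-Z0-9][A-Z0-9]):\{"canx_reason":"(.*?)","canx_abbrev":"(.*?)"\}'
)


def convert_canx(text: str) -> str:
    """Rewrite every matching record as code|reason|abbreviation."""
    return CANX_PATTERN.sub(r"\1|\2|\3", text)


if __name__ == "__main__":
    # Hypothetical input line; the real codes come from the reference data.
    sample = 'AA:{"canx_reason":"Example reason","canx_abbrev":"Example abbrev"}'
    print(convert_canx(sample))  # AA|Example reason|Example abbrev
```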

Possible queries

select FORMAT_TIMESTAMP(FROM_UNIXTIME(actual_timestamp), 'yyyy-MM-dd HH:mm:ss') TIMESTAMP, event_type, MVT_DESCRIPTION, PLATFORM, VARIATION_STATUS, TOC from train_movements;
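To sanity-check the formatted timestamps outside KSQL, the conversion can be mirrored in a few lines. This is a sketch under two assumptions: that actual_timestamp holds milliseconds since the Unix epoch (the unit FROM_UNIXTIME works with), and that the result is rendered in UTC. The helper name is hypothetical.

```python
from datetime import datetime, timezone


def format_epoch_millis(millis: int) -> str:
    """Render epoch milliseconds as 'yyyy-MM-dd HH:mm:ss' in UTC."""
    moment = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    print(format_epoch_millis(1_700_000_000_000))  # 2023-11-14 22:13:20
```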

More details to follow (TODO)

  • Finish uploading of KSQL statements
  • Add student user management (take from bootcamp-streams)
  • Labs