[all scopes of Databricks project CI/CD]
-
3rd-party batch processor
- ActiveBatch
- runs the deploy or other env scripts
- SSH the GitHub repo to ActiveBatch
- Delta symlink manifest (see the sketch after this list)
- AWS role permissions -> table owner
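a minimal sketch of the symlink manifest step, assuming a Delta table at a placeholder S3 path:

```python
# Generate a symlink manifest so external engines (e.g. Athena/Presto,
# reading via the AWS role permissions above) can query the Delta table.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

path = "s3://my-bucket/delta/my_table"  # placeholder path
DeltaTable.forPath(spark, path).generate("symlink_format_manifest")

# Optional: keep the manifest in sync on every write.
spark.sql(
    f"ALTER TABLE delta.`{path}` "
    "SET TBLPROPERTIES (delta.compatibility.symlinkFormatManifest.enabled = true)"
)
```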
-
Spark w/ Databricks dbt
- run in local context
- run via UI on an all-purpose cluster
- run as a job deployed from local
- run as a job via GitHub Actions pipenv [script] "deploy.py"
- common utils I/O wrapper (see the sketch after this list)
-
- logger
- set context
- dbutils widgets
- runtime
- own Python helper functions, e.g. _get(spark)
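a minimal sketch of the common-utils context wrapper; function names are illustrative, not the repo's actual helpers:

```python
import logging
from pyspark.sql import SparkSession

logger = logging.getLogger(__name__)

def _get_spark() -> SparkSession:
    # Returns the active session on Databricks (UI or job run) and
    # builds a local one when running in local context.
    return SparkSession.builder.getOrCreate()

def _get_dbutils(spark: SparkSession):
    # dbutils is only importable on the Databricks runtime; local runs
    # get None and should fall back to env vars/config.
    try:
        from pyspark.dbutils import DBUtils
        return DBUtils(spark)
    except ImportError:
        logger.info("dbutils unavailable; assuming local context")
        return None

def get_param(name: str, default: str = "") -> str:
    # Read a job/notebook parameter from dbutils widgets, with a local default.
    dbutils = _get_dbutils(_get_spark())
    return dbutils.widgets.get(name) if dbutils else default
```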
-
mlops
- infrastructure as code (IaC)
-
- terraform
-
- jinja2
- json.j2 or YAML deploy file
- use native Jinja env variables w/ the venv setup
- use Jinja variables w/ dbutils and widgets
- use a Jinja template to auto-populate the deploy file
- use Jinja2 to build the deploy API components dynamically (see the sketch after this list)
- GitHub Actions
- GitHub runners
- permission chain: AWS -> appId
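a minimal sketch of the Jinja2 deploy-file flow; the template name and variables are placeholders:

```python
import json
import os
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("templates"))
template = env.get_template("job_deploy.json.j2")  # hypothetical template

# Values come from the venv setup locally, or from the GitHub runner's
# env in CI.
rendered = template.render(
    env_name=os.environ.get("DEPLOY_ENV", "dev"),
    node_type="m5.xlarge",
    notebook_path="/Repos/project/main",
)

deploy_spec = json.loads(rendered)  # deploy API components built dynamically
```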
-
pytest
- Unit
- Ingest
- Checks
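a minimal sketch of the three test layers; data and assertions are stand-ins for the project's own:

```python
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Local SparkSession so tests run without a cluster.
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

def test_unit_transform(spark):
    # unit: pure logic on a tiny frame
    df = spark.createDataFrame([(1, "a")], ["id", "val"])
    assert df.count() == 1

def test_ingest_schema(spark):
    # ingest: loaded frame exposes the expected columns
    df = spark.createDataFrame([(1, "a")], ["id", "val"])
    assert df.columns == ["id", "val"]

def test_checks_no_null_keys(spark):
    # checks: data-quality assertion on the key column
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
    assert df.filter(df.id.isNull()).count() == 0
```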
-
databricks-api
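a minimal sketch using the databricks-api wrapper package; host/token come from the runner's secrets, and the job id is hypothetical:

```python
import os
from databricks_api import DatabricksAPI

db = DatabricksAPI(
    host=os.environ["DATABRICKS_HOST"],
    token=os.environ["DATABRICKS_TOKEN"],
)

jobs = db.jobs.list_jobs()   # inspect deployed jobs
db.jobs.run_now(job_id=123)  # trigger one (hypothetical id)
```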
-
tableau-api-lib
- control the analytics endpoint
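a minimal sketch of signing in and querying the endpoint; server, site, and credentials are placeholders:

```python
from tableau_api_lib import TableauServerConnection

config = {
    "tableau_prod": {
        "server": "https://tableau.example.com",  # placeholder
        "api_version": "3.19",
        "username": "svc_user",
        "password": "********",
        "site_name": "analytics",
        "site_url": "analytics",
    }
}

conn = TableauServerConnection(config, env="tableau_prod")
conn.sign_in()
workbooks = conn.query_workbooks_for_site().json()  # enumerate the endpoint
conn.sign_out()
```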
-
resources
- EC2 instance selection
- cluster monitoring
- setting Spark conf
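a minimal sketch of session-level Spark conf tuning; values are illustrative starting points, not recommendations:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Tune shuffle parallelism to the chosen EC2 instance sizes and let AQE
# adjust plans at runtime; verify the impact via cluster monitoring.
spark.conf.set("spark.sql.shuffle.partitions", "200")
spark.conf.set("spark.sql.adaptive.enabled", "true")
print(spark.conf.get("spark.sql.shuffle.partitions"))
```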
-
BI analytics golden rules
- users will want a data dictionary
- users will want visual consistency
- users will want simple not loud
- users will want flexibility of choice
- users will want to export data however they choose
- users will want the data to match their CMS -- use the sys report as the viz source
-
tbd