changing configuration-syntax to v2, to allow loading and validation with Python `dataclasses`
spex66 opened this issue · 4 comments
Hi fellow bootstrap-cli users and testers :)
I have an urgent task in front of me, which I would like to finish before too many of you start using the current configuration syntax.
I created the current syntax in March '21. It is very compact, but it cannot be read with Python `dataclasses`.
As I did for inso-extpipes-cli, I need to create a v2 syntax that can be parsed completely by Python `dataclasses`.
Where you can help me:
- in this approach, every value requires a named property
- finding good property names is a challenge, and changing them later is a breaking change
Feedback is appreciated: please look at the `bootstrap.namespaces` block, check the (left-hand-side) property names, and let me know your feedback. That would help me very much!
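To illustrate why named properties matter, here is a minimal sketch (not the actual bootstrap-cli code). The class name `NsNode` and the helper `load_node` are hypothetical and only serve the illustration. In the v1 style the node name is itself the yaml key, so there is no fixed set of field names a dataclass could declare. In the v2 style every value sits behind a known property name.

```python
from dataclasses import dataclass, fields

# v1 style: the node name is the key, so the set of keys is open-ended
v1_nodes = {"src:001:sap": {"description": "Sources 001; from SAP"}}

# v2 style: every value sits behind a fixed, named property
v2_node = {"node-name": "src:001:sap", "description": "Sources 001; from SAP"}


@dataclass
class NsNode:  # hypothetical class name, for illustration only
    node_name: str
    description: str = ""


def load_node(raw: dict) -> NsNode:
    """Map hyphenated yaml keys onto dataclass fields, rejecting unknown keys."""
    kwargs = {key.replace("-", "_"): value for key, value in raw.items()}
    allowed = {f.name for f in fields(NsNode)}
    unknown = set(kwargs) - allowed
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    return NsNode(**kwargs)


node = load_node(v2_node)
```

Passing `v1_nodes` to `load_node` fails with a `ValueError`, because `src:001:sap` is data and not a property name. That is also why renaming a property later is a breaking change: the names become part of the schema.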
UPDATE 220422 pa: after a review with @sdorheim we go with this syntax and naming for the upcoming v2 change
- provided as `configs/config-simple-v2-draft.yml` for testing and comparison to `configs/config-simple-v1yml`
UPDATE 220426 pa: after more testing and fixes the inline comments in the example-config were extended.
bootstrap:
  # 220426 pa:
  # - extended comments explaining options
  # 220422 pa:
  # - new v2-syntax making *all* bootstrap configuration parameters
  #   available through explicit named properties.
  #
  # Motivation for the breaking change:
  #   v1 used an extremely sparse syntax, with values acting as both yaml keys and values.
  #   The v2 change is required to support loading and validating the configuration through
  #   Python 'dataclasses', which is a best practice and used by most Cognite yaml-based
  #   configurations.
  # bootstrap now supports three sections
  # 1. 'features'
  #    - making cli parameters and customizable naming-elements available
  # 2. 'idp-cdf-mappings'
  #    - providing multi CDF Project support of IdP to CDF Group mappings
  # 3. 'namespaces'
  #    - migration of v1 configuration into a hierarchy of
  #      - namespaces > ns-nodes
  features:
    # added as features in v2; atm (v1) available as cli parameters only
    # allowed values are parsed case-insensitive: [true|yes|YES|..] or [false|no|NO|..]
    # not as strings in quotes "yes" or 'yes'
    with-special-groups: false
    with-raw-capability: yes
    # new in v2 to configure prior (v1) hard-coded naming elements
    # all following feature values are set to match their v1 default values,
    # to support a v1 migration.
    #
    # atm (v1) 'allprojects' is a hard-coded value
    aggregated-level-name: allprojects
    # atm (v1) 'cdf' is a hard-coded value
    # supports empty-string ''
    group-prefix: cdf
    # atm (v1) 'dataset' is a hard-coded value
    # supports empty-string ''
    dataset-suffix: dataset
    # atm (v1) 'rawdb' is a hard-coded value
    rawdb-suffix: rawdb
    # atm (v1) only ['state'] is a hard-coded value
    # supports empty list [] for no additional variants
    rawdb-additional-variants:
      # provide more than one rawdb per ns-node
      # atm (v1) hardcoded is one additional rawdb
      - state
      # - pump # one more additional variant
  idp-cdf-mappings:
    # Prior (v1) named 'aad_mappings'.
    # Values for 'cdf-group' require knowledge of the resulting CDF Group names
    #
    # Now supporting multiple CDF Projects, like dev/test/prod,
    # in one config. Optimization, to reduce redundant maintenance.
    # The BOOTSTRAP_CDF_PROJECT env-variable is available and is used to select the project.
    - cdf-project: shiny-dev
      mappings:
        - cdf-group: cdf:root
          idp-source-id: 374dc9f6-f3a1-4b34-b897-11111111111
          idp-source-name: CDF_DEV_ROOT
        - cdf-group: cdf:allprojects:owner
          idp-source-id: acd2fe35-aa51-45a7-acef-11111111111
          idp-source-name: CDF_DEV_ALLPROJECTS_OWNER
        - cdf-group: cdf:allprojects:read
          idp-source-id: acd2fe35-aa51-45a7-acef-11111111111
          idp-source-name: CDF_DEV_ALLPROJECTS_READ
        - cdf-group: cdf:uc:001:demand:read
          idp-source-id: 314159-aa51-45a7-acef-11111111111
          idp-source-name: CDF_DEV_UC001DEMAND_READ
    - cdf-project: shiny-prod
      mappings:
        - cdf-group: cdf:root
          idp-source-id: 374dc9f6-f3a1-4b34-b897-22222222222
          idp-source-name: CDF_PROD_ROOT
        - cdf-group: cdf:allprojects:owner
          idp-source-id: acd2fe35-aa51-45a7-acef-22222222222
          idp-source-name: CDF_PROD_ALLPROJECTS_OWNER
        - cdf-group: cdf:allprojects:read
          idp-source-id: acd2fe35-aa51-45a7-acef-11111111111
          idp-source-name: CDF_PROD_ALLPROJECTS_READ
        - cdf-group: cdf:uc:001:demand:read
          idp-source-id: 314159-aa51-45a7-acef-22222222222
          idp-source-name: CDF_PROD_UC001DEMAND_READ
  namespaces:
    - ns-name: src
      description: Customer source-systems
      ns-nodes:
        - node-name: src:001:sap
          description: Sources 001; from SAP
          # provide 'external-id' explicit for full control (i.e. of length) for the dataset
          # otherwise it will be autogenerated '{ns-name}:{features.dataset-suffix}'
          # 220425 pa: atm of writing we have hard CDF limits of
          #   dataset.name 50 characters
          #   dataset.external_id 256 characters
          external-id: src:001:sap
        - node-name: src:002:weather
          description: Sources 002; from Weather.com
          # external-id will be auto generated in this case
    - ns-name: in
      description: End user data-input provided through deployed CDF driven solutions
      ns-nodes:
        - node-name: in:001:trade
          description: Description about user inputs related to name
          # external-id: in:001:trade
    - ns-name: uc
      description: Use Cases representing the data-products
      ns-nodes:
        - node-name: uc:001:demand
          description: Use Case 001; Supply and Demand
          metadata:
            created: 220401
            generated: by cdf-config-hub script
          shared-access:
            read:
              - node-name: src:001:sap
              - node-name: src:002:weather
            owner:
              - node-name: in:001:trade
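As a rough sketch of how the `features` and `idp-cdf-mappings` sections above could be loaded and validated with `dataclasses` (this is not the actual bootstrap-cli implementation): the class names `Features`, `IdpCdfMapping`, `ProjectMappings` and the helpers `to_kwargs` and `select_mappings` are hypothetical, and the field defaults are simply copied from the example values. The dict `raw_bootstrap` stands in for the parsed yaml, so the sketch runs without a yaml library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


def to_kwargs(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Turn hyphenated yaml property names into valid Python field names."""
    return {key.replace("-", "_"): value for key, value in raw.items()}


@dataclass
class Features:  # hypothetical; defaults are assumptions taken from the example
    with_special_groups: bool = False
    with_raw_capability: bool = True
    aggregated_level_name: str = "allprojects"
    group_prefix: str = "cdf"
    dataset_suffix: str = "dataset"
    rawdb_suffix: str = "rawdb"
    rawdb_additional_variants: List[str] = field(default_factory=lambda: ["state"])

    def __post_init__(self) -> None:
        # A YAML 1.1 parser such as PyYAML yields bool for unquoted yes/no,
        # but str for quoted "yes", which is rejected here.
        for name in ("with_special_groups", "with_raw_capability"):
            if not isinstance(getattr(self, name), bool):
                raise TypeError(f"{name.replace('_', '-')} must be a boolean")


@dataclass
class IdpCdfMapping:
    cdf_group: str
    idp_source_id: str
    idp_source_name: str = ""


@dataclass
class ProjectMappings:
    cdf_project: str
    mappings: List[IdpCdfMapping]

    @classmethod
    def load(cls, raw: Dict[str, Any]) -> "ProjectMappings":
        return cls(
            cdf_project=raw["cdf-project"],
            mappings=[IdpCdfMapping(**to_kwargs(m)) for m in raw["mappings"]],
        )


def select_mappings(projects: List[ProjectMappings], cdf_project: str) -> List[IdpCdfMapping]:
    """Return the mappings of the one CDF Project that is being bootstrapped."""
    for entry in projects:
        if entry.cdf_project == cdf_project:
            return entry.mappings
    raise KeyError(f"no idp-cdf-mappings for CDF Project {cdf_project!r}")


# Stand-in for the parsed yaml, shortened to one mapping per project.
raw_bootstrap = {
    "features": {"with-special-groups": False, "with-raw-capability": True, "dataset-suffix": "ds"},
    "idp-cdf-mappings": [
        {"cdf-project": "shiny-dev", "mappings": [
            {"cdf-group": "cdf:root",
             "idp-source-id": "374dc9f6-f3a1-4b34-b897-11111111111",
             "idp-source-name": "CDF_DEV_ROOT"}]},
        {"cdf-project": "shiny-prod", "mappings": [
            {"cdf-group": "cdf:root",
             "idp-source-id": "374dc9f6-f3a1-4b34-b897-22222222222",
             "idp-source-name": "CDF_PROD_ROOT"}]},
    ],
}

features = Features(**to_kwargs(raw_bootstrap["features"]))
projects = [ProjectMappings.load(p) for p in raw_bootstrap["idp-cdf-mappings"]]
# A real caller would read the project name from the BOOTSTRAP_CDF_PROJECT env-variable.
selected = select_mappings(projects, "shiny-dev")
```

A misspelled property name fails immediately, because the dataclass constructor raises a `TypeError` for an unexpected keyword argument. Properties left out of the yaml fall back to the declared defaults.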
For comparison, this is the same configuration in the current v1 syntax (with `aad_mappings` covering only one CDF Project!):
bootstrap:
  src:
    src:001:sap:
      description: Description about sources related to name
      external_id: src:001:sap
    src:002:weather:
      description: Description about sources related to name
      external_id: src:002:weather
  in:
    in:001:trade:
      description: Description about user inputs related to name
      external_id: in:001:trade
  uc:
    uc:001:demand:
      description: Description about use case
      external_id: uc:001:demand
      shared_read_access:
        - src:001:sap
        - src:002:weather
      shared_owner_access:
        - in:001:trade
      metadata:
        created: 220401
        generated: by cdf-config-hub script
aad_mappings:
  cdf:root:
    - 374dc9f6-f3a1-4b34-b897-11111111111
    - CDF_DEV_ROOT
  cdf:allprojects:owner:
    - acd2fe35-aa51-45a7-acef-11111111111
    - CDF_DEV_ALLPROJECTS_OWNER
  cdf:allprojects:read:
    - acd2fe35-aa51-45a7-acef-0d54e2b6b6a8
    - CDF_DEV_ALLPROJECTS_READ
  cdf:uc:001:demand:read:
    - 314159-aa51-45a7-acef-11111111111
    - CDF_DEV_UC001DEMAND_READ
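Comparing the two configs side by side, the v1 to v2 mapping is mostly mechanical. The following is only a sketch of that mapping, assuming the v1 yaml is already parsed into dicts. The functions `migrate_namespaces` and `migrate_aad_mappings` are hypothetical helpers for illustration and not part of bootstrap-cli. The namespace `description` is new in v2 and has no v1 source, so it is left out here.

```python
from typing import Any, Dict, List


def migrate_namespaces(v1_bootstrap: Dict[str, Dict[str, Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Convert the v1 'bootstrap' mapping into a v2 'namespaces' list."""
    namespaces = []
    for ns_name, nodes in v1_bootstrap.items():
        ns_nodes = []
        for node_name, props in nodes.items():
            node: Dict[str, Any] = {"node-name": node_name}
            if "description" in props:
                node["description"] = props["description"]
            if "external_id" in props:
                node["external-id"] = props["external_id"]
            if "metadata" in props:
                node["metadata"] = dict(props["metadata"])
            # v1 lists of plain names become v2 lists of named properties
            shared = {}
            for v1_key, v2_key in (("shared_read_access", "read"),
                                   ("shared_owner_access", "owner")):
                if props.get(v1_key):
                    shared[v2_key] = [{"node-name": name} for name in props[v1_key]]
            if shared:
                node["shared-access"] = shared
            ns_nodes.append(node)
        namespaces.append({"ns-name": ns_name, "ns-nodes": ns_nodes})
    return namespaces


def migrate_aad_mappings(aad_mappings: Dict[str, List[str]], cdf_project: str) -> List[Dict[str, Any]]:
    """Convert v1 'aad_mappings' into v2 'idp-cdf-mappings' for one CDF Project."""
    return [{
        "cdf-project": cdf_project,
        "mappings": [
            {"cdf-group": group, "idp-source-id": pair[0], "idp-source-name": pair[1]}
            for group, pair in aad_mappings.items()
        ],
    }]


v1_bootstrap = {
    "uc": {
        "uc:001:demand": {
            "description": "Description about use case",
            "external_id": "uc:001:demand",
            "shared_read_access": ["src:001:sap", "src:002:weather"],
            "shared_owner_access": ["in:001:trade"],
        }
    }
}
v1_aad = {"cdf:root": ["374dc9f6-f3a1-4b34-b897-11111111111", "CDF_DEV_ROOT"]}

v2_namespaces = migrate_namespaces(v1_bootstrap)
v2_mappings = migrate_aad_mappings(v1_aad, cdf_project="shiny-dev")
```

Since v1 `aad_mappings` only cover one CDF Project, the project name has to be supplied by the caller.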
The PR #32 is available for extensive testing.
We need to make the switch ASAP, once everyone using bootstrap-cli has confidence in the migration from v1 to v2 (fyi: @gaetan-h, @BergsethCognite, @sdorheim)
Highlights for 'deploy':
- the `bootstrap.features` section allows more customization
Highlights for 'diagram':
- the 'cognite' section is now optional for 'diagram' when the `--cdf-project` parameter is provided on the cli
Test status:
- tested 'diagram' with v2 and new parameters
- tested 'deploy' in `dry-run` mode
- not started to test 'prepare' and 'delete', but their configurations should not be affected by v2 changes
(after a discussion with @gaetan-h)
Adding some background for a (not obvious) change, which is so far only poorly covered in the example and documentation:
- with v1 syntax you had to provide an explicit `external_id` in the configuration
  - `dataset.name` values are autogenerated in v1 with this pattern: `f'{ns-name}:{features.dataset-suffix}'`
  - `dataset.external_id` values were set as provided in the yaml-config!
- with v2 syntax we provide the ability to auto-generate `dataset.name` and `dataset.external_id`, following the exact same template
  - options to customize are
    - provide a value for `features.dataset-suffix` like `"ds"` instead of the default `"dataset"`
    - (continue to) provide explicit `external-id` values for full control
The autogenerated names could be an issue atm, as we have hard CDF length limits:
- `dataset.name`: 50 characters
- `dataset.external_id`: 256 characters
(FYI: a Cognite internal ticket is opened to check how to relax all these bootstrap-related limits to 256 characters)
Here is a v2-syntax example mixing both approaches, with additional comments:
ns-nodes:
  - node-name: src:001:sap
    description: Sources 001; from SAP
    # provide 'external-id' explicit for full control (i.e. of length) for the dataset
    # otherwise it will be autogenerated '{ns-name}:{features.dataset-suffix}'
    # 220425 pa: atm of writing we have hard CDF limits of
    #   dataset.name 50 characters
    #   dataset.external_id 256 characters
    external-id: src:001:sap
  - node-name: src:002:weather
    description: Sources 002; from Weather.com
    # external-id will be auto generated in this case
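The naming template and the length limits described above can be sketched like this. It is a simplified illustration, not the bootstrap-cli code: `generate_dataset_name` and `resolve_dataset_ids` are hypothetical names, the `name` argument stands for the first part of the template, and dropping the `:` separator for an empty suffix is my assumption about how the empty-string case behaves.

```python
from typing import Optional, Tuple

# hard CDF limits quoted in this issue (at the time of writing)
DATASET_NAME_MAX = 50
DATASET_EXTERNAL_ID_MAX = 256


def generate_dataset_name(name: str, dataset_suffix: str = "dataset") -> str:
    """Apply the '{name}:{dataset-suffix}' template.

    Assumption: an empty suffix drops the ':' separator as well.
    """
    return ":".join(part for part in (name, dataset_suffix) if part)


def resolve_dataset_ids(
    name: str,
    dataset_suffix: str = "dataset",
    external_id: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (dataset.name, dataset.external_id) and enforce the length limits."""
    dataset_name = generate_dataset_name(name, dataset_suffix)
    # an explicit external-id wins, otherwise the generated name is reused
    dataset_external_id = external_id if external_id is not None else dataset_name
    if len(dataset_name) > DATASET_NAME_MAX:
        raise ValueError(f"dataset.name exceeds {DATASET_NAME_MAX} characters: {dataset_name!r}")
    if len(dataset_external_id) > DATASET_EXTERNAL_ID_MAX:
        raise ValueError(f"dataset.external_id exceeds {DATASET_EXTERNAL_ID_MAX} characters")
    return dataset_name, dataset_external_id


# the two nodes from the example: one explicit external-id, one autogenerated
explicit = resolve_dataset_ids("src:001:sap", external_id="src:001:sap")
generated = resolve_dataset_ids("src:002:weather")
shorter = resolve_dataset_ids("src:002:weather", dataset_suffix="ds")
```

Checking the limits while loading the config lets a too-long generated name fail early, before anything is sent to CDF.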