cognitedata/inso-bootstrap-cli

changing configuration-syntax to v2, to allow loading and validation with Python `dataclasses`

spex66 opened this issue · 4 comments

hi fellow bootstrap-cli users and testers :)

I have an urgent task in front of me, which I would like to finish before too many of you start using the current configuration-syntax.

I created the current syntax in March '21. It is very compact, but it cannot be loaded with Python dataclasses.

As I did for inso-extpipes-cli, I need to create a v2-syntax which can be parsed completely by Python dataclasses.

This is where you can help me:

  • in this approach every value requires a named property
  • finding good property-names is a challenge, and changing them later is a breaking change
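
To illustrate why every value needs a named property, here is a minimal sketch of how one `namespaces` entry could be loaded into dataclasses. The class names `NsNode` and `Namespace` and the helper `load_namespace` are hypothetical and not the actual bootstrap-cli implementation; only the property names are taken from the v2 draft in this issue, and `shared-access` is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class NsNode:
    # hypothetical class; the fields mirror the v2 draft property names
    node_name: str
    description: Optional[str] = None
    external_id: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Namespace:
    # hypothetical class; one entry of the 'namespaces' list
    ns_name: str
    description: Optional[str] = None
    ns_nodes: List[NsNode] = field(default_factory=list)


def _snake(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map kebab-case yaml keys ('node-name') to field names ('node_name')."""
    return {key.replace("-", "_"): value for key, value in raw.items()}


def load_namespace(raw: Dict[str, Any]) -> Namespace:
    """Build a Namespace from an already-parsed yaml mapping.

    An unknown or misspelled property raises TypeError, which is the kind
    of validation a values-as-keys syntax cannot offer.
    """
    data = _snake(raw)
    nodes = [NsNode(**_snake(node)) for node in data.pop("ns_nodes", [])]
    return Namespace(ns_nodes=nodes, **data)


example = {
    "ns-name": "src",
    "description": "Customer source-systems",
    "ns-nodes": [
        {"node-name": "src:001:sap", "external-id": "src:001:sap"},
        {"node-name": "src:002:weather"},
    ],
}
namespace = load_namespace(example)
print(namespace.ns_nodes[0].node_name)
```

With v1, names such as `src:001:sap` are yaml keys themselves, so there is no fixed field name a dataclass could bind them to.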

Feedback is appreciated: please review the bootstrap.namespaces block, check the (left-hand-side) property-names, and let me know what you think. That would help me very much!

UPDATE 220422 pa: after a review with @sdorheim we go with this syntax and naming for the upcoming v2 change

  • provided as configs/config-simple-v2-draft.yml for testing and comparison to configs/config-simple-v1.yml
    UPDATE 220426 pa: after more testing and fixes the inline comments in the example-config were extended.
bootstrap:
  # 220426 pa:
  #  - extended comments explaining options
  # 220422 pa:
  #   - new v2-syntax making *all* bootstrap configuration parameters
  #     available through explicit named properties.
  #
  # Motivation for the breaking change:
  # v1 used an extremely sparse syntax, with values serving as yaml keys as well as values.
  # The v2 change is required to support loading and validating the configuration through
  # Python 'dataclasses', which is a best practice and used by most Cognite yaml-based
  # configurations.

  # bootstrap now supports three sections
  # 1. 'features'
  #   - making cli parameters and customizable naming-elements available
  # 2. 'idp-cdf-mappings'
  #   - providing multi CDF Project support of IdP to CDF Group mappings
  # 3. 'namespaces'
  #   - migration of v1 configuration into a hierarchy of
  #     - namespaces > ns-nodes

  features:
    # v2 adds these as features; atm (v1) they are available as cli parameters only
    # allowed values are parsed case-insensitive: [true|yes|YES|..] or [false|no|NO|..]
    # not as strings in quotes "yes" or 'yes'
    with-special-groups: false
    with-raw-capability: yes

    # new in v2 to configure prior (v1) hard-coded naming elements
    # all following feature values are set to match their v1 default values,
    # to support a v1 migration.
    #
    # atm (v1) 'allprojects' is a hard-coded value
    aggregated-level-name: allprojects
    # atm (v1) 'cdf' is a hard-coded value
    #   supports empty-string ''
    group-prefix: cdf
    # atm (v1) 'dataset' is a hard-coded value
    #   supports empty-string ''
    dataset-suffix: dataset
    # atm (v1) 'rawdb' is a hard-coded value
    rawdb-suffix: rawdb
    # atm (v1) only ['state'] is a hard-coded value
    #   supports empty list [] for no additional variants
    rawdb-additional-variants:
      # provide more than one rawdb per ns-nodes
      # atm (v1) hardcoded is one additional rawdb
      - state
      # - pump # one more additional variant

  idp-cdf-mappings:
    # Prior (v1) named 'aad_mappings'.
    # Values for 'cdf-group' requires knowledge of resulting CDF Group names
    #
    # Now supporting multiple CDF Projects, like dev/test/prod
    # in one config. Optimization, to reduce redundant maintenance.
    # BOOTSTRAP_CDF_PROJECT env-variable is available and is used to select.
    - cdf-project: shiny-dev
      mappings:
      - cdf-group: cdf:root
        idp-source-id: 374dc9f6-f3a1-4b34-b897-11111111111
        idp-source-name: CDF_DEV_ROOT
      - cdf-group: cdf:allprojects:owner
        idp-source-id: acd2fe35-aa51-45a7-acef-11111111111
        idp-source-name: CDF_DEV_ALLPROJECTS_OWNER
      - cdf-group: cdf:allprojects:read
        idp-source-id: acd2fe35-aa51-45a7-acef-11111111111
        idp-source-name: CDF_DEV_ALLPROJECTS_READ
      - cdf-group: cdf:uc:001:demand:read
        idp-source-id: 314159-aa51-45a7-acef-11111111111
        idp-source-name: CDF_DEV_UC001DEMAND_READ
    - cdf-project: shiny-prod
      mappings:
      - cdf-group: cdf:root
        idp-source-id: 374dc9f6-f3a1-4b34-b897-22222222222
        idp-source-name: CDF_PROD_ROOT
      - cdf-group: cdf:allprojects:owner
        idp-source-id: acd2fe35-aa51-45a7-acef-22222222222
        idp-source-name: CDF_PROD_ALLPROJECTS_OWNER
      - cdf-group: cdf:allprojects:read
        idp-source-id: acd2fe35-aa51-45a7-acef-11111111111
        idp-source-name: CDF_PROD_ALLPROJECTS_READ
      - cdf-group: cdf:uc:001:demand:read
        idp-source-id: 314159-aa51-45a7-acef-22222222222
        idp-source-name: CDF_PROD_UC001DEMAND_READ

  namespaces:
    - ns-name: src
      description: Customer source-systems
      ns-nodes:
        - node-name: src:001:sap
          description: Sources 001; from SAP
          # provide 'external-id' explicit for full control (i.e. of length) for the dataset
          # otherwise it will be autogenerated '{ns-name}:{features.dataset-suffix}'
          # 220425 pa: atm of writing we have hard CDF limits of
          #    dataset.name        50 characters
          #    dataset.external_id 256 characters
          external-id: src:001:sap
        - node-name: src:002:weather
          description: Sources 002; from Weather.com
          # external-id will be auto generated in this case
    - ns-name: in
      description: End user data-input provided through deployed CDF driven solutions
      ns-nodes:
        - node-name: in:001:trade
          description: Description about user inputs related to name
          # external-id: in:001:trade
    - ns-name: uc
      description: Use Cases representing the data-products
      ns-nodes:
        - node-name: uc:001:demand
          description: Use Case 001; Supply and Demand
          metadata:
            created: 220401
            generated: by cdf-config-hub script
          shared-access:
            read:
              - node-name: src:001:sap
              - node-name: src:002:weather
            owner:
              - node-name: in:001:trade
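
As a rough sketch of the multi-project selection described in the `idp-cdf-mappings` comments above: the function `select_mappings` is hypothetical, and the lookup order (explicit argument first, then the BOOTSTRAP_CDF_PROJECT env-variable) is an assumption for illustration, not confirmed behaviour of bootstrap-cli.

```python
import os
from typing import Any, Dict, List, Optional


def select_mappings(
    idp_cdf_mappings: List[Dict[str, Any]], cdf_project: Optional[str] = None
) -> List[Dict[str, str]]:
    """Return the 'mappings' list of the entry matching the CDF Project.

    Falls back to the BOOTSTRAP_CDF_PROJECT env-variable when no project
    is passed explicitly (assumed order, for illustration only).
    """
    project = cdf_project or os.environ.get("BOOTSTRAP_CDF_PROJECT")
    if not project:
        raise ValueError("no CDF Project given and BOOTSTRAP_CDF_PROJECT is not set")
    for entry in idp_cdf_mappings:
        if entry["cdf-project"] == project:
            return entry["mappings"]
    raise KeyError(f"no idp-cdf-mappings entry for cdf-project '{project}'")


# shortened version of the 'idp-cdf-mappings' section, as parsed from yaml
idp_cdf_mappings = [
    {
        "cdf-project": "shiny-dev",
        "mappings": [{"cdf-group": "cdf:root", "idp-source-name": "CDF_DEV_ROOT"}],
    },
    {
        "cdf-project": "shiny-prod",
        "mappings": [{"cdf-group": "cdf:root", "idp-source-name": "CDF_PROD_ROOT"}],
    },
]
prod = select_mappings(idp_cdf_mappings, "shiny-prod")
print(prod[0]["idp-source-name"])
```

Keeping dev/test/prod in one list means the namespaces are maintained once, and only the IdP ids differ per project.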

For comparison, this is the same configuration in the current v1-syntax (with aad_mappings only covering one CDF Project!)

bootstrap:
  src:
    src:001:sap:
      description: Description about sources related to name
      external_id: src:001:sap
    src:002:weather:
      description: Description about sources related to name
      external_id: src:002:weather

  in:
    in:001:trade:
      description: Description about user inputs related to name
      external_id: in:001:trade

  uc:
    uc:001:demand:
      description: Description about use case
      external_id: uc:001:demand
      shared_read_access:
        - src:001:sap
        - src:002:weather
      shared_owner_access:
        - in:001:trade
      metadata:
        created: 220401
        generated: by cdf-config-hub script

aad_mappings:
  cdf:root:
    - 374dc9f6-f3a1-4b34-b897-11111111111
    - CDF_DEV_ROOT
  cdf:allprojects:owner:
    - acd2fe35-aa51-45a7-acef-11111111111
    - CDF_DEV_ALLPROJECTS_OWNER
  cdf:allprojects:read:
    - acd2fe35-aa51-45a7-acef-0d54e2b6b6a8
    - CDF_DEV_ALLPROJECTS_READ
  cdf:uc:001:demand:read:
    - 314159-aa51-45a7-acef-11111111111
    - CDF_DEV_UC001DEMAND_READ
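
Comparing the two examples, a v1 to v2 migration is mostly a mechanical renaming. The following is a minimal sketch of that idea working on already-parsed yaml; the functions `migrate_namespaces` and `migrate_aad_mappings` are hypothetical helpers and not part of bootstrap-cli, and the key mapping is read off the two examples in this issue.

```python
from typing import Any, Dict, List


def migrate_namespaces(v1_bootstrap: Dict[str, Dict[str, Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Convert the v1 'bootstrap' mapping into a v2 'namespaces' list."""
    namespaces = []
    for ns_name, nodes in v1_bootstrap.items():
        ns_nodes = []
        for node_name, props in nodes.items():
            node: Dict[str, Any] = {"node-name": node_name}
            # properties that only change their spelling
            for v1_key, v2_key in (
                ("description", "description"),
                ("external_id", "external-id"),
                ("metadata", "metadata"),
            ):
                if v1_key in props:
                    node[v2_key] = props[v1_key]
            # shared_*_access lists become a nested 'shared-access' block
            shared = {}
            for v1_key, action in (("shared_read_access", "read"), ("shared_owner_access", "owner")):
                if props.get(v1_key):
                    shared[action] = [{"node-name": name} for name in props[v1_key]]
            if shared:
                node["shared-access"] = shared
            ns_nodes.append(node)
        namespaces.append({"ns-name": ns_name, "ns-nodes": ns_nodes})
    return namespaces


def migrate_aad_mappings(aad_mappings: Dict[str, List[str]], cdf_project: str) -> Dict[str, Any]:
    """Convert v1 [id, name] pairs into one v2 'idp-cdf-mappings' entry."""
    return {
        "cdf-project": cdf_project,
        "mappings": [
            {"cdf-group": group, "idp-source-id": pair[0], "idp-source-name": pair[1]}
            for group, pair in aad_mappings.items()
        ],
    }


v1_bootstrap = {
    "uc": {
        "uc:001:demand": {
            "description": "Description about use case",
            "external_id": "uc:001:demand",
            "shared_read_access": ["src:001:sap"],
            "shared_owner_access": ["in:001:trade"],
        }
    }
}
v1_aad = {"cdf:root": ["374dc9f6-f3a1-4b34-b897-11111111111", "CDF_DEV_ROOT"]}

v2_namespaces = migrate_namespaces(v1_bootstrap)
v2_entry = migrate_aad_mappings(v1_aad, "shiny-dev")
print(v2_namespaces[0]["ns-nodes"][0]["shared-access"])
```

The sketch does not generate namespace descriptions or the `features` section, since v1 has no counterpart for them.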

PR #32 is available for extensive testing.

We need to make the switch as soon as everyone using bootstrap-cli is confident in the migration from v1 to v2 (fyi: @gaetan-h , @BergsethCognite, @sdorheim)

Highlights for 'deploy'

  • the bootstrap.features section allows more customization

Highlights for 'diagram'

  • the 'cognite' section is now optional for 'diagram' when the --cdf-project parameter is provided on the cli

Test status:

  • tested 'diagram' with v2 and new parameters
  • tested 'deploy' in dry-run mode
  • testing of 'prepare' and 'delete' has not started yet, but their configurations should not be affected by the v2 changes

(after a discussion with @gaetan-h)

Adding some background for a (not obvious) change, which is so far only poorly covered in the examples and documentation:

  • with v1-syntax you had to provide an explicit external_id in the configuration

    • dataset.name was autogenerated in v1 with this pattern: f'{ns-name}:{features.dataset-suffix}'
    • dataset.external_id was set as provided in the yaml-config!
  • with v2-syntax both dataset.name and dataset.external_id can be auto-generated, following the exact same template

    • the options to customize are:
    • provide a value for features.dataset-suffix, like "ds" instead of the default "dataset"
    • (continue to) provide explicit external-id values for full control

The autogenerated names could be an issue atm, as we have hard CDF length-limits:

  • dataset.name: 50 characters
  • dataset.external_id: 256 characters
    (FYI: a Cognite internal ticket has been opened to check how to relax all these bootstrap-related limits to 256 characters)
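
A small sketch of how the two limits could be checked before deploying. The helpers `autogenerated` and `check_dataset_lengths` are hypothetical; the limits are the ones quoted above and are assumed to be inclusive maximums, and the template is written as '<name>:<dataset-suffix>' with a non-empty suffix.

```python
from typing import List

DATASET_NAME_MAX = 50          # limit as quoted in this issue
DATASET_EXTERNAL_ID_MAX = 256  # limit as quoted in this issue


def autogenerated(base_name: str, dataset_suffix: str = "dataset") -> str:
    """Apply the '<name>:<dataset-suffix>' template (non-empty suffix assumed)."""
    return f"{base_name}:{dataset_suffix}"


def check_dataset_lengths(name: str, external_id: str) -> List[str]:
    """Return a list of human-readable problems; empty means both values fit."""
    problems = []
    if len(name) > DATASET_NAME_MAX:
        problems.append(f"dataset.name has {len(name)} characters (max {DATASET_NAME_MAX})")
    if len(external_id) > DATASET_EXTERNAL_ID_MAX:
        problems.append(
            f"dataset.external_id has {len(external_id)} characters (max {DATASET_EXTERNAL_ID_MAX})"
        )
    return problems


base = "uc:001:" + "x" * 38  # a 45 character name, for illustration

# default suffix "dataset": 45 + 1 + 7 = 53 characters, too long for dataset.name
default_value = autogenerated(base)
print(check_dataset_lengths(default_value, default_value))

# shorter suffix "ds": 45 + 1 + 2 = 48 characters, fits
short_value = autogenerated(base, "ds")
print(check_dataset_lengths(short_value, short_value))
```

This shows why a shorter features.dataset-suffix or an explicit external-id can matter for long names.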

Here is a v2-syntax example mixing both approaches, with additional comments:

      ns-nodes:
        - node-name: src:001:sap
          description: Sources 001; from SAP
          # provide 'external-id' explicit for full control (i.e. of length) for the dataset
          # otherwise it will be autogenerated '{ns-name}:{features.dataset-suffix}'
          # 220425 pa: atm of writing we have hard CDF limits of
          #    dataset.name        50 characters
          #    dataset.external_id 256 characters
          external-id: src:001:sap
        - node-name: src:002:weather
          description: Sources 002; from Weather.com
          # external-id will be auto generated in this case

merged PR #32
closing