Improve specification validation errors

Question

philippjfr opened this issue 2 years ago · 1 comments

When a dashboard or component specification is malformed, the errors are not always clear. This issue tracks unclear error messages so that we can improve them. The goal is to provide a lumen validate command that validates a YAML spec to the best of its ability without pulling any data from the Source.

Goals

Add a lumen validate command that can provide useful validation without explicitly instantiating a full dashboard
Implement validate methods for all components
Make validation errors precise and informative, including the ability to highlight the exact part of the specification where the error occurred

Design

The overall idea is that each component implements a validate classmethod that validates the contents of a specification without instantiating the component or pulling in data, while still attempting to resolve references and variables. It should also be able to highlight the specific part of the specification that caused the validation error.

The validate method must therefore accept both the specification for the component itself and a validation context (to resolve references and variables).
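As a rough illustration of what resolving a reference against the validation context could look like, here is a minimal sketch. The helper name resolve_reference, the '$variables.' prefix handling, and the use of ValueError are assumptions made for this example, not existing Lumen API. It returns the previously validated variable spec rather than any data, in keeping with the goal of not pulling from the Source.

```python
def resolve_reference(value, context):
    """Resolve a '$variables.<name>' reference against the validation context.

    Hypothetical helper for illustration only. Values that are not
    references are returned unchanged.
    """
    prefix = '$variables.'
    if not (isinstance(value, str) and value.startswith(prefix)):
        return value
    name = value[len(prefix):]
    variables = context.get('variables', {})
    if name not in variables:
        raise ValueError(
            f"Reference {value!r} points to an undeclared variable; "
            f"known variables: {sorted(variables)}"
        )
    # Return the validated variable spec, not the data it points to.
    return variables[name]


# Example context as it might look after the variables were validated.
context = {'variables': {'data': {'type': 'constant', 'default': 'penguins.csv'}}}
resolved = resolve_reference('$variables.data', context)
```

Because the lookup only touches the context, a misspelled variable name can be reported before any file is read.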

Let us take a simple dashboard spec as an example:

config:
  title: Palmer Penguins
  theme: dark
  layout: tabs
variables:
  data:
    type: constant
    default: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-28/penguins.csv
sources:
  penguins:
    type: file
    cache_dir: ./cache
    tables:
      penguins: $variables.data
pipelines:
  penguins:
    source: penguins
    table: penguins
    filters:
      species:
        type: widget
        field: species
      island:
        type: widget
        field: island
      sex:
        type: widget
        field: sex
      expr:
        type: param
        parameter: scatter.selection_expr

Validation must happen in the following order:

1. variables
2. config
3. sources
4. pipelines
   a. filters
   b. transforms
5. targets
   a. pipeline
   b. facet
   c. views
Now let us step through the process of validation:

def validate(dashboard_spec):
    # The context accumulates validated specs so that later components can resolve references.
    # Variable, Config and Source stand for the corresponding Lumen component classes.
    context = {'variables': {}, 'sources': {}, 'pipelines': {}, 'targets': []}
    for var_name, var_spec in dashboard_spec.get('variables', {}).items():
        spec_path = f'variables.{var_name}'
        context['variables'][var_name] = Variable.validate(
            dict(var_spec, name=var_name), context, dashboard_spec, spec_path
        )
    context['config'] = Config.validate(dashboard_spec['config'], context, dashboard_spec, 'config')
    for source_name, source_spec in dashboard_spec.get('sources', {}).items():
        spec_path = f'sources.{source_name}'
        context['sources'][source_name] = Source.validate(
            dict(source_spec, name=source_name), context, dashboard_spec, spec_path
        )
    ...

and the validate signature is always:

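@classmethod
def validate(cls, spec, context, full_spec=None, spec_path=None):
    """
    Validates the component specification given the validation context.

    Arguments
    ---------
    spec: dict
        The specification for the component being validated.
    context: dict
        Validation context containing the specifications of all previously
        validated components, e.g. to allow resolving of references.
    full_spec: dict, optional
        The full dashboard specification, used to report errors in context.
    spec_path: str, optional
        Dotted path to this component within the full specification,
        e.g. 'sources.penguins'.

    Returns
    -------
    Validated specification.
    """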
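
To make the signature concrete, the following sketch shows how one component might implement it. PipelineSketch is a hypothetical stand-in, not Lumen's Pipeline class, and the set of required keys ('source' and 'table') is an assumption based on the example spec above. It also shows why the validation order matters: the pipeline can only check its source because sources were validated into the context first.

```python
class PipelineSketch:
    """Hypothetical component used only to illustrate the validate signature."""

    _required_keys = ('source', 'table')  # assumed for this example

    @classmethod
    def validate(cls, spec, context, full_spec=None, spec_path=None):
        location = spec_path or '<pipeline>'
        for key in cls._required_keys:
            if key not in spec:
                raise ValueError(f"{location}: missing required key {key!r}")
        # Sources are validated before pipelines, so they are already in the context.
        if spec['source'] not in context.get('sources', {}):
            raise ValueError(
                f"{location}.source: {spec['source']!r} is not a declared source"
            )
        # No component is instantiated and no data is loaded.
        return dict(spec)


context = {'variables': {}, 'sources': {'penguins': {'type': 'file'}}}
validated = PipelineSketch.validate(
    {'source': 'penguins', 'table': 'penguins'}, context,
    spec_path='pipelines.penguins',
)
```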

ValidationError

We should implement a ValidationError that is given an error message, the validation context, the full spec, and the path, and that generates a helpful error message pointing to the issue in the context of the full specification.
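
One possible shape for such an exception is sketched below. The constructor argument names, the dotted-path convention, and the rendered layout are all assumptions for illustration; in particular, splitting the path on '.' assumes that spec keys do not themselves contain dots.

```python
class ValidationError(ValueError):
    """Sketch of an error that points at a location inside the full spec."""

    def __init__(self, msg, context=None, full_spec=None, spec_path=None):
        self.msg = msg
        self.context = context
        self.full_spec = full_spec
        self.spec_path = spec_path
        super().__init__(self._format())

    def _format(self):
        # Without a spec or path there is nothing to highlight.
        if self.full_spec is None or not self.spec_path:
            return self.msg
        lines = []
        node = self.full_spec
        keys = self.spec_path.split('.')
        for depth, key in enumerate(keys):
            indent = '  ' * depth
            if not isinstance(node, dict) or key not in node:
                lines.append(f"{indent}{key}: <missing>  <-- here")
                break
            node = node[key]
            if depth == len(keys) - 1:
                # Mark the offending entry itself.
                lines.append(f"{indent}{key}: {node!r}  <-- here")
            else:
                lines.append(f"{indent}{key}:")
        header = f"{self.msg} (at {self.spec_path})"
        return header + "\n\n" + "\n".join(lines)


spec = {'pipelines': {'penguins': {'source': 'penguin', 'table': 'penguins'}}}
err = ValidationError(
    "Source 'penguin' is not declared",
    full_spec=spec, spec_path='pipelines.penguins.source',
)
print(err)
```

The printed message starts with the error text and the dotted path, followed by the chain of keys leading to the offending entry, with the entry itself marked. A fuller implementation could instead show the surrounding lines of the original YAML.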

Task List
