Columnar Data Validator

This mini project aims to consolidate a generic data validation specification for columnar data files (e.g. CSV, XLSX), open to implementation in any language.

Below are the areas of emphasis:

  • Validation Engine Pipeline
  • Validation Rules
  • Input Spec
  • Output Spec

Validation Engine Pipeline


[Flowchart: the validation engine pipeline]
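
In lieu of the flowchart, here is a minimal sketch of the stages it implies, derived from the input and output specs below (TypeScript; every type and function name here is an assumption for illustration, not part of the spec):

type Severity = "Warning" | "Error";
type Status = "Success" | Severity;

interface Rule { rule: string; severity: Severity; params?: unknown; }
interface Ruleset { ordinal: number; rules: Rule[]; }
interface RuleResult { rule: string; status: Status; }
interface FieldDetail { field: string; rules: RuleResult[]; }
interface Output { status: Status; detail: FieldDetail[]; }

// Assumed shape: an evaluator returns true when a cell passes the rule.
type Evaluator = (value: string, row: string[], params?: unknown) => boolean;

function validate(
  rows: string[][],              // ingested columnar data
  rulesets: Ruleset[],           // one ruleset per column ordinal
  evaluators: Map<string, Evaluator>,
): Output {
  const detail: FieldDetail[] = [];
  let worst: Status = "Success";
  for (const { ordinal, rules } of rulesets) {
    const results: RuleResult[] = [];
    for (const r of rules) {
      const evaluate = evaluators.get(r.rule);
      // A rule fails when any row's cell fails the evaluator; an unknown
      // rule name is treated as a failure here (an assumption).
      const passed = rows.every(row => evaluate?.(row[ordinal], row, r.params) ?? false);
      const status: Status = passed ? "Success" : r.severity;
      results.push({ rule: r.rule, status });
      if (status === "Error" || (status === "Warning" && worst === "Success")) {
        worst = status;
      }
    }
    // The output spec names a "field"; a column_<ordinal> label is assumed.
    detail.push({ field: `column_${ordinal}`, rules: results });
  }
  return { status: worst, detail };
}

Whether a failing rule should short-circuit later rules is left open; the sketch evaluates every rule and reports the worst status overall.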

Validation Rules


Rules should cover destination column definitions plus optional custom rules.

Destination specifics

Destination-specific rules validate that a value conforms to the destination column's data type for the configured destinationType (e.g. BIGINT, INT).

Custom Rules:

Name         | Description                                                        | Params
NotEmpty     | Check that a value is present; cannot be empty                     | (none)
StringLength | Check string length                                                | min: number, max: number, exact: number
SumOfColumns | Check that the value holds a computation from the defined columns | ordinal: number, computationOrdinals: number[]
etc...
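
As one possible reading of the table, hedged evaluator sketches (TypeScript; the param shapes follow the Params column, everything else is assumed):

// Hypothetical evaluators for the custom rules in the table above.
interface StringLengthParams { min?: number; max?: number; exact?: number; }
interface SumOfColumnsParams { ordinal: number; computationOrdinals: number[]; }

// NotEmpty: the value cannot be empty (whitespace treated as empty here).
const notEmpty = (value: string): boolean => value.trim().length > 0;

// StringLength: check the length against whichever bounds were supplied.
const stringLength = (value: string, p: StringLengthParams): boolean =>
  (p.exact === undefined || value.length === p.exact) &&
  (p.min === undefined || value.length >= p.min) &&
  (p.max === undefined || value.length <= p.max);

// SumOfColumns: assumes "computation" means the sum of the values at
// computationOrdinals, compared against the value at ordinal.
const sumOfColumns = (row: string[], p: SumOfColumnsParams): boolean =>
  Number(row[p.ordinal]) ===
  p.computationOrdinals.reduce((acc, i) => acc + Number(row[i]), 0);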

To research:

  • Mapping from source [lang_specific | generic] <--> destination [databases: MSSQL, MySQL, PostgreSQL, etc.]
  • Look at the type-mapping strategies used by ORMs such as EF, SQLAlchemy, Gorm, etc. (a rough sketch follows this list).
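
As a rough illustration of the mapping being researched, the likely shape is a lookup table per destination (the type names below are common defaults; treat this as a sketch, not a verified mapping):

// Hypothetical generic-to-destination type map, in the spirit of what
// ORMs such as EF, SQLAlchemy, and Gorm maintain internally.
const typeMap: Record<string, Record<string, string>> = {
  MSSQL:      { string: "NVARCHAR", int: "INT",     long: "BIGINT", decimal: "DECIMAL" },
  MySQL:      { string: "VARCHAR",  int: "INT",     long: "BIGINT", decimal: "DECIMAL" },
  PostgreSQL: { string: "VARCHAR",  int: "INTEGER", long: "BIGINT", decimal: "NUMERIC" },
};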

Input Spec


rulesets.json

The file that defines the entire process.

{
  "id": string | Guid,
  "name": string,
  "destinationType": "MSSQL" | "MySQL" | "PostgreSQL" | etc...,
  "rulesets": [
    {
      "ordinal": number,
      "rules": [
        {
          "rule": ValidationRuleName [ Destination specific - (BIGINT | INT | etc...) | Custom - (NotEmpty | SumOfColumns | etc...)],
	  "severity": "Warning" | "Error",
	  "params"?: ValidationRuleParams
        },
        ...
      ]
    },
    ...
  ]
}
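
A hypothetical rulesets.json instance (all values illustrative):

{
  "id": "9f1c2b7e-0a4d-4c3e-8f6a-1b2c3d4e5f60",
  "name": "customer-import",
  "destinationType": "PostgreSQL",
  "rulesets": [
    {
      "ordinal": 0,
      "rules": [
        { "rule": "NotEmpty", "severity": "Error" },
        { "rule": "StringLength", "severity": "Warning", "params": { "max": 50 } }
      ]
    },
    {
      "ordinal": 3,
      "rules": [
        {
          "rule": "SumOfColumns",
          "severity": "Error",
          "params": { "ordinal": 3, "computationOrdinals": [1, 2] }
        }
      ]
    }
  ]
}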

Data Source

Defines the source of the data; the application should then handle ingestion accordingly (a configuration sketch follows this list).

  • Remote
    • Protocol - FTP, SFTP, HTTP, etc.
    • Url
  • Stream
  • Json
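
A sketch of the corresponding configuration shape (TypeScript; the field names are assumptions, not part of the spec):

// Hypothetical discriminated union for the data source definition.
type DataSource =
  | { kind: "Remote"; protocol: "FTP" | "SFTP" | "HTTP"; url: string }
  | { kind: "Stream" }               // e.g. a stream handed in at runtime
  | { kind: "Json"; data: unknown }; // inline JSON payload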

Output Spec


{
  "status": "Success" | "Warning" | "Error",
  "detail": [
    {
      "field": string,
      "rules": [
        {
          "rule": ValidationRuleName [ Destination specific - (BIGINT | INT | etc...) | Custom - (NotEmpty | SumOfColumns | etc...)],
          "status": "Success" | "Warning" | "Error"
        },
        ...
      ]
    },
    ...
  ]
}
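
A hypothetical result for the rulesets.json example above (field names assume the column_<ordinal> convention from the pipeline sketch):

{
  "status": "Warning",
  "detail": [
    {
      "field": "column_0",
      "rules": [
        { "rule": "NotEmpty", "status": "Success" },
        { "rule": "StringLength", "status": "Warning" }
      ]
    },
    {
      "field": "column_3",
      "rules": [
        { "rule": "SumOfColumns", "status": "Success" }
      ]
    }
  ]
}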