datacontract/datacontract-cli

Testing complex data types


Hi!

Based on your documentation, I assume the following is not possible, but I would like to double-check.

Is it possible to test the details of data contracts with complex/nested types?

For example, I have a field products of type array (Python: list) and a field shops_added of type struct (Python: dict).

My goal is to test whether:
(1) field products is of type array,
(2) field shops_added is of type struct,
(3) field products contains only values of type string,
(4) further restrictions on values in field products (e.g. length, regex patterns) hold true,
(5) field shops_added contains only keys of type string (e.g. "abc"),
(6) further restrictions on keys in field shops_added (e.g. length, regex patterns) hold true,
(7) field shops_added contains only values of type timestamp (e.g. 2024-06-01),
(8) further restrictions on values in field shops_added hold true.

However, I do not want to split out the complex types into multiple models, but test everything with only one model.

{
   "id":"01",
   "name":"hans",
   "age":41,
   "products":[
      "sku_01",
      "sku_04"
   ],
   "shops_added":{
      "abc":"2024-06-01",
      "xyz":"2024-06-09"
   }
}

P.S. Apologies for spamming you with quite a number of questions recently.

What is your server type?

There is a JSON-Schema Check Engine implemented for JSON files on S3.
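
For readers wondering what checks (1) to (8) could look like in plain JSON Schema, here is a minimal sketch written as a Python dict. It is not taken from datacontract-cli and does not claim to be what its JSON-Schema engine generates; the regex patterns and the length limit are made up for illustration, and validation only runs if the third-party `jsonschema` package happens to be installed.

```python
import json

# Illustrative JSON Schema. The patterns and maxLength are assumptions,
# not rules taken from the issue or from datacontract-cli.
SCHEMA = {
    "type": "object",
    "properties": {
        "products": {
            "type": "array",                    # (1) products is an array
            "items": {
                "type": "string",               # (3) only string values
                "maxLength": 10,                # (4) hypothetical length limit
                "pattern": "^sku_[0-9]{2}$",    # (4) hypothetical regex
            },
        },
        "shops_added": {
            "type": "object",                   # (2) shops_added is a struct
            # (5) JSON object keys are always strings; (6) restrict them here.
            "propertyNames": {"pattern": "^[a-z]{3}$"},
            # (7), (8) every value must be a date-shaped string.
            "additionalProperties": {
                "type": "string",
                "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$",
            },
        },
    },
    "required": ["products", "shops_added"],
}

# The example record from the opening post.
RECORD = json.loads("""
{
   "id": "01",
   "name": "hans",
   "age": 41,
   "products": ["sku_01", "sku_04"],
   "shops_added": {"abc": "2024-06-01", "xyz": "2024-06-09"}
}
""")

try:
    import jsonschema  # optional third-party validator
except ImportError:
    jsonschema = None

if jsonschema is not None:
    # Raises jsonschema.ValidationError if the record violates the schema.
    jsonschema.validate(instance=RECORD, schema=SCHEMA)
    print("record conforms to the schema")
else:
    print(json.dumps(SCHEMA, indent=2))
```

Because `items` applies to every element and `additionalProperties` to every value, the same schema covers arrays and structs of any size within a single model.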

My server is dataframe / temporary view in Databricks :)

In general, dataframes should be supported by the Soda Core engine that is used internally, as this test shows:
https://github.com/sodadata/soda-core/blob/af649b977fc2489eb841cf16ab4f0d9fc3da2165/soda/spark_df/tests/test_spark_df.py#L6

Might be worth a try.

Thanks :)

Hi @jochenchrist

Follow-up question:
Is it possible to test the contents of an array whose length can vary?

Going back to my example from the opening post, what I specifically want to test is that:

(1) field products is of type array,
(3) field products contains only values of type string,
(4) further restrictions on values in field products (e.g. length, regex patterns) hold true
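
To make the intent of (1), (3) and (4) concrete, here is a rough sketch in plain Python of an element-wise check that does not depend on the array length. The names `check_products`, `SKU_PATTERN` and `MAX_SKU_LENGTH` are hypothetical, and the rules themselves are invented for the example; this is not how datacontract-cli or Soda Core implement their checks.

```python
import re

# Hypothetical constraints, chosen only for illustration.
SKU_PATTERN = re.compile(r"sku_\d{2}")
MAX_SKU_LENGTH = 10


def check_products(value):
    """Return a list of violations for a `products` value of any length."""
    # (1) the field itself must be an array (a Python list).
    if not isinstance(value, list):
        return ["products is not an array"]

    violations = []
    for index, item in enumerate(value):
        # (3) every element must be a string.
        if not isinstance(item, str):
            violations.append(f"products[{index}] is not a string")
        # (4) further restrictions: length limit and regex pattern.
        elif len(item) > MAX_SKU_LENGTH or not SKU_PATTERN.fullmatch(item):
            violations.append(f"products[{index}] violates the SKU rules: {item!r}")
    return violations


print(check_products(["sku_01", "sku_04"]))   # no violations expected
print(check_products(["sku_01", 7, "bad"]))   # two violations expected
```

On a Spark dataframe the same per-element idea could presumably be expressed with the SQL higher-order function `forall` (available since Spark 3.0), though whether datacontract-cli or Soda Core can drive such an expression is not confirmed in this thread.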