Consider changing the data types to something less internal

Question

Opened this issue a year ago · 4 comments

In this column, the data types are named after their internal clojure.spec validation functions. As these names are relatively arbitrary, I would suggest swapping them for something that 1) is more widely used, 2) is further abstracted, and 3) reveals less about the inner implementation.

One option would be to follow TypeScript's type notation, as that is becoming a very widely used type system.

The changes would look like this:

inner implementation	typed reference	as multiple
uuid-string	uuid	uuid[]
strings<->uuids	uuid[]	uuid[]
single-string	string	string[]
cell-list	string[]	string[]
YN<->bool	boolean	boolean[]
string-date<->timestamp	timestamp	timestamp[]
status	"3" | "2" | "1"	Array<"3" | "2" | "1">

Optional values get a trailing question mark: string?, string[]?, etc.
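As a rough illustration of how the "typed reference" column could read in actual TypeScript, here is a minimal sketch. The alias names (`Uuid`, `Timestamp`, `Status`), the `ExampleRow` interface and all of its field names are hypothetical and not taken from the project. The sketch also assumes a timestamp is carried as a string. Note that in TypeScript itself the question mark attaches to the property name rather than the type, so `string?` in the proposed notation corresponds to `field?: string`.

```typescript
// Hypothetical aliases mirroring the "typed reference" column.
type Uuid = string;      // a string formatted as a UUID
type Timestamp = string; // assumption: timestamps are carried as strings
type Status = "3" | "2" | "1";

// Hypothetical row shape; every field name here is invented for illustration.
interface ExampleRow {
  id: Uuid;              // uuid
  relatedIds: Uuid[];    // uuid[]
  title: string;         // string
  tags?: string[];       // string[]? in the proposed notation
  published: boolean;    // boolean
  updatedAt?: Timestamp; // timestamp? in the proposed notation
  status: Status;        // "3" | "2" | "1"
}

const exampleRow: ExampleRow = {
  id: "123e4567-e89b-12d3-a456-426614174000",
  relatedIds: [],
  title: "Example",
  published: true,
  status: "2",
};
```

Optional fields such as `tags` and `updatedAt` can simply be left out, as in `exampleRow`.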

Answer 1 · 2023-09-26T13:40:32.000Z

Then we can talk about how a boolean is represented as "Y" | "N" in the spreadsheet, or as a boolean in ingested data.
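The two representations could be documented side by side with something like the following sketch. The names `YesNo`, `ynToBool` and `boolToYn` are made up for illustration; the project's real conversion lives in its Clojure code and may behave differently (for example around blank cells).

```typescript
// Spreadsheet-side representation of a boolean (hypothetical alias).
type YesNo = "Y" | "N";

// Sheet cell -> ingested value.
function ynToBool(cell: YesNo): boolean {
  return cell === "Y";
}

// Ingested value -> sheet cell.
function boolToYn(value: boolean): YesNo {
  return value ? "Y" : "N";
}
```

Because each function is the inverse of the other, a value survives a round trip in either direction.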

Answer 2 · 2023-09-26T13:50:18.000Z

Yeah, agree. Or even have a human-readable version, like the one I report in the detailed section: uuid-strings -> String, formatted in UUID format.

I think there's a good case for explaining what different data types are (strings, integers, bools), the related validators and what we use them for. For some readers, this may be the first time anyone's done that for them!

Answer 3 · 2023-09-26T13:52:52.000Z

There's a greater issue there of data types in the sheet vs data types in the intermediate system vs data types in JSON exports.

Clojure/Script has a uuid data type, for example, whereas the same value is a string in the sheet and a string in the export. Similarly, dates have multiple representations along the chain.

They do have predictable transformations, though.
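A sketch of what such predictable transformations might look like, under loudly stated assumptions: the sheet stores dates as `YYYY-MM-DD` strings, the intermediate timestamp is epoch milliseconds in UTC, and UUIDs are checked with a plain regular expression. None of these formats or function names come from the project; they only illustrate the sheet -> intermediate -> export idea.

```typescript
// Assumed UUID shape: 8-4-4-4-12 hexadecimal digits, case-insensitive.
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Sheet string -> is it acceptable as a UUID?
function isUuidString(cell: string): boolean {
  return UUID_PATTERN.test(cell);
}

// Sheet date string (assumed YYYY-MM-DD) -> epoch milliseconds, UTC.
// Only the shape is checked; calendar validity (e.g. 31 February) is not.
function dateStringToTimestamp(cell: string): number {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(cell);
  if (match === null) {
    throw new Error(`Not a YYYY-MM-DD date: ${cell}`);
  }
  const [, year, month, day] = match;
  // Date.UTC takes a zero-based month.
  return Date.UTC(Number(year), Number(month) - 1, Number(day));
}

// Epoch milliseconds -> export date string (YYYY-MM-DD).
function timestampToDateString(timestamp: number): string {
  return new Date(timestamp).toISOString().slice(0, 10);
}
```

Under these assumptions a valid date string comes out of the export unchanged, which is the kind of guarantee the docs could state for each type.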

Answer 4 · 2023-09-26T14:29:50.000Z

I think there are three audiences, with overlapping interests in the docs:
A) developers, who want to know how type handling should be managed;
B) researchers, who are likely to use the model in practice and will want to know what they need to do for data validation/integrity/quality; and,
C) downstream users of products derived from our data, who need to assess the meaning of a field and the rules used to create it, as an aid to understanding.

Stuff written for B and C is generally useful to A, but stuff written for A is less useful to B and C!