xsd:date YYYY-MM-DD vs NAACCR YYYYMMDD
In today's gpc-dev call, @astoddard suggested that since NAACCR data is going to all be in XML by 2020 anyway, why not use XML Schema to define the data format?
It's a pretty good idea... in fact, XML Schema can express unions between integers and weird sentinel values such as XXX.5, XXX.8, XXX.9.
But it wouldn't let us treat 20010101 as a date.
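To make the mismatch concrete, here is a minimal Python sketch. It is not the code from this repository; the function names (`is_xsd_date_lexical`, `parse_naaccr_date`, `to_xsd_date`) are hypothetical, and the regular expression is a simplified version of the xsd:date lexical form that ignores the optional timezone suffix.

```python
import re
from datetime import date, datetime

# Simplified xsd:date lexical form: YYYY-MM-DD (optional timezone ignored).
# This pattern is an approximation for illustration, not the full XSD grammar.
XSD_DATE = re.compile(r'-?\d{4,}-\d{2}-\d{2}')
NAACCR_DATE = re.compile(r'\d{8}')


def is_xsd_date_lexical(text: str) -> bool:
    """True if text looks like an xsd:date, i.e. has the hyphens."""
    return XSD_DATE.fullmatch(text) is not None


def parse_naaccr_date(text: str) -> date:
    """Parse a complete NAACCR-style YYYYMMDD string into a date."""
    if NAACCR_DATE.fullmatch(text) is None:
        raise ValueError(f'expected YYYYMMDD, got {text!r}')
    return datetime.strptime(text, '%Y%m%d').date()


def to_xsd_date(text: str) -> str:
    """Convert YYYYMMDD to the YYYY-MM-DD form that xsd:date expects."""
    return parse_naaccr_date(text).isoformat()
```

So `20010101` fails the xsd:date shape check as written, but a small conversion step in the calling code yields `2001-01-01`.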
In imsweb/layout#72 @depryf noted a relevant technique:
Another big difference between this library and NAACCR XML is that for convenience all date fields (which are YYYYMMDD) are considered "group fields" and define the three parts of the date as individual fields. This is a concept that NAACCR doesn't support, but it's REALLY useful for calling software since most algorithms are based on year, rarely month/day.
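The "group field" idea quoted above can be sketched roughly as follows. This is an illustration in Python, not the imsweb/layout API; `DateParts` and `split_date_field` are hypothetical names, and the handling of short or blank-padded values is an assumption about how partial dates might be represented.

```python
from typing import NamedTuple, Optional


class DateParts(NamedTuple):
    """Year, month and day exposed as separate sub-fields of one date field."""
    year: Optional[int]
    month: Optional[int]
    day: Optional[int]


def _part(chunk: str) -> Optional[int]:
    # Blank or non-numeric chunks are treated as missing (an assumption).
    chunk = chunk.strip()
    return int(chunk) if chunk.isdigit() else None


def split_date_field(value: str) -> DateParts:
    """Split a YYYYMMDD value into its three parts by fixed position."""
    padded = value.ljust(8)  # pad short values so the slices below are safe
    return DateParts(
        year=_part(padded[0:4]),
        month=_part(padded[4:6]),
        day=_part(padded[6:8]),
    )
```

Calling software that only cares about the year can then read `split_date_field(value).year` without parsing a full date.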
I don't plan to argue for changes to the XML format.
I'm content with the parsing design in f1a2ad5: naaccr-tumor-data/tumor_reg_data.py, lines 540 to 563.