pkiraly/metadata-qa-api

Output JSON instead of CSV

Closed this issue · 2 comments

Because metrics are dynamic and configurable, returning the results in JSON instead of CSV would have some processing advantages:

  • direct access to metrics through keys, e.g. { "existence": ..., "completeness": ... }
  • metric-specific output, e.g. { "completeness": { "TOTAL": 0.55, "IDENTITY": 0.33 }, "existence": { "column1": true, "column2": false } }
  • support for non-single-value metrics, e.g. { "language": { "en": 0.6, "fr": 0.3, "unknown": 0.1 } }
  • adding custom fields such as the original record identifier, e.g. { "info": { "id": "record1" } }
  • native support for numbers

For instance, instead of

"completeness:TOTAL","completeness:IDENTITY","completeness:MANDATORY","completeness:SEARCHABILITY","completeness:DESCRIPTIVENESS","existence:mdproperties/ad_model","existence:mdproperties/ad_serial_number","existence:mdproperties/audio_carrier_speed"
"0.067797","0.027397","0.25","0.047059","0.666667","0","0"

you could get

[
  {
    "completeness": {
      "TOTAL": 0.067797,
      "IDENTITY": 0.027397,
      "MANDATORY": 0.25,
      "SEARCHABILITY": 0.047059,
      "DESCRIPTIVENESS": 0.666667
    },
    "existence": {
      "mdproperties/ad_serial_number": false,
      "mdproperties/audio_carrier_speed": false
    }
  },
  ...
]

This would require extending measureAsList to return List<JSONObject> or a more generic List<MetricResult>.
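
To make the idea concrete, here is a minimal sketch of how the flat "metric:field" columns could be regrouped into the nested structure. The NestedMetrics class and its nest method are hypothetical and not part of metadata-qa-api. The sketch also assumes that existence values are encoded as "0"/"1" and that every other metric value is numeric.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of metadata-qa-api.
class NestedMetrics {

  /**
   * Groups flat "metric:field" columns into metric -> (field -> value).
   * Assumption: "existence" values are "0"/"1"; all other values are numbers.
   */
  static Map<String, Map<String, Object>> nest(List<String> header, List<String> row) {
    if (header.size() != row.size()) {
      throw new IllegalArgumentException("header and row differ in length");
    }
    Map<String, Map<String, Object>> result = new LinkedHashMap<>();
    for (int i = 0; i < header.size(); i++) {
      String column = header.get(i);
      // Split on the first colon only; field names may contain slashes.
      int sep = column.indexOf(':');
      if (sep < 0) {
        throw new IllegalArgumentException("expected metric:field, got " + column);
      }
      String metric = column.substring(0, sep);
      String field = column.substring(sep + 1);

      Object value;
      if (metric.equals("existence")) {
        value = !row.get(i).equals("0");      // "0" -> false, anything else -> true
      } else {
        value = Double.valueOf(row.get(i));   // native number instead of a string
      }
      result.computeIfAbsent(metric, k -> new LinkedHashMap<>()).put(field, value);
    }
    return result;
  }
}
```

Calling NestedMetrics.nest with the header and value rows of one CSV record gives one map per record, which any JSON library could then serialize. LinkedHashMap is used so the keys keep the original column order.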

WDYT?

@mielvds I improved the documentation of the output. Please check it in the README file.

Clear now. Thanks!