JAMS Testing framework
Testing framework for JAMS files following the JAMification step in ChoCo and assuming the availability of gold standards (manually annotated JAMS).
Preliminary sanity checks
Q: Is the given JAMS consistent and well-formatted? This applies to both gold and ChoCo JAMS, before any further step is taken.
- The JAMS can be successfully parsed by `jams`, either in validate mode or not.
- Annotation times are <= the total duration of the track / piece, if the latter information is available.
- Observations are temporally ordered in each annotation.
- There are no duplicated annotations.
- All possible fields in the sandbox are known, according to our schema (good to save this in a separate file).
If all these preliminary checks pass, then we can go ahead with the gold-vs-ChoCo JAMS validation.
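A minimal sketch of how these checks could be run on a JAMS file that has already been loaded as a plain dictionary (e.g. via `json.load`). It assumes each annotation stores its observations as a list of dicts with numeric `time` values under `data`, and that the track duration sits in `file_metadata.duration`. The function name `sanity_check` and the `known_sandbox_fields` argument are hypothetical, and the parsing check itself is left to `jams`.

```python
import json


def sanity_check(jam, known_sandbox_fields):
    """Return a list of problems found in a JAMS dict (empty list = all checks pass)."""
    problems = []
    track_duration = jam.get("file_metadata", {}).get("duration")
    seen = set()

    for idx, annotation in enumerate(jam.get("annotations", [])):
        observations = annotation.get("data", [])
        times = [obs["time"] for obs in observations]

        # Observations must be temporally ordered within each annotation.
        if times != sorted(times):
            problems.append(f"annotation {idx}: observations are not ordered")

        # Observation times must not exceed the track duration, when known.
        if track_duration is not None and any(t > track_duration for t in times):
            problems.append(f"annotation {idx}: time beyond track duration")

        # Assumption: two annotations are duplicates if namespace and data match.
        fingerprint = json.dumps(
            [annotation.get("namespace"), observations], sort_keys=True
        )
        if fingerprint in seen:
            problems.append(f"annotation {idx}: duplicate annotation")
        seen.add(fingerprint)

    # Every top-level sandbox field must be listed in the schema.
    unknown = set(jam.get("sandbox", {})) - set(known_sandbox_fields)
    if unknown:
        problems.append(f"unknown sandbox fields: {sorted(unknown)}")

    return problems
```

Returning a list of messages rather than raising on the first failure makes it possible to report every problem of a file in one pass.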
Metadata
Q: How good is the metadata layer in the JAMS?
Coverage: Is it exhaustive? Does it cover all possible fields?
Coverage is measured according to the proportion of non-null metadata fields in the ChoCo JAMS that are found in the gold JAMS.
- Case 1: gold has more fields (coverage is less than 1).
- Case 2: gold has fewer fields (there is a potential annotation issue).
- Case 3: fields are the same, regardless of their content (maximum coverage).
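The three cases above can be sketched as follows. This assumes coverage is the share of non-null gold fields that are also non-null in the ChoCo JAMS, which is one reading consistent with Case 1 giving a value below 1. The function names, and the choice to treat empty strings and empty containers as null, are assumptions made for illustration.

```python
def non_null_fields(metadata):
    """Names of the fields that carry an actual value."""
    return {k for k, v in metadata.items() if v not in (None, "", [], {})}


def metadata_coverage(gold, choco):
    """Return (coverage, fields missing from ChoCo, fields missing from gold)."""
    gold_fields = non_null_fields(gold)
    choco_fields = non_null_fields(choco)
    missing = sorted(gold_fields - choco_fields)  # Case 1: gold has more fields
    extra = sorted(choco_fields - gold_fields)    # Case 2: gold has fewer fields
    if not gold_fields:
        # Assumption: nothing to cover counts as full coverage.
        return 1.0, missing, extra
    coverage = len(gold_fields & choco_fields) / len(gold_fields)
    return coverage, missing, extra
```

A non-empty `extra` list does not lower the score here; it is only reported so that a potential annotation issue in the gold JAMS can be inspected by hand.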
Accuracy: For those non-null metadata fields, how many of these are correct? Can we measure quality?
- Option 1: perfect match after basic preprocessing.
- Option 2: Non-perfect matches can be assessed by simple text-distance methods.
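Both options can be combined in one pass, sketched below with the standard library only. The preprocessing (Unicode normalisation, case folding, whitespace collapsing) and the use of `difflib.SequenceMatcher` as the text-distance method are assumptions, not choices fixed by this issue; any other string metric could be swapped in.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(value):
    """Basic preprocessing: Unicode normalisation, case folding, whitespace collapsing."""
    text = unicodedata.normalize("NFKC", str(value))
    return " ".join(text.casefold().split())


def field_similarity(gold_value, choco_value):
    """1.0 for a perfect match after preprocessing, otherwise a ratio in [0, 1)."""
    a, b = normalise(gold_value), normalise(choco_value)
    if a == b:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def metadata_accuracy(gold, choco):
    """Return (share of perfectly matching fields, per-field similarity scores)."""
    shared = [
        k for k in gold
        if k in choco and gold[k] is not None and choco[k] is not None
    ]
    scores = {k: field_similarity(gold[k], choco[k]) for k in shared}
    if not scores:
        return None, scores
    exact = sum(score == 1.0 for score in scores.values()) / len(scores)
    return exact, scores
```

The per-field scores make it easy to list the near misses (for instance a misspelled artist name) separately from fields that are entirely wrong.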
Identifiers and external links
Same as for the metadata (actually, this is a particular type of metadata): coverage and accuracy.
Chord and key annotations
Q: How good and reliable is the chord (or key) annotation in JAMS? As before, this is assessed w.r.t. the original files.
Comparison is still focused on coverage and accuracy, but reported independently for `times`, `durations`, and `values`. In this case, coverage does not look at the order: it measures the amount of overlap between the observation fields, because an extra observation may have been inserted, which breaks the expected alignment. Accuracy, instead, is a 1-to-1 comparison of fields, which are assumed to be aligned. The latter can be reported according to the unit of measure of each field: seconds and measure.beats for time and duration, text-distance for string values.
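A rough sketch of the two measures for a single pair of annotations, assuming each observation is a dict with numeric `time` and `duration` (in seconds) and a string `value`. Coverage is computed as an order-insensitive multiset overlap, and accuracy as a position-by-position comparison. All function names and the rounding tolerance are hypothetical, and measure.beats times are not handled here.

```python
from collections import Counter
from difflib import SequenceMatcher


def _key(value, decimals):
    # Round numbers so that tiny floating-point differences still overlap.
    return round(value, decimals) if isinstance(value, (int, float)) else value


def field_coverage(gold_values, choco_values, decimals=3):
    """Order-insensitive share of gold values that also occur in ChoCo."""
    if not gold_values:
        return 1.0
    gold_counts = Counter(_key(v, decimals) for v in gold_values)
    choco_counts = Counter(_key(v, decimals) for v in choco_values)
    return sum((gold_counts & choco_counts).values()) / len(gold_values)


def field_accuracy(gold_values, choco_values):
    """1-to-1 comparison of values that are assumed to be aligned.

    Numbers give a mean absolute error in the field's own unit;
    strings give a mean similarity in [0, 1].
    """
    pairs = list(zip(gold_values, choco_values))
    if not pairs:
        return None
    numeric = all(
        isinstance(g, (int, float)) and isinstance(c, (int, float))
        for g, c in pairs
    )
    if numeric:
        return sum(abs(g - c) for g, c in pairs) / len(pairs)
    ratios = [SequenceMatcher(None, str(g), str(c)).ratio() for g, c in pairs]
    return sum(ratios) / len(ratios)


def compare_observations(gold_obs, choco_obs):
    """Report coverage and accuracy independently for each observation field."""
    report = {}
    for field in ("time", "duration", "value"):
        gold_values = [obs[field] for obs in gold_obs]
        choco_values = [obs[field] for obs in choco_obs]
        report[field] = {
            "coverage": field_coverage(gold_values, choco_values),
            "accuracy": field_accuracy(gold_values, choco_values),
        }
    return report
```

Note that the accuracy figure reads differently per field in this sketch: for `time` and `duration` it is an error (lower is better), while for `value` it is a similarity (higher is better). When the two annotations differ in length, `zip` silently drops the unmatched tail, so the coverage figure should be checked first.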
A first version of the sanity checks and testing scripts is up. They still need to be tried on some intermediary validation JAMS and plugged into the CLI for use.