CPJKU/partitura

Bug on `to_matched_score` - forcing note order is not working properly.

manoskary opened this issue · 7 comments

/partitura/musicanalysis/performance_codec.py", line 660, in <listcomp>
    pitch_onset = [(sn["pitch"].item(), sn["onset_div"].item()) for sn, _ in note_pairs]
ValueError: can only convert an array of size 1 to a Python scalar
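For context, the `ValueError` comes from numpy's `.item()`, which only works on size-1 arrays. A minimal reproduction (plain numpy, not partitura's code): when a duplicated note ID makes the lookup return two rows, `.item()` raises exactly this error.

```python
import numpy as np

# .item() converts a size-1 array to a Python scalar.
single = np.array([60])
print(single.item())  # -> 60

# With a duplicated snote ID the lookup yields two entries,
# and .item() raises the ValueError from the traceback above.
duplicated = np.array([60, 60])
try:
    duplicated.item()
except ValueError as e:
    print(e)  # can only convert an array of size 1 to a Python scalar
```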

Found the error. Some match files contain the same note ID both as a deletion and as a match. However, this should be checked when parsing the match file: duplicated score note instances across multiple match lines.

Nice validation method! Minor things:

  • can we handle the case where the duplicate IDs are both match lines?
  • if, in the case of duplicate performance IDs, both are insertions, they are currently both discarded. can we keep one? analogous case for duplicate score IDs that are both deletions.
  • the loops in 885 and 906 are over `mf.lines`, but the IDs to be deleted are from `mf.snotes` and `mf.notes`, which will probably lead to errors.
  • test is missing.
  • could maybe be implemented much more concisely
sids_unique, counts = np.unique(sids, return_counts=True)
sids_to_check = sids_unique[np.where(counts>1)[0]]
for duplicate_id in sids_to_check:
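For reference, the snippet above runs as-is on a plain array of IDs; here is a self-contained sketch with hypothetical snote IDs (the boolean mask `counts > 1` makes the `np.where` call unnecessary):

```python
import numpy as np

# Hypothetical snote IDs parsed from a match file;
# "n720-2" appears twice (once as a deletion, once as a match).
sids = np.array(["n718-1", "n719-1", "n720-2", "n720-2", "n721-1"])

sids_unique, counts = np.unique(sids, return_counts=True)
sids_to_check = sids_unique[counts > 1]  # IDs occurring more than once
print(sids_to_check)  # -> ['n720-2']
```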

Nice validation method! Minor things:

* can we handle the case where the duplicate IDs are both match lines?

Possibly yes, but that would mean more tests.

* if, in the case of duplicate performance IDs, both are insertions, they are currently both discarded. can we keep one? analogous case for duplicate score IDs that are both deletions.

Also yes, but that would mean more tests and `if` clauses.
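A minimal sketch of the "keep one" idea (hypothetical line records, not partitura's `MatchFile` API): for duplicated IDs that are all insertions, keep the first line and drop the rest instead of discarding both.

```python
# Hypothetical parsed match lines with a duplicated insertion ID.
lines = [
    {"id": "n5", "label": "insertion"},
    {"id": "n5", "label": "insertion"},
    {"id": "n6", "label": "match"},
]

seen = set()
deduped = []
for line in lines:
    if line["label"] == "insertion":
        if line["id"] in seen:
            continue  # drop later duplicates, keep the first occurrence
        seen.add(line["id"])
    deduped.append(line)

print([l["id"] for l in deduped])  # -> ['n5', 'n6']
```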

* the loops in 885 and 906 are over `mf.lines`, but the IDs to be deleted are from `mf.snotes` and `mf.notes`, which will probably lead to errors.

That should not lead to errors if it is done correctly; it was done this way to increase efficiency. The whole pipeline was designed not to put too much computational weight on the checking (most parts are done with numpy).

* test is missing.

Test is already implemented: I duplicated insertion and deletion lines in one of our match test files so the coverage goes through the clauses.

* could maybe be implemented much more concisely 
sids_unique, counts = np.unique(sids, return_counts=True)
sids_to_check = sids_unique[np.where(counts>1)[0]]
for duplicate_id in sids_to_check:

This could actually be a better and cleaner solution. Of course, when `sids_to_check` is not empty, we need to go through all the lines again and perform the checks (as many as we want).

I added the proposed more concise implementation method.

ah true, the last two points are moot. I should have commented on the PR instead. I take it you don't want to handle the cases of multiple insertions/deletions or differing match lines, so I updated the docstring to avoid confusion. Feel free to remove it if the function changes. I tag @neosatrapahereje to approve PR #280 since this is match-parsing related.

I checked out the current develop branch and can't reproduce the issue... Or is it already fixed by #280 on the develop branch? It also doesn't happen in my working branch performance_expressions.

I am using the data chopin_op04_Mv2.match, which indeed has a note, e.g. `n720-2`, that is both deleted and matched (on lines 924 and 926).

And from the performance_codec I still don't understand why this breaks the code... Firstly, the note_pairs loop only cares about the 'match' entries, not the 'deletion' entries, and even if the same note has two 'match' entries, it will create one extra note pair instead of making len(sn) > 1 and running into that 'array size 1' error. Unless the note is duplicated in the score as well?

note_pairs = [
    (part_by_id[a["score_id"]], ppart_by_id[a["performance_id"]])
    for a in alignment
    if (a["label"] == "match" and a["score_id"] in part_by_id)
]
pitch_onset = [(sn["pitch"].item(), sn["onset_div"].item()) for sn, _ in note_pairs]

I definitely agree we need to check and validate the match file, but I still didn't manage to break this part.

Hey! I still think the devil is hidden in the part creation: via MusicXML there are unique snote IDs, but via match this note is added twice (once from the deletion and once from the match line), so `part_by_id[non_unique_id]` refers to a numpy array with more than one entry.
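The suspected failure mode can be sketched with a structured array (field names and dtype assumed for illustration, not partitura's exact note-array dtype): when the note array built from the match file contains the same snote ID twice, selecting by ID yields a size-2 record and `.item()` raises.

```python
import numpy as np

# Hypothetical note array where "n720-2" was added twice
# (once from the deletion line, once from the match line).
note_array = np.array(
    [("n720-2", 64, 480), ("n720-2", 64, 480), ("n721-1", 66, 500)],
    dtype=[("id", "U10"), ("pitch", "i4"), ("onset_div", "i4")],
)

sn = note_array[note_array["id"] == "n720-2"]  # two rows, not one
try:
    sn["pitch"].item()
except ValueError as e:
    print(e)  # can only convert an array of size 1 to a Python scalar
```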