craigsapp/humlib

[musicxml2hum] Missing exclusive interpretations and wrong order

Closed this issue · 8 comments

This is my source file: Schiorring - 01 Ach! Herre from.mxl.zip

Click to see the output of musicxml2hum
!!!OTL: Choral Bog
!!!OMV: Ach! Herre from
**kern	**fb	**kern	**kern	**kern	**kern	**kern
*part6	*part6	*part5	*part4	*part3	*part2	*part1
*staff6	*	*staff5	*staff4	*staff3	*staff2	*staff1
*I"Piano 2	*	*I"Piano 1	*I"Bass	*I"Tenor	*I"Alto	*I"Soprano
*clefF4	*clefC1	*	*clefF4	*clefC1	*clefF4	*clefC4	*clefC3	*clefC1
*	*	*	*	*	*	*Trd0c0	*	*
*k[]	*k[]	*	*k[]	*k[]	*k[]	*k[]	*k[]	*k[]
*	*	*	*	*	*	*	*	*
*M4/4	*M4/4	*	*M4/4	*M4/4	*M4/4	*M4/4	*M4/4	*M4/4
*met(c)	*met(c)	*	*met(c)	*met(c)	*met(c)	*met(c)	*met(c)	*met(c)
*	*	*	*^	*^	*	*	*	*
2C	2cc	.	2ryy	2C	2e	2g	2C	2ee	2g	2cc
=1	=1	=1	=1	=1	=1	=1	=1	=1	=1	=1
2G	2b	.	2d	2G	2b	2g	2G	2dd	2g	2b
2D	2a	7 #	2c	2D	2a	2f#X	2D	2cc	2f#X	2a
=2	=2	=2	=2	=2	=2	=2	=2	=2	=2	=2
2GG	2g	.	2B	2G	2g	2d	2GG	2b	2g	2g
2C	2g	-7	2B-X	2C	2g	2e	2C	2b-X	2e	2g
=3	=3	=3	=3	=3	=3	=3	=3	=3	=3	=3
2F	2a	.	2A	2F	2a	2c	2F	2a	2c	2a
2D	2b	6	2F	2D	2bni	2f	2D	2f	2d	2b
=4	=4	=4	=4	=4	=4	=4	=4	=4	=4	=4
2C	2cc	.	2G	2C	2cc	2e	2C	2g	2e	2cc
2GG	2dd	.	2G	2GG	2dd	2b	2GG	2b	2g	2dd
=5	=5	=5	=5	=5	=5	=5	=5	=5	=5	=5
2AA	2cc	6	2F#X	2AA	2cc	2a	2AA	2cc	2f#X	2cc
4BB	2b	6	2G	4BB	2b	2d	(4BB	2dd	2g	2b
4C	.	.	.	4C	.	.	4C)	.	.	.
=6	=6	=6	=6	=6	=6	=6	=6	=6	=6	=6
2D	2a	#	2F#X	2D	2a	1d	2D	2dd	2f#X	2a
2GG	2b	.	[2G	2GG	2b	.	2GG	2dd	[2g	2b
=7	=7	=7	=7	=7	=7	=7	=7	=7	=7	=7
2C	1a	6 5	2G]	2C	1a	2e	(2C	(2ee	(2g]	1a
2D	.	#	2F#X	2D	.	2d	2D)	4dd	2f#X)	.
.	.	.	.	.	.	.	.	4cc)	.	.
=8	=8	=8	=8	=8	=8	=8	=8	=8	=8	=8
*	*	*	*	*	*	*^	*	*	*	*
2GG	2g	.	2G	2GG	2g	2B	2d/	2GG	2b	2g	2g
2E	2cc	6	2G	2E	2cc	2c	2g/	2E	2cc	2g	2cc
*	*	*	*	*	*	*v	*v	*	*	*	*
=9	=9	=9	=9	=9	=9	=9	=9	=9	=9	=9
2D	2b	6	2F	2D	2b	2d	2D	2dd	2f	2b
2C	2cc	.	2E	2C	2cc	2g	2C	2ee	2g	2cc
=10	=10	=10	=10	=10	=10	=10	=10	=10	=10	=10
2GG	2dd	.	2G	2GG	2dd	2b	2GG	2b	2g	2dd
2BB	2dd	6	2G	2BB	2dd	2g	2BB	2dd	2g	2dd
=11	=11	=11	=11	=11	=11	=11	=11	=11	=11	=11
2C	2ee	.	2c	2C	2ee	2g	2C	2cc	2g	2ee
2AA	2cc	6	2c	2A	2cc	2f#X	2AA	2cc	2f#X	2cc
=12	=12	=12	=12	=12	=12	=12	=12	=12	=12	=12
2GG	2dd	.	2B	2G	2dd	2d	2GG	2b	2d	2dd
2E	2g	6	2c	2E	2g	2g	2E	2cc	2g	2g
=13	=13	=13	=13	=13	=13	=13	=13	=13	=13	=13
2E-X	2a	4 2	2c	2E-X	2a	2f	2E-X	2cc	2f	2a
2D	2b	6	2d	2D	2b	2f	2D	2dd	2f	2b
=14	=14	=14	=14	=14	=14	=14	=14	=14	=14	=14
2Eni	2cc	6	2c	2Eni	2cc	2g	2Eni	2cc	2g	2cc
2C	2ee	.	[2c	2C	2ee	2g	2C	[2cc	2g	2ee
=15	=15	=15	=15	=15	=15	=15	=15	=15	=15	=15
2F	1dd	6 5	2c]	2F	1dd	2a	2F	(2cc]	(2a	1dd
2G	.	.	2B	2G	.	2g	2G	2b)	4g	.
.	.	.	.	.	.	.	.	.	4f)	.
=16	=16	=16	=16	=16	=16	=16	=16	=16	=16	=16
2C	2cc	.	2c	2C	2cc	2g 2e	2C	2cc	2e	2cc
*	*	*	*v	*v	*	*	*	*	*	*
*	*	*	*	*v	*v	*	*	*	*
==	==	==	==	==	==	==	==	==
*-	*-	*-	*-	*-	*-	*-	*-	*-
!!!system-decoration: s1,s2,s3,s4,s5,s6
!!!RDF**kern: i = editorial accidental

musicxml2hum fails to convert it to a valid humdrum file and I observed several problems:

  1. There are 6 parts (SATB, Piano 1, Piano 2) but 8 staves are needed because of the two piano parts. However, there are only 6 **kern spines created. But in the output starting from the line with the clefs interpretations it seems to be correct, there are the 8 spines.
  2. On the first line with data tokens there is cc missing in Piano 1 and it seems like the notes of the spine splits are arranged in the wrong order (high notes in left spine, low note in right spine). This only seems to be a problem for the first line.
  3. The **fb spine is included but the data and the corresponding exclusive interpretations are swapped. Compare 2nd and 3rd and spine. (Note that in the source score the piano 2 has the figured bass numbers semantically assigned to the wrong system (right hand), I would rearrange this with extractxx later on and bind them to the left hand system and use *above instead.)
  4. Why is there a interpretation line with Trd0c0?
  5. system-decoration should probably put brackets around the piano parts.

I will try to improve this as much as possible on my own, but tips for a good implementation would be helpful, since I have not yet familiarized myself with musicxml2hum.

You can post a sample of the music notation from a Dorico screenshot for examination (I don't have it on my travel computer that I am currently using).

Graphic notation (MusicXML direct to MEI with verovio):

Screenshot 2023-11-28 at 04 58 55

Loading into Musescore 4:

Screenshot 2023-11-28 at 05 05 03

Finale import:

Screenshot 2023-11-28 at 05 58 11

I don't (yet) have the original Dorico file (and I haven't installed it either). A music theory professor friend of mine provided me with the MusicXML files. Sadly the Dorico MusicXML export does not include fermatas. However here is a screenshot of the scan of the print:

Bildschirmfoto 2023-11-27 um 22 00 27

On IMSLP they don't have the version with the middle voices included: https://imslp.org/wiki/Special:ReverseLookup/668808

Your screenshot from Verovio and MuseScore looks good, except for the figured bass numbers. But I cannot guarantee that they are perfectly encoded in the MusicXML file. I will probably proofread everything and add the fermatas (and probably also the odd trills) once I have it in Humdrum.

Why is there a interpretation line with Trd0c0?

This is caused by this transposition instruction in the Tenor part:

        <transpose number="1">
          <diatonic>0</diatonic>
          <chromatic>0</chromatic>
          <octave-change>-1</octave-change>
        </transpose>

The diatonic and chromatic values are being recognized, but the octave-change does not seem to be implemented in the converter. There is a problem with processing octave-change in direct loading of MusicXML into verovio as well as in Musescore 4 and Finale 27, where the pitches are displayed an octave too high.

Use of an octave transposition for displaying on C4 clef is unusual and may be a bug in the MusicXML export from Dorico, or a subtle bug related to starting with a vocal tenor clef and then changing the clef to C4 (Sibelius can have buggy vocal tenor clef output depending on how you create the clef, for example). Note that Dorico developers were previously Sibelius developers and speculate as to why the bugs seem to be related.

The transposition instruction should be *ITrd7c12 (or *ITrd-7c-12). I suspect that octave transposition is an error generated in the MusicXML export (that is why a display of the Dorico notation rendering will be useful to determine that).

The octave down transposition needs to be applied to the notes in the Humdrum output (store notes at souding pitch in Humdrum scores is highly preferred), and then *ITrd7c12 needs to be added to indicate written pitch is one octave higher. But this transposition should not be needed and is caused by some problem in the MusicXML export from Dorico.

When this happens in Sibelius files, I used the transpose filter in VHV with the -s option to select a single spine to transpose. Here is the filter I applied (after fixing the spine starts for the piano parts):

transpose -s 6 -t -P8

Meaning: transpose spine 6 down an octave.

I just updated the output on my initial post. I accidentally posted a version where I had removed the **fb spine for debugging purposes.

I suspect that there is a bug added to musicxml2hum since July 2021, because the command-line version that I compiled on that date converts correctly, while a newly compiled version is giving the same problem you are having.

Expected conversion:

Screenshot 2023-11-28 at 07 52 37

Output from July 2021 compiled version of musescore2hum:

!!!OTL: Choral Bog
!!!OMV: Ach! Herre from
**kern	**kern	**fb	**kern	**kern	**kern	**kern	**kern	**kern
*part6	*part6	*part6	*part5	*part5	*part4	*part3	*part2	*part1
*staff8	*staff7	*	*staff6	*staff5	*staff4	*staff3	*staff2	*staff1
*I"Piano 2	*	*	*I"Piano 1	*	*I"Bass	*I"Tenor	*I"Alto	*I"Soprano
*clefF4	*clefC1	*	*clefF4	*clefC1	*clefF4	*clefC4	*clefC3	*clefC1
*	*	*	*	*	*	*Trd0c0	*	*
*k[]	*k[]	*	*k[]	*k[]	*k[]	*k[]	*k[]	*k[]
*	*	*	*	*	*	*	*	*
*M4/4	*M4/4	*	*M4/4	*M4/4	*M4/4	*M4/4	*M4/4	*M4/4
*met(c)	*met(c)	*	*met(c)	*met(c)	*met(c)	*met(c)	*met(c)	*met(c)
*	*	*	*^	*^	*	*	*	*
2C	2cc	.	2ryy	2C	2e	2g	2C	2ee	2g	2cc
=1	=1	=1	=1	=1	=1	=1	=1	=1	=1	=1
2G	2b	.	2d	2G	2b	2g	2G	2dd	2g	2b
2D	2a	7 #	2c	2D	2a	2f#X	2D	2cc	2f#X	2a
=2	=2	=2	=2	=2	=2	=2	=2	=2	=2	=2
2GG	2g	.	2B	2G	2g	2d	2GG	2b	2g	2g
2C	2g	-7	2B-X	2C	2g	2e	2C	2b-X	2e	2g
=3	=3	=3	=3	=3	=3	=3	=3	=3	=3	=3
2F	2a	.	2A	2F	2a	2c	2F	2a	2c	2a
2D	2b	6	2F	2D	2bni	2f	2D	2f	2d	2b
=4	=4	=4	=4	=4	=4	=4	=4	=4	=4	=4
2C	2cc	.	2G	2C	2cc	2e	2C	2g	2e	2cc
2GG	2dd	.	2G	2GG	2dd	2b	2GG	2b	2g	2dd
=5	=5	=5	=5	=5	=5	=5	=5	=5	=5	=5
2AA	2cc	6	2F#X	2AA	2cc	2a	2AA	2cc	2f#X	2cc
4BB	2b	6	2G	4BB	2b	2d	(4BB	2dd	2g	2b
4C	.	.	.	4C	.	.	4C)	.	.	.
=6	=6	=6	=6	=6	=6	=6	=6	=6	=6	=6
2D	2a	#	2F#X	2D	2a	1d	2D	2dd	2f#X	2a
2GG	2b	.	[2G	2GG	2b	.	2GG	2dd	[2g	2b
=7	=7	=7	=7	=7	=7	=7	=7	=7	=7	=7
2C	1a	6 5	2G]	2C	1a	2e	(2C	(2ee	(2g]	1a
2D	.	#	2F#X	2D	.	2d	2D)	4dd	2f#X)	.
.	.	.	.	.	.	.	.	4cc)	.	.
=8	=8	=8	=8	=8	=8	=8	=8	=8	=8	=8
*	*	*	*	*	*	*^	*	*	*	*
2GG	2g	.	2G	2GG	2g	2B	2d/	2GG	2b	2g	2g
2E	2cc	6	2G	2E	2cc	2c	2g/	2E	2cc	2g	2cc
*	*	*	*	*	*	*v	*v	*	*	*	*
=9	=9	=9	=9	=9	=9	=9	=9	=9	=9	=9
2D	2b	6	2F	2D	2b	2d	2D	2dd	2f	2b
2C	2cc	.	2E	2C	2cc	2g	2C	2ee	2g	2cc
=10	=10	=10	=10	=10	=10	=10	=10	=10	=10	=10
2GG	2dd	.	2G	2GG	2dd	2b	2GG	2b	2g	2dd
2BB	2dd	6	2G	2BB	2dd	2g	2BB	2dd	2g	2dd
=11	=11	=11	=11	=11	=11	=11	=11	=11	=11	=11
2C	2ee	.	2c	2C	2ee	2g	2C	2cc	2g	2ee
2AA	2cc	6	2c	2A	2cc	2f#X	2AA	2cc	2f#X	2cc
=12	=12	=12	=12	=12	=12	=12	=12	=12	=12	=12
2GG	2dd	.	2B	2G	2dd	2d	2GG	2b	2d	2dd
2E	2g	6	2c	2E	2g	2g	2E	2cc	2g	2g
=13	=13	=13	=13	=13	=13	=13	=13	=13	=13	=13
2E-X	2a	4 2	2c	2E-X	2a	2f	2E-X	2cc	2f	2a
2D	2b	6	2d	2D	2b	2f	2D	2dd	2f	2b
=14	=14	=14	=14	=14	=14	=14	=14	=14	=14	=14
2Eni	2cc	6	2c	2Eni	2cc	2g	2Eni	2cc	2g	2cc
2C	2ee	.	[2c	2C	2ee	2g	2C	[2cc	2g	2ee
=15	=15	=15	=15	=15	=15	=15	=15	=15	=15	=15
2F	1dd	6 5	2c]	2F	1dd	2a	2F	(2cc]	(2a	1dd
2G	.	.	2B	2G	.	2g	2G	2b)	4g	.
.	.	.	.	.	.	.	.	.	4f)	.
=16	=16	=16	=16	=16	=16	=16	=16	=16	=16	=16
2C	2cc	.	2c	2C	2cc	2g 2e	2C	2cc	2e	2cc
*	*	*	*v	*v	*	*	*	*	*	*
*	*	*	*	*v	*v	*	*	*	*
==	==	==	==	==	==	==	==	==
*-	*-	*-	*-	*-	*-	*-	*-	*-
!!!system-decoration: s1,s2,s3,s4{(s5,s6)}{(s7,s8)}
!!!RDF**kern: i = editorial accidental

This was a perfect use case to debug it with git bisect. I found the first bad commit, it is 24db155. The problems seems to be the change in HumGrid::getStaffCount:

24db155#diff-c557cf502a627b7aab49de8d3977a3473cca135e87b0bcf353bafc289b331348L132-L133

For now I just reverted this change, but you probably had a reason to change this line. Will I break something else if I revert this change?

Thanks for tracking the problem down! (I will also have to familiarize myself with git bisect).

The problem as you note is this line changed two months ago:

humlib/src/HumGrid.cpp

Lines 134 to 135 in 24db155

return (int)this->at(0)->front()->at(partindex)->size();
// return (int)this->at(0)->back()->at(partindex)->size();

Notice that I kept the previous code as a comment since I was not totally certain about the ramifications of changing the code.

What this line of code does is infer the number of staves in the part by looking at the data line at the start of the first measure in the score, which is causing a problem somehow. Perhaps the information such as the clef, time signature, and key signature are not assigned explicitly to staves 1 and 2, so they are stored only in the first staff, resulting in the current algorithm only seeing a single staff rather than two. Looking at the data, this does not seem to be the case since there are clearly two staves of information in the <attributes> of the last part (P6) in the MusicXML data:

      <attributes>
        <divisions>4</divisions>
        <key number="1">
          <fifths>0</fifths>
          <mode>none</mode>
        </key>
        <key number="2">
          <fifths>0</fifths>
          <mode>none</mode>
        </key>
        <time symbol="common">
          <beats>4</beats>
          <beat-type>4</beat-type>
        </time>
        <staves>2</staves>
        <clef number="1">
          <sign>C</sign>
          <line>1</line>
        </clef>
        <clef number="2">
          <sign>F</sign>
          <line>4</line>
        </clef>
      </attributes>

Loading from Musescore output has the same problem with staff assignments for key and time needing to be inferred from <staves>2</staves> rather than being explicitly indicated by @number on duplicated entries for key and time:

        <divisions>1</divisions>
        <key>
          <fifths>0</fifths>
          <mode>none</mode>
          </key>
        <time symbol="common">
          <beats>4</beats>
          <beat-type>4</beat-type>
          </time>
        <staves>2</staves>
        <clef number="1">
          <sign>C</sign>
          <line>1</line>
          </clef>
        <clef number="2">
          <sign>F</sign>
          <line>4</line>
          </clef>
        </attributes>

Here are some notes on the HumGrid class:

HumGrid is a vector of GridMeasure* is a vector of GridSlice* is a vector of GridPart* is a vector of GridStaff* (and GridSide). With GridStaff being a vector of GridVoice* (and GridSide).

GridSide is for tandem spines of non-kern data such as harmony, lyrics, figured bass, chords, etc. There is a GridSide for the system (primarily for harmony) and a GridSide for each staff.

GridSlice is analogous to a HumdrumLine.

GridPart is not directly represented in Humdrum spines, as I prefer to make each staff a separate spine. This is similar to MEI but not to MusicXML. I instead use *part and *staff interpretations to keeps track of grand-staff cases (and in theory three-stave parts such as organ). In theory the HumGrid system could output a GridPart as a spine, with GridStaff as subspines:

**kern
*part1
*^
*staff2	*staff1
=1	=1
1c	1c
=	=
*v	*v
*-

but I find that a bother to deal with piano music and tools such as extract.

The HumGrid class was created specifically to convert from MusicXML into Humdrum (but should be useful to convert from other data types into Humdrum). It is an intermediary step that does not deal with spine manipulations until after the MusicXML content has been loaded. It also manages creation of null data tokens as part of the process to generate a Humdrum file.


The HumGrid::getStaffCount() function should return the number of staves for the part, which is supposed to be 2 for grand-staff parts such as the piano. This function is currently inferring the number of staves in a part by looking at how many staves are present in the first measure of the part. It is intended to look only at data lines (GridSlices) but currently it looks at any type of GridSlice (which is causing problems).

In English(ish), this line:

return (int)this->at(0)->front()->at(partindex)->size(); 

means "return the size of the staff vector for the given part from the first slice of the first measure"

while the previous method:

return (int)this->at(0)->back()->at(partindex)->size();

means "return the size of the staff vector for the given part from the last slide of the first measure"

This second version was failing due in cases where there is a global comment at the end of the first measure, such as:

**kern	**kern
*part1	*part1
*staff2	*staff1
=1	=1
1c	1c
!!linebreak: original
=	=
*-	*-

Where I add a comment at the end of a measure if there is a system break after that measure. The problem is that if the first measure contains a comment at the end of the measure, then there is only one "staff" for this line.

I therefore thought that looking instead at the first entry in the measure would be better (but obviously not). I will use your pull request to revert to looking at the back, but then generalize the code to skip non-spine GridSlice line types by iterating backwards in the GridSlices in the measure until I find data or otherwise the maximum staff count for interpretation/local comment slices.

A more ideal solution would probably to have a state variable for GridPart that stores the maximum staff index of inserted data in all GridMeasure/GridSlice for each GridPart. This would have to be stored in the HumGrid level since the GridPart is embedded within the GridMeasure/GridSlice vectors. But the algorithm in the previous paragraph should work unless (1) there is no data in the first measure, or (2) for some reason only notes/rests are given for only one staff in the first measure.

Thank you for the detailed explanation. It will be of great use when I try to solve some problems with the import of the Schiørring chorales. Especially with the first piano system with the full harmony example (part 5) there were some problems with the rhythm. Also stem directions seem to be incorrect which might be a bad encoding in Dorico or a bad export to MusicXML. I'm not sure yet if the problem for all this is musicxml2hum or a bad MusicXML files. Sometimes the sections where it switches between 4 and 5 voices in the piano were not correct and there were problems with the spine splits. Or missing notes.

I fixed this manually for the first 3 chorales, but it is too time consuming for all 100, so I hope I can solve this by improving the musicxml2hum.

Here is the link to the repository: https://github.com/WolfgangDrescher/schiorring-choral-bog. Keep in mind that repo is heavily work in progress and probably most chorales will still fail to render in Verovio because of invalid humdrum files (except the first 3 and some files that are correct "by accident").