mir-evaluation/mir_eval

Inaccurate pitch_tolerance in transcription.precision_recall_f1_overlap

ax-le opened this issue · 3 comments

ax-le commented

In my examples, computing transcription.precision_recall_f1_overlap([...], pitch_tolerance = 30) gives lower scores than the default pitch_tolerance, transcription.precision_recall_f1_overlap([...], pitch_tolerance = 50). (For example, the F-measure is 0.379 with pitch_tolerance set to 30 and 0.396 with pitch_tolerance set to 50.)
However, my estimated pitches are MIDI-scale integers, as is my ground truth. The smallest nonzero gap between an estimated pitch and the ground truth is therefore one semitone, or 100 cents.
Hence, a tolerance smaller than 100 cents shouldn't affect the scores.
I strongly suspect that a rounding operation is skewing the pitch comparison.

Thanks for reporting this. Can you provide a MWE or example files which reproduce the issue?

ax-le commented

Sure, here are two .txt files, reference.txt and estimation.txt, that reproduce the issue. The first and second columns of each file contain the onset and offset times respectively, and the third column contains the pitches. In my tests, offset times were ignored.

Sorry, I missed an important detail in your first message. You wrote

my estimated pitches are midi-scale integers

That's the wrong format for transcription.precision_recall_f1_overlap. The docstring clearly says

Array of estimated pitch values in Hertz

You can convert your MIDI pitches to Hz using either pretty_midi.note_number_to_hz or librosa.midi_to_hz, or just 440.0*(2.0**((note_number - 69)/12.0)).
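
As a minimal sketch of why the tolerance mattered here, the snippet below applies the formula above and compares two adjacent notes in cents. It assumes the pitch distance is measured with the standard cents definition, 1200 * |log2(f_est / f_ref)|. The helper names midi_to_hz and cents_between, and the note numbers 40 and 41, are illustrative only; they are not part of mir_eval and not taken from the attached files.

```python
import math


def midi_to_hz(note_number):
    """Convert a MIDI note number to Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * (2.0 ** ((note_number - 69) / 12.0))


def cents_between(f_ref, f_est):
    """Absolute distance between two frequencies, in cents."""
    return abs(1200.0 * math.log2(f_est / f_ref))


# Hypothetical pair of adjacent notes, one semitone apart.
ref_note, est_note = 40, 41

# MIDI numbers mistakenly passed as if they were Hz values:
# the gap is no longer a fixed 100 cents and can fall below 50.
wrong = cents_between(float(ref_note), float(est_note))

# After converting to Hz first, the gap is one semitone (100 cents).
right = cents_between(midi_to_hz(ref_note), midi_to_hz(est_note))

print(f"treated as Hz: {wrong:.1f} cents")
print(f"converted to Hz: {right:.1f} cents")
```

Under that assumption, a pair like this would count as a pitch match at a 50-cent tolerance but not at 30 cents when raw MIDI numbers are passed in, which is consistent with the score difference reported above. Once the pitches are converted to Hz, any tolerance below 100 cents treats the pair the same way.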