bytedance/piano_transcription

Evaluation Performance

Closed this issue · 4 comments

Hi,

Thanks for sharing your work.

When I use the pre-trained model checkpoint from your repo (https://github.com/qiuqiangkong/piano_transcription_inference),
I can't reproduce the scores reported in the original paper.
[screenshot: evaluation results]

I set the arguments as shown below.

[screenshot: argument settings]

But the results I got were much lower.

Can I reproduce your paper's scores with this source code, or do I need to edit something?
(https://github.com/bytedance/piano_transcription/blob/master/pytorch/calculate_score_for_paper.py)

Thank you.

Oh, it turns out the performance does come out well. Thank you.

Hey, I encountered the same problem. May I know what you did to achieve the same performance results as in the paper?

Hi,
If you look closely at the provided evaluation code, it uses mir_eval.transcription.precision_recall_f1_overlap for the note F1-score.
However, that function computes the note-with-offset F1-score, not the plain note F1-score.
If you want the note F1-score, you should use mir_eval.transcription.onset_precision_recall_f1 instead.
(Please refer to the following link: https://github.com/craffel/mir_eval/blob/main/mir_eval/transcription.py)
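
Here is a minimal sketch of the difference between the two metrics on made-up toy notes (the intervals, pitches, and printed numbers below are purely illustrative; mir_eval expects intervals in seconds and pitches in Hz):

```python
import numpy as np
import mir_eval

# Toy data: two reference notes and two estimated notes.
# Intervals are (onset, offset) in seconds; pitches are in Hz.
ref_intervals = np.array([[0.0, 1.0], [1.0, 2.0]])
ref_pitches = np.array([440.0, 523.25])
est_intervals = np.array([[0.01, 0.60], [1.02, 2.01]])  # first offset is badly wrong
est_pitches = np.array([440.0, 523.25])

# Note w/ offset F1: a note counts as correct only if onset, pitch,
# AND offset all match within tolerance -> the first note fails here.
p, r, f, _ = mir_eval.transcription.precision_recall_f1_overlap(
    ref_intervals, ref_pitches, est_intervals, est_pitches)
print('Note w/ offset F1:', f)  # 0.5 on this toy data

# Onset-only F1: matching uses onsets alone, so both notes count.
p, r, f = mir_eval.transcription.onset_precision_recall_f1(
    ref_intervals, est_intervals)
print('Onset F1:', f)  # 1.0 on this toy data
```

Note that onset_precision_recall_f1 ignores pitch as well as offsets; if you want onset + pitch matching, precision_recall_f1_overlap also accepts offset_ratio=None, which disables the offset check.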

Thank you very much!