comref-converter

The Common Optical Music Recognition Framework conversion and evaluation toolset

The Common Western Music Notation Recognition Evaluation Framework (COMREF)

This repository is a collection of tools that allow interaction with the MTN format, a representation of western music scores designed with the following key features in mind:

  • Presentation before semantics: The format avoids assumptions about how the score has to be played and instead describes how the score is laid out on paper.
  • Unique representation: Any score can be represented in exactly one way, which allows for direct comparison of scores.
  • Source-independent: The format makes no assumptions about the origin of the predicted scores; both object detection and sequence-based methods can target this system. A bounding-box extension is planned as a default addition.

This project is the code associated with the article A Unified Representation Framework for the Evaluation of Optical Music Recognition Systems, accepted for publication in the IJDAR-ICDAR track of 2024. If you find it useful, do not hesitate to cite the paper.

@article{torrasUnifiedRepresentationFramework2024,
  title = {A Unified Representation Framework for the Evaluation of {{Optical Music Recognition}} Systems},
  author = {Torras, Pau and Biswas, Sanket and Forn{\'e}s, Alicia},
  year = {2024},
  month = jul,
  journal = {International Journal on Document Analysis and Recognition (IJDAR)},
  issn = {1433-2825},
  doi = {10.1007/s10032-024-00485-8},
  urldate = {2024-07-24},
  abstract = {Modern-day Optical Music Recognition (OMR) is a fairly fragmented field. Most OMR approaches use datasets that are independent and incompatible between each other, making it difficult to both combine them and compare recognition systems built upon them. In this paper we identify the need of a common music representation language and propose the Music Tree Notation format, with the idea to construct a common endpoint for OMR research that allows coordination, reuse of technology and fair evaluation of community efforts. This format represents music as a set of primitives that group together into higher-abstraction nodes, a compromise between the expression of fully graph-based and sequential notation formats. We have also developed a specific set of OMR metrics and a typeset score dataset as a proof of concept of this idea.},
  langid = {english},
  keywords = {Computer vision,Datasets,Evaluation,Optical Music Recognition,Representation},
}

Dataset

The accompanying dataset can be downloaded from here. Note that the annotations have yet to be upgraded to the new 0.8 version of the converter!

Getting started

The project is distributed as a Python package. Clone the repository and install it by running pip install <path to cloned repo>. It has very few dependencies, which pip should manage for you. If you run into any issues, the project's dependencies are listed in the requirements.txt file of this repository.
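
For instance, a typical installation from a local clone might look like this (the repository URL is a placeholder for the actual one):

git clone <REPOSITORY URL> comref-converter
pip install ./comref-converter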

To convert scores to MTN, go into the src directory and run:

python3 convert.py <SOURCE PATH .{mxl|mtn|seq|mei}> <TARGET PATH .{mxl|mtn|seq|mei|dot|apt}>
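
For example, a hypothetical run converting a compressed MusicXML score to MTN (the file names are placeholders):

python3 convert.py score.mxl score.mtn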

Conversion from MTN to MXL is not yet supported.

To run evaluation tasks for scores in MTN, use the evaluate.py script. It accepts either individual files or lists of files as input. For a single file, run:

python3 evaluate.py --predictions <FILE> --targets <FILE>

If more than one file is required, pass them as a space-separated list:

python3 evaluate.py --predictions <FILE1> <FILE2> <FILE3> ... --targets <FILE>
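
For instance, a hypothetical invocation comparing three predicted scores against a single ground-truth file (the file names are placeholders):

python3 evaluate.py --predictions pred_01.mtn pred_02.mtn pred_03.mtn --targets ground_truth.mtn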

Run python3 convert.py --help or python3 evaluate.py --help for more details regarding these scripts.

Documentation

The wiki contains all of the documentation for the project (work in progress).

Known Issues

Check the issues section on the repository to find known bugs and issues.

Roadmap

Future directions for this format include the following steps:

  • Incorporating MusicXML export. The groundwork is in place, but we need to find a good algorithm to ensure that the main voicing is contiguous (or to insert enough invisible padding).
  • MEI import/export. Currently this is not a priority because of the complexity of the task, but it could be done eventually if there is adoption.
  • Incorporating bounding boxes. The extension is planned and partly implemented, but we have no data to populate it with yet. We could use the bounding boxes generated by Verovio, but they do not quite match all primitives in the notation, so we are exploring alternatives.

License

This project is licensed under the GNU GENERAL PUBLIC LICENSE version 3. Check the COPYING file for the full license text.