ATEPP: A Dataset of Automatically Transcribed Expressive Piano Performances

ATEPP is a dataset of expressive piano performances by virtuoso pianists. The dataset contains 11677 performances (~1000 hours) by 49 pianists and covers 1580 movements by 25 composers. All MIDI files in the dataset were obtained by automatically transcribing existing audio recordings of piano performances. Scores in MusicXML format are also available for around half of the tracks. The dataset is organized and aligned by compositions and movements to support comparative studies. More details are presented in the paper.

Download the ATEPP dataset

Please follow disclaimer.md to agree to the disclaimer and download the latest version of ATEPP (~212 MB).
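Once the archive is downloaded and extracted, a quick sanity check is to count the MIDI files it contains. The sketch below is only illustrative: the folder name `ATEPP-1.1`, the `.mid`/`.midi` extensions, and the idea that each performance sits in a folder for its composition or movement are assumptions rather than documented facts, so adjust them to match the archive you actually receive.

```python
from collections import Counter
from pathlib import Path


def count_midi_by_folder(root):
    """Count MIDI files under `root`, grouped by their parent folder.

    Assumption: performances are stored as .mid/.midi files; the grouping
    key is simply the directory each file lives in, relative to `root`.
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in {".mid", ".midi"}:
            counts[str(path.parent.relative_to(root))] += 1
    return counts


if __name__ == "__main__":
    # "ATEPP-1.1" is a hypothetical extraction directory.
    counts = count_midi_by_folder("ATEPP-1.1")
    print(f"{sum(counts.values())} MIDI files in {len(counts)} folders")
```

If the total differs from the performance count listed under Statistics, the download may be incomplete or a different version.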

Inference

You can run inference on your own track with the modified code and new checkpoint in piano_transcription-master. The environment and setup are the same as for https://github.com/bytedance/piano_transcription.

python3 pytorch/inference.py --model_type=Regress_onset_offset_frame_velocity_CRNN --checkpoint_path=300000_iterations.pth --audio_path="resources/schumann_romanzen.mp3" --cuda
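To transcribe several recordings, the command above can be wrapped in a small loop. This is a minimal sketch, not part of the repository: the helper names `build_inference_command` and `transcribe_folder` are hypothetical, it uses only the flags shown in the command above, and it assumes it is run from inside piano_transcription-master with the checkpoint in the working directory. Leaving out `--cuda` is assumed to fall back to CPU, as in the upstream project.

```python
import subprocess
from pathlib import Path

# Values copied from the command shown in this README.
MODEL_TYPE = "Regress_onset_offset_frame_velocity_CRNN"
CHECKPOINT = "300000_iterations.pth"


def build_inference_command(audio_path, use_cuda=True):
    """Return the argument list for one call to pytorch/inference.py."""
    cmd = [
        "python3",
        "pytorch/inference.py",
        f"--model_type={MODEL_TYPE}",
        f"--checkpoint_path={CHECKPOINT}",
        f"--audio_path={audio_path}",
    ]
    if use_cuda:
        cmd.append("--cuda")
    return cmd


def transcribe_folder(audio_dir, pattern="*.mp3", use_cuda=True):
    """Run the inference script once per matching audio file (hypothetical helper)."""
    for audio_path in sorted(Path(audio_dir).glob(pattern)):
        # check=True stops the loop if a transcription fails.
        subprocess.run(build_inference_command(audio_path, use_cuda), check=True)
```

For example, `transcribe_folder("resources")` would process every MP3 in the resources folder in alphabetical order; where the resulting MIDI files are written is determined by the inference script itself.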

Statistics

Version-1.0

  • 11742 performances (in MIDI format)
  • 1007 hours
  • 1580 movements
  • 25 composers
  • 49 performers
  • 43% with scores

Version-1.1

Updates: When creating ATEPP version 1.0, we applied only movement-wise matching to remove erroneously downloaded audio. We have now also detected repeated recordings using audio fingerprint matching. Only 65 recordings were found to be repeats, and their transcribed MIDI files were removed. The file repeats.csv lists the transcribed files that were removed.

Changed Statistics:

  • 11677 performances
  • 1002 hours
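The version 1.1 figures can be cross-checked against version 1.0 with simple arithmetic. The snippet below only restates the numbers listed in this section; the average duration it prints is derived from them and is not a figure reported by the authors.

```python
# Figures taken from the Statistics section.
V1_0_PERFORMANCES = 11742
REMOVED_REPEATS = 65
V1_1_PERFORMANCES = 11677
V1_1_HOURS = 1002

# Removing the 65 repeated recordings should give the version 1.1 count.
remaining = V1_0_PERFORMANCES - REMOVED_REPEATS
assert remaining == V1_1_PERFORMANCES

# Derived value: mean performance length in minutes.
mean_minutes = V1_1_HOURS * 60 / V1_1_PERFORMANCES
print(f"{remaining} performances, about {mean_minutes:.1f} minutes each on average")
# prints "11677 performances, about 5.1 minutes each on average"
```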

Related Works

Composition Entity Linker

We've released a Python package for linking classical music recordings and tracks to their corresponding compositions and movements, which is useful for cleaning up metadata in classical music datasets.

Package on PyPI: https://pypi.org/project/composition-entity-linker/

Contact

Cite

@inproceedings{zhang2022atepp,
  title={ATEPP: A Dataset of Automatically Transcribed Expressive Piano Performance},
  author={Zhang, Huan and Tang, Jingjing and Rafee, Syed Rifat Mahmud and Dixon, Simon and Fazekas, Gy{\"o}rgy},
  booktitle={Ismir 2022 Hybrid Conference},
  year={2022}
}

License

CC BY 4.0