High-Resolution Violin Transcription using Weak Labels

We present the Multi-Stream Conformer (MUSC), a state-of-the-art violin transcriber that converts raw 44.1 kHz audio into MIDI with 5.8 ms time resolution and 10-cent frequency resolution, without requiring frame-wise labels during training!
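To put these resolution figures in concrete terms, here is a small arithmetic sketch. It is not taken from the MUSC code: the helper names are hypothetical, and the 256-sample hop is only an inference from the fact that 256 / 44100 s is about 5.8 ms.

```python
SAMPLE_RATE_HZ = 44_100  # sample rate stated above


def frame_period_ms(hop_samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Time between consecutive analysis frames, in milliseconds."""
    return 1000.0 * hop_samples / sample_rate_hz


def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for an interval in cents (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)


# Assumption: a 256-sample hop, which would give the ~5.8 ms figure.
hop_ms = frame_period_ms(256)

# Width in Hz of a 10-cent step directly above A4 (440 Hz).
step_hz_at_a4 = 440.0 * (cents_to_ratio(10) - 1.0)

print(f"frame period: {hop_ms:.3f} ms")
print(f"10 cents above 440 Hz: {step_hz_at_a4:.2f} Hz")
```

A 10-cent step is a tenth of a semitone, so it is about 2.5 Hz wide around A4 and grows proportionally with pitch.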

You can transcribe solo violin recordings from YouTube using the Colab demo above, and you might be pleasantly surprised by its speed! :)

This demo accompanies the following paper:

N. C. Tamer, Y. Ozer, M. Muller, X. Serra, "High-Resolution Violin Transcription using Weak Labels", in Proc. ISMIR, 2023

Dataset

The dataset is in the dataset folder and comprises three violin etude books played by 22 violinists. We share open-source MIDI files aligned with the performances. Each filename encodes what is needed to reconstruct the link to its performance:

{composer}_{catalog_number}_{performer}_{YouTube_ID}-{YouTube_start_sec}-{YouTube_end_sec}.mid
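As a minimal sketch of how such a link might be reconstructed from a filename, the snippet below parses the pattern above. It is not part of the repository: `PerformanceRef` and `parse_performance_filename` are hypothetical names, the example filename is made up, and the parser assumes that YouTube IDs are 11 characters long, that the composer and catalog number contain no underscores, and that the start and end times are plain numbers.

```python
from dataclasses import dataclass
from pathlib import Path

YOUTUBE_ID_LENGTH = 11  # assumption: standard 11-character video IDs


@dataclass
class PerformanceRef:
    composer: str
    catalog_number: str
    performer: str
    youtube_id: str
    start_sec: float
    end_sec: float

    @property
    def url(self) -> str:
        """Link to the performance, starting at the excerpt's first second."""
        return (
            f"https://www.youtube.com/watch?v={self.youtube_id}"
            f"&t={int(self.start_sec)}s"
        )


def parse_performance_filename(filename: str) -> PerformanceRef:
    stem = Path(filename).stem  # drop the ".mid" extension
    # The two trailing "-"-separated fields are the start and end times.
    head, start, end = stem.rsplit("-", 2)
    # The ID may itself contain "_" or "-", so slice it off by length.
    meta, youtube_id = head[:-YOUTUBE_ID_LENGTH], head[-YOUTUBE_ID_LENGTH:]
    if not meta.endswith("_"):
        raise ValueError(f"unexpected filename layout: {filename!r}")
    # Split at most twice so that a performer name may contain underscores.
    composer, catalog_number, performer = meta[:-1].split("_", 2)
    return PerformanceRef(
        composer, catalog_number, performer, youtube_id, float(start), float(end)
    )


# Made-up filename that follows the pattern; not an actual dataset entry.
ref = parse_performance_filename(
    "Composer_Op1No1_Some_Performer_a_b-c1D2e3F-10-95.mid"
)
print(ref.url)
```

Slicing the ID by length rather than splitting on a separator avoids misparsing IDs that contain `_` or `-`, at the cost of relying on the 11-character assumption.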

MUSC architecture

The model architecture can be found under the musc folder.
