This is a Python package for speech-based auditory display.

textalignsynth

Hearing Your Way Through Music Recordings: A Text Alignment and Synthesis Approach

This is a repository accompanying the following paper:

@inproceedings{StrahlOBM25_TextAlignSynth_SMC,
  author    = {Sebastian Strahl and Yigitcan {\"O}zer and Hans-Ulrich Berendes and Meinard M{\"u}ller},
  title     = {Hearing Your Way Through Music Recordings: A Text Alignment and Synthesis Approach},
  booktitle = {Proceedings of the Sound and Music Computing Conference ({SMC})},
  address   = {Graz, Austria},
  year      = {2025}
}

This repository contains an implementation of parts of the processing pipeline described in the above paper. The implementation comprises text comment generation for the case studies described in the paper, text-to-speech synthesis using the TTS Python package, post-processing of the synthesized speech signals, and superposition with the original recording.

For details and references, please see the paper.
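To give an idea of what the final superposition step involves, here is a minimal NumPy sketch that adds a synthesized speech snippet to a music recording at a given time position. It is an illustration only and not the API of textalignsynth: the function name `overlay_speech`, its parameters, and the peak rescaling are assumptions made for this example, and both signals are assumed to be mono float arrays with the same sampling rate.

```python
import numpy as np


def overlay_speech(music, speech, start_sec, sr, speech_gain=1.0):
    """Add a mono speech snippet to a mono recording, starting at start_sec.

    Hypothetical helper for illustration; not part of textalignsynth.
    """
    mix = np.asarray(music, dtype=np.float64).copy()
    speech = np.asarray(speech, dtype=np.float64)

    # Convert the start time from seconds to a sample index.
    start = int(round(start_sec * sr))
    if start < 0 or start >= len(mix):
        return mix  # the snippet lies outside the recording

    # Truncate the snippet if it would run past the end of the recording.
    end = min(start + len(speech), len(mix))
    mix[start:end] += speech_gain * speech[: end - start]

    # Rescale only if the summed signal would exceed the range [-1, 1].
    peak = np.max(np.abs(mix))
    if peak > 1.0:
        mix /= peak
    return mix


# Example: one second of silence at 22050 Hz with a short tone at 0.25 s.
sr = 22050
music = np.zeros(sr)
speech = 0.5 * np.sin(2 * np.pi * 220 * np.arange(sr // 10) / sr)
mix = overlay_speech(music, speech, start_sec=0.25, sr=sr)
```

Rescaling by the peak is only one simple way to avoid clipping; the post-processing described in the paper may differ.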

Installation

1. Set up Python environment

We recommend setting up a Python environment including PyTorch before installing textalignsynth. You may use the example environment provided as part of this package:

git clone https://github.com/groupmm/textalignsynth.git
cd textalignsynth
conda env create -f environment.yaml
conda activate textalignsynth

2. Install textalignsynth

Option 1: Installation without cloning this repository:

pip install "git+https://github.com/groupmm/textalignsynth.git#egg=textalignsynth"

Option 2: Installation by cloning this repository:

git clone https://github.com/groupmm/textalignsynth.git
cd textalignsynth
pip install -e .

Warnings:

  • ⚠️ Does not work on Windows machines! Workaround: Use Windows Subsystem for Linux (WSL).
  • ⚠️ The German TTS model requires espeak-ng or espeak to be installed on the machine!
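Both warnings can be checked up front with a few lines of standard-library Python. This is only a sketch: the function names `find_espeak` and `check_environment` are made up for this example and are not part of textalignsynth, and the check only looks for an `espeak-ng` or `espeak` executable on the `PATH`.

```python
import platform
import shutil


def find_espeak():
    """Return the path of the first espeak backend found on PATH, or None."""
    for name in ("espeak-ng", "espeak"):
        path = shutil.which(name)
        if path is not None:
            return path
    return None


def check_environment():
    """Collect human-readable warnings about the current machine."""
    warnings = []
    # WSL reports "Linux" here, so the suggested workaround passes this check.
    if platform.system() == "Windows":
        warnings.append("Windows detected: consider using WSL.")
    if find_espeak() is None:
        warnings.append("Neither espeak-ng nor espeak was found on PATH.")
    return warnings


for message in check_environment():
    print("Warning:", message)
```

An empty list from `check_environment()` means that neither of the two conditions above was detected; it does not verify that the TTS models themselves load correctly.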

Contribution

Automated code style checks via pre-commit:

pip install pre-commit
pre-commit install

License

The code for this toolbox is published under an MIT license. This does not apply to the data files.

Acknowledgements

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) Grant No. 500643750 (MU 2686/15-1). The authors are with the International Audio Laboratories Erlangen, a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institute for Integrated Circuits IIS.