synth-proxy

Neural proxies for non-differentiable black-box sound synthesizers.


Neural Proxies for Sound Synthesizers:
Perceptually Informed Preset Representations

Official repository for the paper
"Neural Proxies for Sound Synthesizers: Learning Perceptually Informed Preset Representations"
published in the Journal of the Audio Engineering Society (JAES).



Overview

This repository provides:

  • Dataset generation for synthesizer presets
  • Training of neural proxies (preset encoders)
  • Evaluation on a sound-matching downstream task
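
To make the pipeline concrete: a neural proxy is a preset encoder that maps synthesizer parameters into the embedding space of a pretrained audio model, so the non-differentiable synthesizer never has to appear in the training loop. The following is a minimal conceptual sketch in PyTorch; the class, dimensions, and loss are illustrative assumptions rather than the repository's actual implementation (see src/models/preset/model_zoo.py for the real encoders).

import torch
import torch.nn as nn

class PresetEncoder(nn.Module):
    """Toy neural proxy: maps a flat preset-parameter vector into the
    embedding space of a frozen, pretrained audio model."""

    def __init__(self, num_params: int, emb_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_params, hidden),
            nn.ReLU(),
            nn.Linear(hidden, emb_dim),
        )

    def forward(self, params: torch.Tensor) -> torch.Tensor:
        return self.net(params)

# One training step (sketch): `audio_emb` stands in for embeddings obtained by
# rendering each preset with the synthesizer and passing the audio through a
# frozen audio model; the proxy learns to predict them.
encoder = PresetEncoder(num_params=32, emb_dim=128)
params = torch.rand(8, 32)         # batch of random presets in [0, 1]
audio_emb = torch.randn(8, 128)    # placeholder audio embeddings
loss = nn.functional.mse_loss(encoder(params), audio_emb)
loss.backward()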

→ Audio examples are available on the project website.

→ The companion repository for the audio model evaluation can be found here.

→ The published version of the paper is available on the JAES website, while the Author's Accepted Manuscript (AAM) is available on arXiv.

Main dependencies

The project is built on PyTorch and uses Weights & Biases (WandB) for experiment tracking; see requirements.txt for the full list.


Installation

Clone the repo and install via pip or Docker.

→ See Installation & environment setup for details.

Supported synthesizers

Several synthesizers are currently supported out of the box.

→ See Adding synthesizers for instructions on integrating new ones.
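
To give a rough idea of what integration involves, here is a hypothetical interface sketch; the names (SynthWrapper, num_params, render) are assumptions for illustration, and the actual contract is specified in Adding synthesizers.

from abc import ABC, abstractmethod

import numpy as np

class SynthWrapper(ABC):
    """Hypothetical interface for a black-box synthesizer wrapper."""

    @property
    @abstractmethod
    def num_params(self) -> int:
        """Number of exposed preset parameters."""

    @abstractmethod
    def render(self, params: np.ndarray, note: int, duration: float) -> np.ndarray:
        """Apply the preset parameters (normalized to [0, 1]), play a MIDI
        note for `duration` seconds, and return mono audio as a float array."""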

Audio models

Wrappers for the supported audio models are available in the src/models/audio/ directory.

→ See Adding audio models for integration instructions.

→ The code for the audio model evaluation can be found in its corresponding repository.
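
Each wrapper exposes a pretrained audio model as a frozen embedding function. A minimal sketch, assuming a hypothetical AudioModelWrapper class (the actual interfaces live in src/models/audio/):

import torch
import torch.nn as nn

class AudioModelWrapper(nn.Module):
    """Hypothetical wrapper: exposes a frozen, pretrained audio model as a
    function from waveforms to fixed-size embeddings."""

    def __init__(self, backbone: nn.Module, emb_dim: int):
        super().__init__()
        self.backbone = backbone.eval()
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # audio models stay frozen throughout
        self.emb_dim = emb_dim

    @torch.no_grad()
    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # audio: (batch, num_samples) -> embeddings: (batch, emb_dim)
        return self.backbone(audio)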

Preset Encoders (Neural Proxies)

An overview of the implemented neural proxies can be found in src/models/preset/model_zoo.py.

Download pretrained checkpoints here and place them in checkpoints/.
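
Restoring a checkpoint then follows the usual PyTorch pattern. A sketch with assumed names (the stand-in model and the checkpoint file name are hypothetical; the real constructors are listed in src/models/preset/model_zoo.py):

import torch
import torch.nn as nn

# Toy stand-in for the real proxies listed in src/models/preset/model_zoo.py.
model = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 128))

# Hypothetical file name; substitute an actual checkpoint from checkpoints/.
ckpt = torch.load("checkpoints/example_proxy.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)  # unwrap trainer-style checkpoints
model.load_state_dict(state_dict)
model.eval()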

Datasets

See Datasets for download links and generation instructions for the synthetic and hand-crafted preset datasets.
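
Conceptually, generating a synthetic dataset amounts to sampling random presets, rendering them, and pairing each parameter vector with the embedding of its rendered audio. A minimal sketch, assuming the hypothetical SynthWrapper and AudioModelWrapper interfaces sketched above (the real pipeline, including filtering and storage format, is described in Datasets):

import numpy as np
import torch

def generate_dataset(synth, audio_model, num_presets: int, note: int = 60):
    """Sample random presets, render them, and pair each parameter vector
    with the embedding of its rendered audio (hypothetical interfaces)."""
    params_list, emb_list = [], []
    for _ in range(num_presets):
        params = np.random.rand(synth.num_params).astype(np.float32)
        audio = synth.render(params, note=note, duration=4.0)
        emb = audio_model(torch.from_numpy(audio).unsqueeze(0))
        params_list.append(torch.from_numpy(params))
        emb_list.append(emb.squeeze(0))
    return torch.stack(params_list), torch.stack(emb_list)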

Experiments

This repository provides the following experiments:

  • Training and evaluation of synthesizer proxies.
  • Hyperparameter optimization (HPO) with Optuna.
  • Sound-matching downstream task (fine-tuning + estimator network).

→ See Experiments for scripts, configs, and usage examples.
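
As one example, the HPO experiment follows the standard Optuna study pattern. A minimal, hypothetical sketch: train_proxy and the search space below are assumptions, with the actual configurations defined in Experiments.

import optuna

def objective(trial: optuna.Trial) -> float:
    # Hypothetical search space; the real one is defined in the repo's configs.
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    hidden = trial.suggest_categorical("hidden", [128, 256, 512])
    # `train_proxy` is an assumed stand-in for the repository's training
    # entry point; it should return a validation loss to minimize.
    return train_proxy(lr=lr, hidden=hidden)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)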

Reproducibility

Detailed step-by-step instructions to replicate the results from the paper, including the model evaluation and visualization scripts, can be found in Reproducibility.

Citation

@article{combes2025neural,
  author={Combes, Paolo and Weinzierl, Stefan and Obermayer, Klaus},
  journal={Journal of the Audio Engineering Society},
  title={Neural Proxies for Sound Synthesizers: Learning Perceptually Informed Preset Representations},
  year={2025},
  volume={73},
  number={9},
  pages={561--577},
  month={September},
}

Thanks

Special shout-out to Joseph Turian for his initial guidance on the topic and overall methodology, and to Gwendal le Vaillant for the useful discussions on SPINVAE, which inspired the transformer-based preset encoder.