
DPM-TSE

Official PyTorch implementation of DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction

🎧 Listen to examples on the Demopage

🔥 Updates: SoloAudio is now available! This advanced diffusion-transformer-based model extracts target sounds from free-text input.

Usage

# Training
python src/train_ddim_cls.py \
    --data-path 'data/fsd2018/' \
    --autoencoder-path 'ckpts/first_stage.pt' \
    --autoencoder-config 'ckpts/vae.yaml' \
    --diffusion-config 'src/config/DiffTSE_cls_v_b_1000.yaml'

# Inference
python src/tse.py \
    --device 'cuda' \
    --mixture 'example.wav' \
    --target_sound 'Applause' \
    --autoencoder-path 'ckpts/first_stage.pt' \
    --autoencoder-config 'ckpts/vae.yaml' \
    --diffusion-config 'src/config/DiffTSE_cls_v_b_1000.yaml' \
    --diffusion-ckpt 'ckpts/base_v_1000.pt'
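To extract several target classes from many mixtures in one go, the inference command can be driven from a small Python wrapper. The sketch below is only an illustration: the mixture folder and the target-class list are assumed placeholders, while the script path, CLI flags, and checkpoint names are copied from the command above.

# Hypothetical batch-inference wrapper around src/tse.py.
# Only the CLI flags and checkpoint paths come from the inference command above;
# the mixture folder and target-class list are assumed placeholders.
import subprocess
from pathlib import Path

MIXTURES = Path('data/mixtures')          # assumed folder of input .wav mixtures
TARGETS = ['Applause', 'Bark', 'Cough']   # assumed FSD-style class labels

for wav in sorted(MIXTURES.glob('*.wav')):
    for target in TARGETS:
        subprocess.run(
            [
                'python', 'src/tse.py',
                '--device', 'cuda',
                '--mixture', str(wav),
                '--target_sound', target,
                '--autoencoder-path', 'ckpts/first_stage.pt',
                '--autoencoder-config', 'ckpts/vae.yaml',
                '--diffusion-config', 'src/config/DiffTSE_cls_v_b_1000.yaml',
                '--diffusion-ckpt', 'ckpts/base_v_1000.pt',
            ],
            check=True,  # stop on the first failed extraction
        )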

References

If you find the code useful for your research, please consider citing:

@inproceedings{hai2024dpm,
  title={DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction},
  author={Hai, Jiarui and Wang, Helin and Yang, Dongchao and Thakkar, Karan and Dehak, Najim and Elhilali, Mounya},
  booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1196--1200},
  year={2024},
  organization={IEEE}
}

Acknowledgement

We borrow code from the following repos:

  • The diffusion schedulers and 2D UNet are based on 🤗 Diffusers
  • The 16 kHz HiFi-GAN vocoder is borrowed from AudioLDM