Your-Stable-Audio

Unofficial PyTorch implementation of Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion

Your-Stable-Audio (💻WIP)

TODO List
References
Acknowledgement

TODO List

Classifier-free diffusion guidance
Fixed diffusion: Common Diffusion Noise Schedules and Sample Steps are Flawed
Add configs and training examples
Upload model weights and demos
Update evaluation metric
Add Timing Embeddings proposed by Stable Audio
Support other tasks: Sound Extraction, Editing, Inpainting, Super-Resolution, etc.
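Two of the items above are compact enough to illustrate. The following is a minimal sketch, not this repo's code: it shows the guidance combination used in classifier-free guidance and the zero-terminal-SNR beta rescaling described in "Common Diffusion Noise Schedules and Sample Steps are Flawed". The function names, the plain-list representation, and the linear schedule values (1000 steps, 1e-4 to 0.02) are assumptions chosen for illustration; a real implementation would operate on PyTorch tensors.

```python
import math


def classifier_free_guidance(uncond, cond, guidance_scale):
    """Combine unconditional and conditional noise predictions.

    Hypothetical helper: inputs are plain lists of floats standing in
    for model outputs. Returns uncond + scale * (cond - uncond).
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumed values, not taken from this repo)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def rescale_zero_terminal_snr(betas):
    """Rescale betas so the final timestep has zero signal-to-noise ratio.

    Shifts and scales sqrt(alpha_bar) so that its last value is 0 while
    its first value is preserved, then converts back to betas.
    """
    # Cumulative product of alphas gives alpha_bar.
    alphas_bar = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alphas_bar.append(running)

    sqrt_ab = [math.sqrt(a) for a in alphas_bar]
    first, last = sqrt_ab[0], sqrt_ab[-1]

    # Shift so the last value is zero, scale so the first is unchanged.
    scale = first / (first - last)
    sqrt_ab = [(s - last) * scale for s in sqrt_ab]

    # Convert sqrt(alpha_bar) back to per-step betas.
    alphas_bar = [s * s for s in sqrt_ab]
    alphas = [alphas_bar[0]] + [
        alphas_bar[i] / alphas_bar[i - 1] for i in range(1, len(alphas_bar))
    ]
    return [1.0 - a for a in alphas]


if __name__ == "__main__":
    original = linear_betas()
    rescaled = rescale_zero_terminal_snr(original)
    print("last beta before:", original[-1], "after:", rescaled[-1])
```

With a zero-SNR final step, the noise-prediction target carries no information at that step, which is why the paper pairs this rescaling with v-prediction and a sampler that starts from the last timestep.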

References

If you find the code useful for your research, please consider citing:

@article{hai2023dpm,
  title={DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction},
  author={Hai, Jiarui and Wang, Helin and Yang, Dongchao and Thakkar, Karan and Dehak, Najim and Elhilali, Mounya},
  journal={arXiv preprint arXiv:2310.04567},
  year={2023}
}

This repo is inspired by:

@misc{Stability2023stable,
  title = {Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion},
  howpublished = {https://stability.ai/research/stable-audio-efficient-timing-latent-diffusion},
  year = {2023},
}

@article{defossez2022high,
  title={High fidelity neural audio compression},
  author={Défossez, Alexandre and Copet, Jade and Synnaeve, Gabriel and Adi, Yossi},
  journal={arXiv preprint arXiv:2210.13438},
  year={2022}
}

@inproceedings{rombach2022high,
  title={High-resolution image synthesis with latent diffusion models},
  author={Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Björn},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  pages={10684--10695},
  year={2022}
}

@article{ghosal2023tango,
  title={Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model},
  author={Ghosal, Deepanway and Majumder, Navonil and Mehrish, Ambuj and Poria, Soujanya},
  journal={arXiv preprint arXiv:2304.13731},
  year={2023}
}

@article{lin2023common,
  title={Common Diffusion Noise Schedules and Sample Steps are Flawed},
  author={Lin, Shanchuan and Liu, Bingchen and Li, Jiashi and Yang, Xiao},
  journal={arXiv preprint arXiv:2305.08891},
  year={2023}
}

Acknowledgement

This repo was developed in collaboration with @carankt.

We borrow code from the following repos:

Autoencoder: EnCodec
1D-UNet: audio-diffusion-pytorch
Utils and fixed diffusion for audio diffusion models: DPM-TSE

We use the following tools:

Diffusion schedulers are based on 🤗 Diffusers
DDP and AMP are built on 🚀 Accelerate
