Stable Audio Unofficial Implementation: Latent Diffusion for Audio Generation

Your-Stable-Audio

Unofficial PyTorch implementation of Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion

Your-Stable-Audio (💻WIP)

TODO List

  • Classifier-free diffusion guidance
  • Fixed diffusion schedule, following "Common Diffusion Noise Schedules and Sample Steps are Flawed"
  • Add configs and training examples
  • Upload model weights and demos
  • Update evaluation metric
  • Add Timing Embeddings proposed by Stable Audio
  • Support other tasks: Sound Extraction, Editing, Inpainting, Super-Resolution, etc.
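
As a rough illustration of the first two items above, here is a minimal sketch of how classifier-free guidance and a zero-terminal-SNR noise schedule are commonly computed. It is not code from this repository: the function names `cfg_combine` and `rescale_zero_terminal_snr` are hypothetical, plain Python lists stand in for PyTorch tensors, and the rescaling is assumed to follow the procedure described by Lin et al. (2023).

```python
import math


def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance (hypothetical helper).

    Moves the unconditional noise prediction toward the conditional one.
    A guidance_scale of 1.0 returns the conditional prediction unchanged.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def rescale_zero_terminal_snr(betas):
    """Rescale a beta schedule so the final step has zero SNR (hypothetical helper).

    Assumes at least two steps and every beta strictly between 0 and 1.
    """
    # Cumulative product of alphas, where alpha_t = 1 - beta_t.
    alphas_bar = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alphas_bar.append(running)

    sqrt_ab = [math.sqrt(a) for a in alphas_bar]
    first, last = sqrt_ab[0], sqrt_ab[-1]

    # Shift so the last value becomes zero, then scale so the first is preserved.
    sqrt_ab = [(s - last) * first / (first - last) for s in sqrt_ab]

    # Convert the rescaled cumulative values back to per-step betas.
    alphas_bar = [s * s for s in sqrt_ab]
    alphas = [alphas_bar[0]] + [
        alphas_bar[i] / alphas_bar[i - 1] for i in range(1, len(alphas_bar))
    ]
    return [1.0 - a for a in alphas]


if __name__ == "__main__":
    # Linear schedule with 1000 steps, a common default chosen here for illustration.
    steps = 1000
    betas = [1e-4 + (0.02 - 1e-4) * i / (steps - 1) for i in range(steps)]
    rescaled = rescale_zero_terminal_snr(betas)
    print("last beta before:", betas[-1], "after:", rescaled[-1])
    print("guided prediction:", cfg_combine([0.0, 1.0], [1.0, 3.0], 2.0))
```

With the rescaled schedule the last step carries pure noise, which is the condition the paper argues training and sampling should share. In a PyTorch training loop the same arithmetic would apply elementwise to tensors.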

References

If you find the code useful for your research, please consider citing:

@article{hai2023dpm,
  title={DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction},
  author={Hai, Jiarui and Wang, Helin and Yang, Dongchao and Thakkar, Karan and Dehak, Najim and Elhilali, Mounya},
  journal={arXiv preprint arXiv:2310.04567},
  year={2023}
}

This repo is inspired by:

@misc{Stability2023stable,
  title = {Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion},
  howpublished = {https://stability.ai/research/stable-audio-efficient-timing-latent-diffusion},
  year = {2023},
}
@article{defossez2022high,
  title={High fidelity neural audio compression},
  author={Défossez, Alexandre and Copet, Jade and Synnaeve, Gabriel and Adi, Yossi},
  journal={arXiv preprint arXiv:2210.13438},
  year={2022}
}
@inproceedings{rombach2022high,
  title={High-resolution image synthesis with latent diffusion models},
  author={Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Björn},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  pages={10684--10695},
  year={2022}
}
@article{ghosal2023tango,
  title={Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model},
  author={Ghosal, Deepanway and Majumder, Navonil and Mehrish, Ambuj and Poria, Soujanya},
  journal={arXiv preprint arXiv:2304.13731},
  year={2023}
}
@article{lin2023common,
  title={Common Diffusion Noise Schedules and Sample Steps are Flawed},
  author={Lin, Shanchuan and Liu, Bingchen and Li, Jiashi and Yang, Xiao},
  journal={arXiv preprint arXiv:2305.08891},
  year={2023}
}

Acknowledgement

This repo was developed in collaboration with @carankt.

We borrow code from the following repos:

We use the following tools: