/LAFMA

Primary LanguagePython

LAFMA

Official implementation of the paper "LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation" (INTERSPEECH 2024). Paper Link and Demo Page .

Checkpoints

VAEGAN Model: The VAEGAN model is the audio VAE that compresses the audio mel-spectrogram into an audio latent.

LAFMA Model: The LAFAM model is the latent flow matching model for text guided audio generation model.

We use the checkpoint of HiFi-GAN vocoder provided by AudioLDM .

Inference

# install dependicies
pip install -r requirement.txt

# infer
(first download the huggingface flan-t5-large to the huggingface/flan-t5-large dir)
(replace the checkpoint_path to yours in the .sh file)
cd LAFMA 
sh egs/tta/audiolfm/run_inference.sh

Acknowledgements

Cites

@misc{guan2024lafma,
      title={LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation}, 
      author={Wenhao Guan and Kaidi Wang and Wangjin Zhou and Yang Wang and Feng Deng and Hui Wang and Lin Li and Qingyang Hong and Yong Qin},
      year={2024},
      eprint={2406.08203},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}