The tables below list AudioGPT's current capabilities. Support for more models and tasks is coming soon. For prompt examples, refer to asset.

Task | Supported Foundation Models | Status |
---|---|---|
Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
Style Transfer | GenerSpeech | Yes |
Speech Recognition | Whisper, Conformer | Yes |
Speech Enhancement | ConvTasNet | Yes (WIP) |
Speech Separation | TF-GridNet | Yes (WIP) |
Speech Translation | Multi-decoder | WIP |
Mono-to-Binaural | NeuralWarp | Yes |

Task | Supported Foundation Models | Status |
---|---|---|
Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |

Task | Supported Foundation Models | Status |
---|---|---|
Text-to-Audio | Make-An-Audio | Yes |
Audio Inpainting | Make-An-Audio | Yes |
Image-to-Audio | Make-An-Audio | Yes |
Sound Detection | Audio-transformer | Yes |
Target Sound Detection | TSDNet | Yes |
Sound Extraction | LASSNet | Yes |

Task | Supported Foundation Models | Status |
---|---|---|
Talking Head Synthesis | GeneFace | Yes (WIP) |

We appreciate the following open-source projects:
ESPNet, NATSpeech, Visual ChatGPT, Hugging Face, LangChain, Stable Diffusion