A text-to-speech synthesis system based on DMOSpeech2 with a user-friendly Gradio interface.
Note: You'll need to download the model checkpoints separately (see instructions below).
- Zero-shot voice cloning from reference audio
- High-quality speech synthesis with metric optimization
- Easy-to-use Gradio interface
- Configurable generation parameters
- Support for both CPU and GPU inference
The fastest way to get started:
- Download and set up:
  # Create virtual environment first
  python -m venv dmo2
  # Activate environment
  # Linux/macOS:
  source dmo2/bin/activate
  # Windows:
  dmo2\Scripts\activate
  # Clone and enter project
  git clone https://github.com/PierrunoYT/DMOSpeech2Speech-Local.git
  cd DMOSpeech2Speech-Local
  # IMPORTANT: Install PyTorch first (required for the interface)
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  # Then install other requirements
  pip install -r DMOSpeech2/requirements.txt
  pip install ipython
- Download model checkpoints:
  mkdir ckpts
  cd ckpts
  # Download from Huggingface
  # Linux/macOS:
  wget https://huggingface.co/yl4579/DMOSpeech2/resolve/main/model_85000.pt
  wget https://huggingface.co/yl4579/DMOSpeech2/resolve/main/model_1500.pt
  # Windows (PowerShell):
  # Invoke-WebRequest -Uri "https://huggingface.co/yl4579/DMOSpeech2/resolve/main/model_85000.pt" -OutFile "model_85000.pt"
  # Invoke-WebRequest -Uri "https://huggingface.co/yl4579/DMOSpeech2/resolve/main/model_1500.pt" -OutFile "model_1500.pt"
  cd ..
- Launch interface:
  python run_tts.py
- Open your browser and start generating speech!
For more advanced usage and inference examples, see demo.ipynb
Make sure you have Python 3.10+ installed on your system. You can download it from python.org.
Recommended: Python 3.10
All platforms:
python -m venv dmo2
Activate the environment:
Linux/macOS:
source dmo2/bin/activate
Windows (Command Prompt):
dmo2\Scripts\activate
Windows (PowerShell):
dmo2\Scripts\Activate.ps1
Note: You'll need to activate this environment every time you want to use DMOSpeech2Speech.
- Install PyTorch with CUDA support:
For CUDA 11.8:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
For CUDA 12.1:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
For CPU only:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
Check your CUDA version:
nvidia-smi
- Install Python requirements:
pip install -r DMOSpeech2/requirements.txt
- Additional dependencies (the quotes keep the shell from treating > as a redirect):
pip install ipython "gradio>=3.45.2"
Alternative: You can also create an F5-TTS environment and run inference directly from it.
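After installing, it can help to confirm which PyTorch build actually landed in the environment. The snippet below is a minimal sketch and not part of the repository; the helper name `describe_torch` is hypothetical. It relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and it reports a missing installation instead of raising.

```python
import importlib.util


def describe_torch():
    """Return a small dict describing the installed PyTorch build, if any."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None, "cuda_build": None, "device": None}

    import torch  # imported lazily so this script still runs without torch

    use_cuda = torch.cuda.is_available()
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only wheels
        "device": "cuda" if use_cuda else "cpu",
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

If `device` reports `cpu` on a machine with an NVIDIA GPU, the CPU-only wheel was probably installed, or the wheel's CUDA version does not match the driver.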
Important: Model checkpoints are NOT included in this repository due to their large size. You must download them:
# Only run if ckpts folder doesn't exist or is empty
mkdir ckpts
cd ckpts
wget https://huggingface.co/yl4579/DMOSpeech2/resolve/main/model_85000.pt
wget https://huggingface.co/yl4579/DMOSpeech2/resolve/main/model_1500.pt
cd ..
Model descriptions:
- model_85000.pt - DMOSpeech checkpoint (including teacher for teacher-guided sampling)
- model_1500.pt - GRPO-finetuned duration predictor checkpoint
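On systems without `wget`, the same download can be sketched with Python's standard library. This is an illustrative helper, not something shipped with the project: the function names `missing_checkpoints` and `download_missing` are hypothetical, and the URLs are the two listed in this README. Files are written to a temporary `.part` name first so that an interrupted download is not mistaken for a complete checkpoint.

```python
import urllib.request
from pathlib import Path

BASE_URL = "https://huggingface.co/yl4579/DMOSpeech2/resolve/main"
CHECKPOINTS = ("model_85000.pt", "model_1500.pt")


def missing_checkpoints(ckpt_dir="ckpts"):
    """Return the checkpoint file names that are absent or empty in ckpt_dir."""
    ckpt_dir = Path(ckpt_dir)
    missing = []
    for name in CHECKPOINTS:
        path = ckpt_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


def download_missing(ckpt_dir="ckpts"):
    """Download only the checkpoints that are not already present."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    for name in missing_checkpoints(ckpt_dir):
        url = f"{BASE_URL}/{name}"
        partial = ckpt_dir / (name + ".part")
        print(f"Downloading {url}")
        urllib.request.urlretrieve(url, str(partial))
        partial.replace(ckpt_dir / name)  # rename only after a full download
```

Calling `download_missing()` from the project root fetches whatever is still missing; `missing_checkpoints()` on its own is a quick way to confirm both files are in place.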
Expected folder structure after setup:
DMOSpeech2Speech-Local/
├── README.md
├── run_tts.py
├── dmo_tts_interface.py
├── LICENSE
├── ckpts/ (you need to create and populate this)
│ ├── model_85000.pt (download required)
│ └── model_1500.pt (download required)
├── DMOSpeech2/
│ ├── requirements.txt
│ ├── LICENSE
│ ├── README.md
│ └── src/
├── dmo2/ (virtual environment - create this)
└── .gitignore
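The layout above can be checked mechanically before launching. The following is a rough sketch under the assumption that the paths in the tree are the ones the interface needs; the helper name `missing_paths` is hypothetical and the script is not part of the repository.

```python
from pathlib import Path

# Paths taken from the folder structure listed above.
EXPECTED = (
    "run_tts.py",
    "dmo_tts_interface.py",
    "ckpts/model_85000.pt",
    "ckpts/model_1500.pt",
    "DMOSpeech2/requirements.txt",
    "DMOSpeech2/src",
)


def missing_paths(root="."):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Project layout looks complete.")
```

Run it from the project root; anything it lists as missing points back to the clone or checkpoint download steps.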
Option 1: Easy launcher (Recommended)
python run_tts.py
Option 2: Direct launch
python dmo_tts_interface.py
The interface will open in your browser at localhost:7861
"ModuleNotFoundError: No module named 'torch'" Error:
- First, verify your environment is activated:
  - Linux/macOS: source dmo2/bin/activate
  - Windows: dmo2\Scripts\activate
- Check if PyTorch is actually installed: pip list | grep torch (Linux/Mac) or pip list | findstr torch (Windows)
- Test PyTorch import directly: python -c "import torch; print(torch.__version__)"
- If PyTorch is installed but you still get an import error:
  - Check which Python you're using: which python (Linux/Mac) or where python (Windows)
  - Make sure you're using the virtual environment Python
  - Try running with explicit Python: python.exe dmo_tts_interface.py (Windows)
  - Restart your terminal/command prompt and reactivate the environment
- If PyTorch is not installed:
  - For CUDA 12.1: pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  - For CUDA 11.8: pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  - For CPU-only: pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
- Environment conflicts:
  - Deactivate and reactivate the environment: run deactivate, then activate again
  - Try creating a fresh environment if issues persist
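Several of the checks above can be gathered into one small diagnostic. This is a sketch rather than a project script, and the name `environment_report` is hypothetical. It assumes the environment was created with `python -m venv`, in which case `sys.prefix` differs from `sys.base_prefix` while the environment is active.

```python
import importlib.util
import sys


def environment_report():
    """Summarize which interpreter is running and whether torch is visible to it."""
    return {
        "python": sys.executable,
        # True when running inside an environment created by `python -m venv`
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "torch_found": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `in_virtualenv` is False, or `python` points outside the `dmo2` folder, the environment is not active in the current shell, which is the most common cause of the error.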
Interface won't start:
- Make sure you're in the project root directory (where README.md is located)
- Check that you have downloaded the model files to the ckpts/ folder
- Verify all dependencies are installed: pip install -r DMOSpeech2/requirements.txt
- Install additional dependencies: pip install ipython "gradio>=3.45.2"
Model loading errors:
- Ensure you have downloaded the checkpoint files to the ckpts/ directory
- Check that both model_85000.pt and model_1500.pt are present
- The interface supports both EMA and non-EMA model formats
Import errors:
- Run from the project root directory: cd /path/to/DMOSpeech2Speech-Local
- Make sure the DMOSpeech2/src/ directory exists and contains the source files
Audio generation issues:
- Provide clear reference audio (3-10 seconds recommended)
- Ensure reference text matches the reference audio content
- Try adjusting generation parameters (guidance scale, steps)
- Check that your GPU has sufficient memory
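The 3-10 second recommendation for reference audio can be checked before uploading a clip. The sketch below uses only the standard-library `wave` module, so it assumes an uncompressed PCM WAV file; other formats would need a different reader. The function names are hypothetical and the default bounds simply mirror the recommendation above.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(path, low=3.0, high=10.0):
    """Check whether the clip falls inside the recommended 3-10 second range."""
    return low <= wav_duration_seconds(path) <= high
```

For example, `is_recommended_length("reference.wav")` returns False for a one-second clip, where `reference.wav` stands in for your own file.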
Port conflicts:
- Interface runs on port 7861
- If port is busy, modify the port number in dmo_tts_interface.py
- Use netstat -an | grep :7861 to check port availability
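Where `netstat` or `grep` is unavailable (for example in a plain Windows prompt), a platform-independent check can be sketched in Python. The helper name `port_in_use` is hypothetical; it simply tries to open a TCP connection to the port and treats success as "something is already listening".

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    port = 7861  # port used by the interface according to this README
    state = "busy" if port_in_use(port) else "free"
    print(f"Port {port} is {state}")
```

A "busy" result before the interface has been started means another process holds the port, so the port number in dmo_tts_interface.py needs to change.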
If you encounter import errors:
- Make sure you have all Gradio dependencies: pip install "gradio>=3.45.2"
- Ensure you're running from the project root directory
- Check Python path issues by running: python -c "import sys; print(sys.path)"
For Windows users:
- If activation doesn't work in PowerShell, try Command Prompt instead
- Make sure you have execution policies set correctly for PowerShell: Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
- Use forward slashes or escape backslashes in paths when needed
This project is licensed under the MIT License - see the LICENSE file for details.
Based on the original DMOSpeech2 research by Yinghao Aaron Li et al.
Contributions are welcome! Please feel free to submit a Pull Request.