av/harbor

xtts-v2 error config.json not found

FaintWhisper opened this issue · 5 comments

Hello,

First of all, thank you for your hard work, this tool has been incredibly useful and has saved me a lot of time with manual configurations. Keep up the great work!

I'm encountering an issue with the xtts-v2 setup. Here's what I've done so far:

  1. Ran harbor down to remove all running instances and start fresh.
  2. Executed harbor pull tts to pull the latest TTS image.
  3. Ran harbor up tts to start the TTS container.
  4. Accessed the interface via harbor open.

In Open WebUI, I configured the system to use xtts-v2 by selecting tts-1-hd in the Admin Panel under the Audio tab. However, when I attempt to generate TTS responses, I receive a "Server connection error" message.

After checking the logs of the TTS container, I noticed the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/app/voices/tts/tts_models--multilingual--multi-dataset--xtts/config.json'

I navigated to the specified folder and confirmed that the config.json file is indeed missing. I'm unsure why this file is missing or how to resolve the issue, but I wanted to bring it to your attention in case this is a more widespread problem that others might be experiencing as well.

For reference, I found a potentially related issue in the Coqui repo (Issue #3064), but it doesn't seem relevant here, since the image pull process was never interrupted in my case.

Any guidance would be greatly appreciated.

Thank you!

av commented

Hi, thanks for giving Harbor a spin!

There are two things to consider in this situation:

WebUI configuration

WebUI caches the audio assets produced by TTS, so to ensure that openedai-speech is actually hit with a tts-1-hd request, you need to make sure the text in question hasn't been seen before. I usually regenerate a response to some short question and then click the "🔈" icon under the message.

Another aspect in play is persistence of the configuration with the current config-merging setup. Unfortunately, some of the configuration you make in Open WebUI will be overwritten on the next harbor up, depending on which services you run.

Initial download of the model

openedai-speech only starts downloading xtts-v2 on the first attempt to use the tts-1-hd model. That's when the folder voices/tts/tts_models--multilingual--multi-dataset--xtts/ is created (downloaded from the HF Hub); it's ~2 GB, so it may take a while.


Here are sample logs from tts running without a local cache:

# Initial startup, xtts-v2 isn't downloaded yet
harbor.tts  | First startup may download 2GB of speech models. Please wait.
harbor.tts  | INFO:     Started server process [27]
harbor.tts  | INFO:     Waiting for application startup.
harbor.tts  | INFO:     Application startup complete.
harbor.tts  | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
harbor.tts  | INFO:     192.168.64.3:40698 - "POST /v1/audio/speech HTTP/1.1" 200 OK

# 1. Configure Open WebUI to use tts-1-hd
# 2. Generate speech from some uncached text
harbor.tts  | 2024-08-26 08:51:44.737 | INFO     | __main__:__init__:59 - Loading model xtts to cuda

# Takes some time to download the model

# Check the folder size to see download progress
# voices/tts/tts_models--multilingual--multi-dataset--xtts
du -h $(harbor home)/tts

# Sample output when download is complete
user@os:~/code/harbor$ du -h $(harbor home)/tts
12K	/home/user/code/harbor/tts/config
1.8G	/home/user/code/harbor/tts/voices/tts/tts_models--multilingual--multi-dataset--xtts
1.8G	/home/user/code/harbor/tts/voices/tts
1.9G	/home/user/code/harbor/tts/voices
1.9G	/home/user/code/harbor/tts

After this, you should see xtts-v2 being used for TTS.
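
If you want to rule out the Open WebUI audio cache entirely, you can also call the service's OpenAI-compatible endpoint directly. A minimal sketch, assuming the service is reachable on port 8000 from wherever you run it (adjust the host/port to your own mapping):

# Request speech directly from openedai-speech, bypassing Open WebUI;
# since tts-1-hd maps to xtts-v2, this also triggers the initial model download
curl http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1-hd", "input": "Hello from Harbor", "voice": "alloy"}' \
  --output speech.mp3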

Now, on the next start, Harbor will re-apply the default tts configuration, which doesn't use this model. This is a limitation of the current uni-directional config flow that I'm not sure how to resolve in a nice way yet. There is a built-in workaround, however: a special config.override.json that will override anything else set by Harbor.

In order to permanently apply the config override:

# This is the path to the final webui config file
# Open in the editor of your choice
echo $(harbor home)/open-webui/config.json

Grab the portion related to TTS:

{
	"audio": {
		"tts": {
			"openai": {
				"api_base_url": "http://tts:8000/v1",
				"api_key": "sk-dummy-key"
			},
			"engine": "openai",
			"model": "tts-1-hd",
			"voice": "alloy",
			"api_key": ""
		},
		"stt": {
			"engine": "",
			"model": "whisper-1"
		}
	}
}

Paste this portion into the override config, which is located at this path:

echo $(harbor home)/open-webui/configs/config.override.json
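
If you prefer doing this from the shell, here's a sketch that writes the same snippet non-interactively (the JSON is the audio portion shown above; note that this overwrites any existing override file):

# Write the override in one go
cat > $(harbor home)/open-webui/configs/config.override.json << 'EOF'
{
  "audio": {
    "tts": {
      "openai": {
        "api_base_url": "http://tts:8000/v1",
        "api_key": "sk-dummy-key"
      },
      "engine": "openai",
      "model": "tts-1-hd",
      "voice": "alloy",
      "api_key": ""
    },
    "stt": {
      "engine": "",
      "model": "whisper-1"
    }
  }
}
EOF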

If you're using VS Code and have its CLI available globally, you can quickly open Harbor home for such edits with:

harbor vscode

FaintWhisper commented

Thank you for your response!

I noticed that TTS audio for previously generated responses was cached and not regenerated with the new TTS settings. However, even after updating the settings to use xtts-v2, generating new messages, and requesting TTS on them, I still encountered the error from my initial report: a "Server Connection Error" message appears, and the logs of the tts service (harbor logs tts) print the following:

FileNotFoundError: [Errno 2] No such file or directory: '/app/voices/tts/tts_models--multilingual--multi-dataset--xtts/config.json'

To investigate further, I checked the logs of the openedai-speech container and discovered that the config.json file was missing. After spending some time troubleshooting (deleting the container and the image, without success), I decided to delete the voices/tts/tts_models--multilingual--multi-dataset--xtts/ folder. I then forced a re-download of the model by running:

harbor shell tts

bash download_voices_tts-1-hd.sh xtts_v2.0.2 # Following installation instructions at https://github.com/matatonic/openedai-speech/pkgs/container/openedai-speech#installation-instructions

Afterward, I listed the contents of the new directory with:

ls voices/tts/tts_models--multilingual--multi-dataset--xtts_v2.0.2/

This confirmed that all required files were present: config.json, hash.md5, model.pth, speakers_xtts.pth, and vocab.json.

I also observed that the original folder was voices/tts/tts_models--multilingual--multi-dataset--xtts, while the new one was voices/tts/tts_models--multilingual--multi-dataset--xtts_v2.0.2. I'm unsure whether the system was originally pulling a different model version that might not be compatible with the way it's being executed.

Finally, I restarted the composed service using harbor down followed by harbor up tts. After configuring the TTS settings for xtts-v2 and generating new messages, I prompted for TTS again, and this time it worked perfectly, albeit with a significant initial delay.

I’m wondering if there might be a specific issue with the openedai-speech package, possibly related to how it handles the configuration file or pulls the model. However, based on your logs, it seems to be working correctly on your end, so I’m not sure if this issue is unique to my setup.

I performed a fresh install of the TTS compose service and haven’t made significant changes to its configuration or to the settings of harbor (aside from briefly trying Parler before).


On a separate note, I did notice that the settings were being reset each time the service restarted, and I was just about to open another issue for that 😅. So, thank you for the instructions! I followed your steps to override the default configuration and set the TTS model to xtts-v2, and it worked flawlessly. I also took the opportunity to update the OLLAMA_HOST endpoint, pointing it to another self-managed server outside of Harbor, and it works perfectly.

av commented

> I’m wondering if there might be a specific issue with the openedai-speech package, possibly related to how it handles the configuration file or pulls the model. However, based on your logs, it seems to be working correctly on your end, so I’m not sure if this issue is unique to my setup.

My theory is that on the first start tts began downloading the default xtts-v2 at some point, but it took a while and you might've assumed the system wasn't responding as it should and performed a shutdown or a restart. That left a cache folder without the actual files on the filesystem, so on the following attempts the model was assumed to be present and the download wasn't resumed.

When you deleted the folder and restarted the service, I assume it re-downloaded everything from scratch as initially expected.
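
If this happens again, a quick way to tell an incomplete cache from a healthy one is to list the model folder before assuming the download finished. A small check, using the paths and file list mentioned earlier in this thread:

# An intact download should contain config.json, model.pth, speakers_xtts.pth,
# vocab.json and hash.md5; an interrupted one can be empty or partial
ls -lh $(harbor home)/tts/voices/tts/tts_models--multilingual--multi-dataset--xtts/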

You can still use v2.0.2 (or any other version of xtts, for that matter) by adjusting voice_to_speaker.yaml and specifying a custom model key:

tts-1-hd:
  alloy:
    model: xtts_v2.0.2
    speaker: voices/alloy.wav

I didn't test this specifically, but I'm assuming it should work as per one of the openedai-speech examples.
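
As for where that file lives in a Harbor install, my best guess (not verified here) is the tts config folder visible in the du output above:

# Assumption: the speaker/model mapping sits in the config folder Harbor keeps for tts
ls $(harbor home)/tts/config/
# After editing voice_to_speaker.yaml there, restart the service to pick up the change
harbor down && harbor up tts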


> I followed your steps to override the default configuration and set the TTS model to xtts-v2, and it worked flawlessly. I also took the opportunity to update the OLLAMA_HOST endpoint

Nice! Kudos on customizing your setup! You can also add arbitrary OpenAI-compatible APIs to your Harbor setup via:

harbor openai urls
harbor openai keys

# Add a custom Ollama endpoint
harbor openai urls add http://localhost:11434/v1
harbor openai keys add sk-ollama2

These will be added to Open WebUI automatically. This way you can add more than one Ollama instance, as well as fully remote APIs, to your setup. The caveat is that you won't be able to manage those Ollama instances via the WebUI itself.

FaintWhisper commented

That explanation makes sense. I realize now that the openedai-speech image pulls the model during container creation, and it’s saved within a volume (ref). It’s possible that because I didn’t delete the volume, the model didn’t download again, as the folder was already there, though incomplete. What I still don’t understand is how the download was canceled, as I didn’t explicitly interrupt the process at any point.
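
For anyone else landing here, the recovery that worked in my case boils down to removing the incomplete model folder and letting it download again (paths as above; per av's explanation, the model is re-fetched on the first tts-1-hd request):

# Remove the incomplete xtts cache, then restart and trigger TTS again
rm -rf $(harbor home)/tts/voices/tts/tts_models--multilingual--multi-dataset--xtts
harbor down && harbor up tts
# Alternatively, re-run the bundled download script from inside the container:
#   harbor shell tts
#   bash download_voices_tts-1-hd.sh xtts_v2.0.2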

I’ll close this issue now since we have a reasonable explanation and a workaround for others who might face the same problem.

Thank you for your help!