TTS Response is clipped at the beginning
Opened this issue · 5 comments
mdvickst commented
I've got Wyoming Satellite running on an Ubuntu VM (Proxmox) with a USB speakerphone connected for mic/speaker. When it plays back the TTS response, the first 1-2 seconds are cut off. The awake and done WAV sounds work as expected.
Satellite Service:
[Unit]
Description=Wyoming Satellite
After=multi-user.target
[Service]
WorkingDirectory=/home/satellite/wyoming-satellite
ExecStart=/usr/bin/env python3 script/run --name 'my satellite' --uri 'tcp://0.0.0.0:10700' --mic-command 'arecord -r 16000 -c 1 -f S16_LE -t raw' --snd-command 'aplay -r 22050 -c 1 -f S16_LE -t raw' --wake-uri 'tcp://127.0.0.1:10400' --wake-word-name 'hey_jarvis' --done-wav 'awake.wav'
Type=simple
Restart=always
RestartSec=1
[Install]
WantedBy=multi-user.target
Local Wake word service:
[Unit]
Description=Start OpenWakeWord Service
After=multi-user.target
[Service]
WorkingDirectory=/home/satellite/wyoming-openwakeword
ExecStart=/usr/bin/env python3 script/run --uri 'tcp://0.0.0.0:10400' --preload-model 'hey_jarvis' --threshold .99
Type=simple
[Install]
WantedBy=multi-user.target
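One way to narrow this down is to take the satellite out of the picture and feed raw audio straight to the same playback command. The sketch below assumes, without any confirmation in this thread, that the clipping might originate in the output path (USB device or VM passthrough) rather than in the satellite. It builds mono S16_LE PCM at 22050 Hz to match the `--snd-command` above, with an optional stretch of leading silence. The function names and the 440 Hz tone are illustrative choices, not part of Wyoming Satellite.

```python
import math
import struct
import subprocess

SAMPLE_RATE = 22050  # matches `aplay -r 22050` in the satellite unit above
SND_COMMAND = ["aplay", "-r", "22050", "-c", "1", "-f", "S16_LE", "-t", "raw"]


def tone_with_lead_in(silence_s, tone_s, freq_hz=440.0, rate=SAMPLE_RATE):
    """Return raw mono S16_LE PCM: silence for `silence_s` seconds, then a sine tone."""
    n_silence = int(silence_s * rate)
    n_tone = int(tone_s * rate)
    samples = [0] * n_silence
    # Half-scale amplitude so the tone is clearly audible but not harsh.
    samples += [
        int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n_tone)
    ]
    return struct.pack(f"<{len(samples)}h", *samples)


def play_raw(pcm, command=SND_COMMAND):
    """Pipe raw PCM bytes to the playback command (needs a working ALSA device)."""
    subprocess.run(command, input=pcm, check=True)


# Example calls, left commented out because they need audio hardware:
# play_raw(tone_with_lead_in(0.0, 2.0))  # tone starts immediately
# play_raw(tone_with_lead_in(2.0, 2.0))  # two seconds of silence first
```

If the first call loses the start of the tone but the second plays it in full, that would point at the output path dropping the first moments of audio. If both play in full, the cause is more likely elsewhere.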
mdvickst commented
Here's a sample where the response was a simple "done" and nothing was played.
stage: done
run:
  pipeline: 01gznrs9cwqteanxeccwr64hev
  language: en
events:
  - type: run-start
    data:
      pipeline: 01gznrs9cwqteanxeccwr64hev
      language: en
    timestamp: "2024-02-22T13:30:54.760654+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-US
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2024-02-22T13:30:54.760748+00:00"
  - type: stt-vad-start
    data:
      timestamp: 325
    timestamp: "2024-02-22T13:30:55.459661+00:00"
  - type: stt-vad-end
    data:
      timestamp: 1485
    timestamp: "2024-02-22T13:30:57.765833+00:00"
  - type: stt-end
    data:
      stt_output:
        text: Raise Girls Room shade.
    timestamp: "2024-02-22T13:30:57.926834+00:00"
  - type: intent-start
    data:
      engine: homeassistant
      language: en
      intent_input: Raise Girls Room shade.
      conversation_id: null
      device_id: 42a86d70378853b7a345e4b8bd136800
    timestamp: "2024-02-22T13:30:57.926956+00:00"
  - type: intent-end
    data:
      intent_output:
        response:
          speech:
            plain:
              speech: Opened
              extra_data: null
          card: {}
          language: en
          response_type: action_done
          data:
            targets: []
            success:
              - name: Girls Room Shade
                type: entity
                id: cover.girls_room_shade
            failed: []
        conversation_id: null
    timestamp: "2024-02-22T13:30:57.952652+00:00"
  - type: tts-start
    data:
      engine: tts.home_assistant_cloud
      language: en-GB
      voice: EthanNeural
      tts_input: Opened
    timestamp: "2024-02-22T13:30:57.952700+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: >-
          media-source://tts/tts.home_assistant_cloud?message=Opened&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
        url: >-
          /api/tts_proxy/c4f1f5b1d49f90d5437402166829d6b471bf1593_en-gb_35edc9ddc9_tts.home_assistant_cloud.wav
        mime_type: audio/x-wav
    timestamp: "2024-02-22T13:30:57.953188+00:00"
  - type: run-end
    data: null
    timestamp: "2024-02-22T13:30:57.953247+00:00"
stt:
  engine: stt.home_assistant_cloud
  metadata:
    language: en-US
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: Raise Girls Room shade.
intent:
  engine: homeassistant
  language: en
  intent_input: Raise Girls Room shade.
  conversation_id: null
  device_id: 42a86d70378853b7a345e4b8bd136800
  done: true
  intent_output:
    response:
      speech:
        plain:
          speech: Opened
          extra_data: null
      card: {}
      language: en
      response_type: action_done
      data:
        targets: []
        success:
          - name: Girls Room Shade
            type: entity
            id: cover.girls_room_shade
        failed: []
    conversation_id: null
tts:
  engine: tts.home_assistant_cloud
  language: en-GB
  voice: EthanNeural
  tts_input: Opened
  done: true
  tts_output:
    media_id: >-
      media-source://tts/tts.home_assistant_cloud?message=Opened&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
    url: >-
      /api/tts_proxy/c4f1f5b1d49f90d5437402166829d6b471bf1593_en-gb_35edc9ddc9_tts.home_assistant_cloud.wav
    mime_type: audio/x-wav
mdvickst commented
And here is another with a longer response, where I only heard "rned off the lights":
stage: done
run:
  pipeline: 01gznrs9cwqteanxeccwr64hev
  language: en
events:
  - type: run-start
    data:
      pipeline: 01gznrs9cwqteanxeccwr64hev
      language: en
    timestamp: "2024-02-22T13:33:42.117326+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-US
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2024-02-22T13:33:42.117494+00:00"
  - type: stt-vad-start
    data:
      timestamp: 275
    timestamp: "2024-02-22T13:33:42.688250+00:00"
  - type: stt-vad-end
    data:
      timestamp: 1125
    timestamp: "2024-02-22T13:33:44.417176+00:00"
  - type: stt-end
    data:
      stt_output:
        text: Turn off living room lights.
    timestamp: "2024-02-22T13:33:44.591799+00:00"
  - type: intent-start
    data:
      engine: homeassistant
      language: en
      intent_input: Turn off living room lights.
      conversation_id: null
      device_id: 42a86d70378853b7a345e4b8bd136800
    timestamp: "2024-02-22T13:33:44.591861+00:00"
  - type: intent-end
    data:
      intent_output:
        response:
          speech:
            plain:
              speech: Turned off the lights
              extra_data: null
          card: {}
          language: en
          response_type: action_done
          data:
            targets: []
            success:
              - name: Living Room
                type: area
                id: 86726e558f304c699f0015d0f229a901
              - name: Living Room Can Lights Basic
                type: entity
                id: light.living_room_can_lights_basic
              - name: "Living Room Can Lights "
                type: entity
                id: light.living_room_can_lights
            failed: []
        conversation_id: null
    timestamp: "2024-02-22T13:33:44.736401+00:00"
  - type: tts-start
    data:
      engine: cloud
      language: en-GB
      voice: EthanNeural
      tts_input: Turned off the lights
    timestamp: "2024-02-22T13:33:44.736437+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: >-
          media-source://tts/cloud?message=Turned+off+the+lights&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
        url: >-
          /api/tts_proxy/85d43b448ab715eae17c0361864a34ff749eb14a_en-gb_35edc9ddc9_cloud.wav
        mime_type: audio/x-wav
    timestamp: "2024-02-22T13:33:44.736757+00:00"
  - type: run-end
    data: null
    timestamp: "2024-02-22T13:33:44.736789+00:00"
stt:
  engine: stt.home_assistant_cloud
  metadata:
    language: en-US
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: Turn off living room lights.
intent:
  engine: homeassistant
  language: en
  intent_input: Turn off living room lights.
  conversation_id: null
  device_id: 42a86d70378853b7a345e4b8bd136800
  done: true
  intent_output:
    response:
      speech:
        plain:
          speech: Turned off the lights
          extra_data: null
      card: {}
      language: en
      response_type: action_done
      data:
        targets: []
        success:
          - name: Living Room
            type: area
            id: 86726e558f304c699f0015d0f229a901
          - name: Living Room Can Lights Basic
            type: entity
            id: light.living_room_can_lights_basic
          - name: "Living Room Can Lights "
            type: entity
            id: light.living_room_can_lights
        failed: []
    conversation_id: null
tts:
  engine: cloud
  language: en-GB
  voice: EthanNeural
  tts_input: Turned off the lights
  done: true
  tts_output:
    media_id: >-
      media-source://tts/cloud?message=Turned+off+the+lights&language=en-GB&voice=EthanNeural&preferred_format=wav&preferred_sample_rate=16000&preferred_sample_channels=1
    url: >-
      /api/tts_proxy/85d43b448ab715eae17c0361864a34ff749eb14a_en-gb_35edc9ddc9_cloud.wav
    mime_type: audio/x-wav
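For reference, the gaps between events in the two runs above can be worked out directly from the logged timestamps. The sketch below copies a few of those timestamps by hand (the `RUNS` dictionary and `elapsed_ms` helper are illustrative names, not part of Home Assistant) and computes the elapsed time between them. In both runs, `tts-end` follows `tts-start` by well under a millisecond. That suggests, as an interpretation rather than anything confirmed here, that `tts-end` only reports the generated media URL and says nothing about playback, so the pipeline log alone probably cannot show where the audio gets clipped.

```python
from datetime import datetime

# Timestamps copied from the two pipeline runs posted above,
# keyed by the spoken response of each run.
RUNS = {
    "Opened": {
        "run-start": "2024-02-22T13:30:54.760654+00:00",
        "tts-start": "2024-02-22T13:30:57.952700+00:00",
        "tts-end": "2024-02-22T13:30:57.953188+00:00",
        "run-end": "2024-02-22T13:30:57.953247+00:00",
    },
    "Turned off the lights": {
        "run-start": "2024-02-22T13:33:42.117326+00:00",
        "tts-start": "2024-02-22T13:33:44.736437+00:00",
        "tts-end": "2024-02-22T13:33:44.736757+00:00",
        "run-end": "2024-02-22T13:33:44.736789+00:00",
    },
}


def elapsed_ms(start, end):
    """Milliseconds between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() * 1000.0


for response, events in RUNS.items():
    tts = elapsed_ms(events["tts-start"], events["tts-end"])
    total = elapsed_ms(events["run-start"], events["run-end"])
    print(f"{response!r}: tts-start to tts-end {tts:.3f} ms, whole run {total:.1f} ms")
```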
khalob commented
Check whether lowering or toggling off this setting helps:
#121
I had a similar issue.
motoridersd commented
Considering doing this in a Proxmox box. Were you able to resolve the issue? Has it been working well for you?