Explore using speech synthesis/recognition to avoid needing Narrator/Win+H
chrisglein opened this issue · 2 comments
Summary
Talking to an AI without having to type could be pretty magical.
Talking: Win+H for entering speech is painless, but most people don't know it exists.
Listening: Turning on Narrator just to read out responses is more than your average user wants.
There are APIs to do this directly. Try them?
https://learn.microsoft.com/en-us/uwp/api/windows.media.speechsynthesis.speechsynthesizer?view=winrt-22621
If those work well, we can add 🎙️ and 👂/🔉 buttons to the chat entry and chat responses, respectively.
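For the talking half, here is a minimal sketch of one-shot dictation through Windows.Media.SpeechRecognition, written against the camel-cased names that a JavaScript projection such as react-native-winrt would expose. The `listenOnce` name and the `winrt` parameter are made up for illustration; passing the namespace in, rather than reading a global, only serves to keep the function runnable against a stub. It assumes the app declares the microphone capability, and it may also depend on the user's speech privacy settings. Untested on a real device.

```javascript
// Sketch only: `winrt` is assumed to be the projected `Windows` namespace.
// It is a parameter here (hypothetical design) so a stub can stand in for it.
async function listenOnce(winrt) {
  const recognizer = new winrt.Media.SpeechRecognition.SpeechRecognizer();
  try {
    // With no constraints added, the default dictation grammar is compiled.
    await recognizer.compileConstraintsAsync();
    // Listen for a single utterance and return the recognized text.
    const result = await recognizer.recognizeAsync();
    return result.text;
  } finally {
    // SpeechRecognizer is closable; release it if the projection exposes close().
    if (typeof recognizer.close === "function") {
      recognizer.close();
    }
  }
}
```

A 🎙️ button handler could call `listenOnce(Windows)` and drop the returned string into the chat entry. A real implementation would probably also want to check `result.status` before trusting the text.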
Had the AI take two cracks at doing this in C++/winrt. Here they are:
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Media.SpeechSynthesis.h>
#include <winrt/Windows.Media.PlayTo.h>
#include <iostream>
namespace winrt = winrt::Windows;
using namespace winrt::Windows::Media::SpeechSynthesis;
using namespace winrt::Windows::Media::PlayTo;
int main()
{
    // Create SpeechSynthesizer object
    auto synthesizer = SpeechSynthesizer();
    // Convert text to speech
    auto stream = synthesizer.SynthesizeTextToStreamAsync(L"Hello, world!").get();
    // Play audio
    auto buffer = winrt::Windows::Storage::Streams::Buffer(stream.Size());
    stream.ReadAsync(buffer, buffer.Capacity(), InputStreamOptions::None).get();
    PlayToManager::ShowPlayToUI(std::nullptr_t());
    winrt::com_ptr<PlayToSource> source;
    PlayToManager::GetForCurrentView().DefaultSource(source);
    source->Connection().WriteAsync(buffer).get();
    // Wait for audio to finish playing
    while (source->PlaybackRate() != 0.f) {}
    return 0;
}
// Create an instance of the SpeechSynthesizer.
winrt::Windows::Media::SpeechSynthesis::SpeechSynthesizer synth;
// Retrieve the first installed voice (index 0 is not guaranteed to be a female voice).
winrt::Windows::Media::SpeechSynthesis::VoiceInformation voiceInfo = (synth.AllVoices()).GetAt(0);
// Set the voice.
synth.Voice(voiceInfo);
// Generate the audio stream from plain text. The API is asynchronous only, so this must run inside a coroutine.
winrt::hstring text = L"Hello world";
winrt::Windows::Media::SpeechSynthesis::SpeechSynthesisStream stream = co_await synth.SynthesizeTextToStreamAsync(text);
// Play the audio stream.
winrt::Windows::UI::Xaml::Controls::MediaElement mediaElement;
mediaElement.SetSource(stream, stream.ContentType());
mediaElement.Play();
Your mileage may vary.
Code that works (C#):
// Initialize a new SpeechSynthesizer
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
    // Set the text to be spoken
    string text = "Hello World";
    // Make the SpeechSynthesizer speak the text
    var stream = await synth.SynthesizeTextToStreamAsync(text);
    MediaPlayer player = new MediaPlayer();
    player.Source = MediaSource.CreateFromStream(stream, stream.ContentType);
    player.Play();
    // player.Dispose();
}
Working to port this to react-native-winrt:
async function Speak() {
    let synth = new Windows.Media.SpeechSynthesis.SpeechSynthesizer();
    if (!synth) {
        console.log("error creating SpeechSynthesizer");
        return;
    }
    // Set the text to be spoken
    let text = "Hello World";
    // Make the SpeechSynthesizer speak the text
    try {
        let stream = await synth.synthesizeTextToStreamAsync(text);
        // Play the audio stream
        let player = new Windows.Media.Playback.MediaPlayer();
        player.source = Windows.Media.Core.MediaSource.createFromStream(stream, stream.contentType);
        player.play();
    } catch (e) {
        console.log(e);
    }
}
This works, except that stream.contentType isn't set. Workaround: hardcode the content type to 'audio/wav'.
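One way to keep that workaround from hiding a future fix is to fall back to 'audio/wav' only when the stream reports nothing. A small sketch; `resolveContentType` is a hypothetical helper name, not part of react-native-winrt:

```javascript
// Hypothetical helper: prefer the stream's own content type when it is a
// non-empty string, otherwise fall back to 'audio/wav' (the workaround above).
function resolveContentType(stream, fallback = "audio/wav") {
  const type = stream && stream.contentType;
  return typeof type === "string" && type.length > 0 ? type : fallback;
}
```

In Speak(), the second argument to createFromStream would then become `resolveContentType(stream)` instead of `stream.contentType`.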