Explore using speech synthesis/recognition to avoid needing Narrator/Win+H
Talking to an AI without having to type could be pretty magical.
Talking: Win+H for entering speech is painless, but most people don't know it exists.
Listening: Turning on Narrator just to read out responses is more than your average user wants.
There are APIs to do this directly. Try them?
If those work well, we can add 🎙️ and 👂/🔉 buttons to the chat entry and chat responses, respectively.
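For the talking half, here is a rough sketch of what calling the recognition API directly might look like from JavaScript. It assumes the Windows.Media.SpeechRecognition namespace is projected by react-native-winrt with camelCased members, and that the app has the microphone capability. `listenOnce` and its `winrt` parameter are hypothetical names; the namespace is passed in so the function can be exercised with a stand-in object. Untested on a real device.

```javascript
// Sketch only: `winrt` is assumed to be the projected `Windows` namespace object.
// `listenOnce` is a hypothetical helper, not part of any library.
async function listenOnce(winrt) {
  const recognizer = new winrt.Media.SpeechRecognition.SpeechRecognizer();
  // With no constraints added, the recognizer uses its default dictation grammar.
  await recognizer.compileConstraintsAsync();
  // Listen for a single utterance and hand back the recognized text.
  const result = await recognizer.recognizeAsync();
  return result.text;
}
```

A 🎙️ button handler could then do something like `listenOnce(Windows).then(text => setQuery(text))`, where `setQuery` stands in for whatever updates the chat entry.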
Had the AI take two cracks at doing speech synthesis in C++/WinRT. Here they are:
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Media.SpeechSynthesis.h>
#include <winrt/Windows.Media.PlayTo.h>
#include <iostream>
namespace winrt = winrt::Windows;
using namespace winrt::Windows::Media::SpeechSynthesis;
using namespace winrt::Windows::Media::PlayTo;
int main()
{
    // Create SpeechSynthesizer object
    auto synthesizer = SpeechSynthesizer();
    // Convert text to speech
    auto stream = synthesizer.SynthesizeTextToStreamAsync(L"Hello, world!").get();
    // Play audio
    auto buffer = winrt::Windows::Storage::Streams::Buffer(stream.Size());
    stream.ReadAsync(buffer, buffer.Capacity(), InputStreamOptions::None).get();
    winrt::com_ptr<PlayToSource> source;
    // Wait for audio to finish playing
    while (source->PlaybackRate() != 0.f) {}
    return 0;
}
// Create an instance of the SpeechSynthesizer.
winrt::Windows::Media::SpeechSynthesis::SpeechSynthesizer synth;
// Retrieve the first female voice.
winrt::Windows::Media::SpeechSynthesis::VoiceInformation voiceInfo = (synth.AllVoices()).GetAt(0);
// Set the voice.
// Generate the audio stream from plain text.
winrt::hstring text = L"Hello world";
winrt::Windows::Media::SpeechSynthesis::SpeechSynthesisStream stream = synth.SynthesizeTextToStream(text);
// Play the audio stream.
winrt::Windows::UI::Xaml::Controls::MediaElement mediaElement;
mediaElement.SetSource(stream, stream.ContentType());
Your mileage may vary.
Code that works:
// Initialize a new SpeechSynthesizer
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
    // Set the text to be spoken
    string text = "Hello World";
    // Synthesize the text into an audio stream
    var stream = await synth.SynthesizeTextToStreamAsync(text);
    // Play the audio stream
    MediaPlayer player = new MediaPlayer();
    player.Source = MediaSource.CreateFromStream(stream, stream.ContentType);
    // Start playback explicitly (MediaPlayer.AutoPlay is off by default)
    player.Play();
    // player.Dispose();
}
Working to port this to react-native-winrt:
async function Speak() {
  let synth = new Windows.Media.SpeechSynthesis.SpeechSynthesizer();
  if (!synth) {
    console.log("error creating SpeechSynthesizer");
    return;
  }
  // Set the text to be spoken
  let text = "Hello World";
  try {
    // Synthesize the text into an audio stream
    let stream = await synth.synthesizeTextToStreamAsync(text);
    // Play the audio stream
    let player = new Windows.Media.Playback.MediaPlayer();
    player.source = Windows.Media.Core.MediaSource.createFromStream(stream, stream.contentType);
    // Start playback explicitly (autoPlay is off by default)
    player.play();
  } catch (e) {
    console.log(e);
  }
}
This works, except that contentType isn't set on the stream. Workaround: hardcode it to 'audio/wav'.
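One way to apply that workaround without losing the real value if the projection starts returning it later is a small fallback helper. This is a minimal sketch: `resolveContentType` is a hypothetical name, and 'audio/wav' is the hardcoded value from the note above, not something read from the API.

```javascript
// Hypothetical helper: use the stream's contentType when the projection
// provides one, otherwise fall back to the hardcoded 'audio/wav'.
function resolveContentType(stream) {
  return stream.contentType || 'audio/wav';
}
```

In Speak(), the second argument to createFromStream would become `resolveContentType(stream)` instead of `stream.contentType`.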