chrisglein/artificial-chat

Explore using speech synthesis/recognition to avoid needing Narrator/Win+H

chrisglein opened this issue · 2 comments

Summary

Talking to an AI without having to type could be pretty magical.
Talking: Win+H for entering speech is painless, but most people don't know it exists.
Listening: Turning on Narrator just to read out responses is more than your average user wants.

There are APIs to do this directly. Try them?
https://learn.microsoft.com/en-us/uwp/api/windows.media.speechsynthesis.speechsynthesizer?view=winrt-22621
If those work well, we can add 🎙️ and 👂/🔉 buttons to the chat entry and chat responses respectively.
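
For the talking half, here is a rough sketch of what a 🎙️ button handler might call through react-native-winrt. Assumptions: the `Windows` namespace is projected with camelCase method names (as in the synthesis port further down), the app has microphone access, and `listenOnce` plus its injected `winrt` parameter are made-up names for illustration. Not tried against the real projection.

```javascript
// Sketch: one-shot dictation. `winrt` is assumed to be the projected `Windows`
// namespace; it is passed in so the function can be tried against a stub.
async function listenOnce(winrt) {
  const recognizer = new winrt.Media.SpeechRecognition.SpeechRecognizer();

  // With no constraints added, the recognizer falls back to its predefined
  // dictation grammar. Constraints still have to be compiled before recognizing.
  await recognizer.compileConstraintsAsync();

  // Listen for a single utterance and hand back the recognized text.
  const result = await recognizer.recognizeAsync();
  return result.text;
}
```

In the app this would presumably be called as `listenOnce(Windows)` and the returned text dropped into the chat entry.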

Had the AI take two cracks at doing this in C++/WinRT. Here they are, one after the other:

#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Media.SpeechSynthesis.h>
#include <winrt/Windows.Media.PlayTo.h>
#include <iostream>

namespace winrt = winrt::Windows;
using namespace winrt::Windows::Media::SpeechSynthesis;
using namespace winrt::Windows::Media::PlayTo;

int main()
{
    // Create SpeechSynthesizer object
    auto synthesizer = SpeechSynthesizer();

    // Convert text to speech
    auto stream = synthesizer.SynthesizeTextToStreamAsync(L"Hello, world!").get();

    // Play audio
    auto buffer = winrt::Windows::Storage::Streams::Buffer(stream.Size());
    stream.ReadAsync(buffer, buffer.Capacity(), InputStreamOptions::None).get();

    PlayToManager::ShowPlayToUI(std::nullptr_t());

    winrt::com_ptr<PlayToSource> source;
    PlayToManager::GetForCurrentView().DefaultSource(source);

    source->Connection().WriteAsync(buffer).get();

    // Wait for audio to finish playing
    while (source->PlaybackRate() != 0.f) {}

    return 0;
}

// Create an instance of the SpeechSynthesizer.
winrt::Windows::Media::SpeechSynthesis::SpeechSynthesizer synth;

// Retrieve the first installed voice (index 0 is not guaranteed to be female).
winrt::Windows::Media::SpeechSynthesis::VoiceInformation voiceInfo = (synth.AllVoices()).GetAt(0);

// Set the voice.
synth.Voice(voiceInfo);

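// Generate the audio stream from plain text.
// The API is async-only (SynthesizeTextToStreamAsync); co_await needs an enclosing coroutine.
winrt::hstring text = L"Hello world";
winrt::Windows::Media::SpeechSynthesis::SpeechSynthesisStream stream = co_await synth.SynthesizeTextToStreamAsync(text);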

// Play the audio stream.
winrt::Windows::UI::Xaml::Controls::MediaElement mediaElement;
mediaElement.SetSource(stream, stream.ContentType());
mediaElement.Play();

Your mileage may vary.

Code that works:

using Windows.Media.Core;
using Windows.Media.Playback;
using Windows.Media.SpeechSynthesis;

// Initialize a new SpeechSynthesizer
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
    // Set the text to be spoken
    string text = "Hello World";

    // Synthesize the text into an audio stream
    var stream = await synth.SynthesizeTextToStreamAsync(text);

    // Play the audio stream
    MediaPlayer player = new MediaPlayer();
    player.Source = MediaSource.CreateFromStream(stream, stream.ContentType);
    player.Play();

    // player.Dispose();
}

Working to port this to react-native-winrt:

async function Speak() {
  // Set the text to be spoken
  let text = "Hello World";

  try {
    // `new` throws on failure rather than returning a falsy value,
    // so creation lives inside the try instead of behind an if-check
    let synth = new Windows.Media.SpeechSynthesis.SpeechSynthesizer();

    // Synthesize the text into an audio stream
    let stream = await synth.synthesizeTextToStreamAsync(text);

    // Play the audio stream
    let player = new Windows.Media.Playback.MediaPlayer();
    player.source = Windows.Media.Core.MediaSource.createFromStream(stream, stream.contentType);
    player.play();
  } catch (e) {
    console.log(e);
  }
}

This works, except that contentType isn't set on the stream that comes back. Workaround: hardcode the content type to 'audio/wav'.
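
A minimal sketch of that workaround, written as a fallback so the real value is used if the projection ever starts returning it. `resolveContentType`, `FALLBACK_CONTENT_TYPE`, and the injected `winrt` parameter are made-up names for illustration; `winrt` is assumed to be the projected `Windows` namespace.

```javascript
// Value reported to work when contentType comes back unset.
const FALLBACK_CONTENT_TYPE = 'audio/wav';

// Hypothetical helper: use the stream's own content type when it is a
// non-empty string, otherwise fall back to the hardcoded value.
function resolveContentType(stream) {
  const type = stream ? stream.contentType : undefined;
  return typeof type === 'string' && type.length > 0 ? type : FALLBACK_CONTENT_TYPE;
}

// Same flow as Speak(), with the workaround applied. `winrt` is passed in
// (assumed: the projected `Windows` namespace) so this can run against a stub.
async function speak(text, winrt) {
  const synth = new winrt.Media.SpeechSynthesis.SpeechSynthesizer();
  const stream = await synth.synthesizeTextToStreamAsync(text);

  const player = new winrt.Media.Playback.MediaPlayer();
  player.source = winrt.Media.Core.MediaSource.createFromStream(
    stream,
    resolveContentType(stream)
  );
  player.play();
  return player;
}
```

In the app this would presumably be called as `speak("Hello World", Windows)`.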