kamushadenes/chloe

[FEATURE] Text-to-Speech Action

kamushadenes opened this issue · 0 comments

Overview

Implement an action, ReAct: Text-to-Speech, that lets users generate audio files from text using Google Cloud's Text-to-Speech API.

Motivation

Objective

Provide users with the ability to generate audio files from text within the AI assistant, enhancing their experience and supporting various text-to-speech tasks.

Impact

The Text-to-Speech action will let users convert text into audio files directly from the assistant, which should improve their productivity and overall satisfaction.

Proposed Solution

Description

Create an action, ReAct: Text-to-Speech, that takes text as input, generates an audio file using Google Cloud's Text-to-Speech API with configurable voice settings, and returns the audio file to the user.
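To make the data flow concrete, here is a minimal sketch of the request and response handling, assuming the action talks to the API's REST `text:synthesize` method. The function names (`build_synthesize_request`, `decode_audio`), the default values, and the example voice name are hypothetical and only for illustration; the sketch does no network I/O and is not a proposal for the final implementation.

```python
import base64
import json


def build_synthesize_request(text, language_code="en-US", voice_name="",
                             audio_encoding="MP3", speaking_rate=1.0,
                             pitch=0.0, volume_gain_db=0.0):
    """Build the JSON body for a text:synthesize call (no network I/O)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    voice = {"languageCode": language_code}
    if voice_name:
        # Only send a voice name when one is configured; otherwise the
        # API selects a voice for the given language code.
        voice["name"] = voice_name
    return {
        "input": {"text": text},
        "voice": voice,
        "audioConfig": {
            "audioEncoding": audio_encoding,
            "speakingRate": speaking_rate,
            "pitch": pitch,
            "volumeGainDb": volume_gain_db,
        },
    }


def decode_audio(response_body):
    """Extract raw audio bytes from the base64-encoded audioContent field."""
    return base64.b64decode(response_body["audioContent"])


if __name__ == "__main__":
    # The voice name below is an illustrative example, not a recommendation.
    body = build_synthesize_request("Hello from Chloe",
                                    voice_name="en-US-Standard-A")
    print(json.dumps(body, indent=2))
```

The bytes returned by `decode_audio` are what the action would write to a file and hand back to the user.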

Changes
  • Integrate Google Cloud's Text-to-Speech API for audio generation.
  • Design and develop the ReAct: Text-to-Speech action that processes text, generates audio files, and returns the audio files to users.
  • Implement configurable settings for language, voice, audio format, speaking rate, pitch, and volume gain.
  • Integrate the ReAct: Text-to-Speech action into the existing AI assistant framework.
  • Test the ReAct: Text-to-Speech action to ensure accurate audio generation and user-friendly output.

Configuration Options

  • GOOGLE_APPLICATION_CREDENTIALS: Path to the Google Cloud credentials file
  • CHLOE_TTS_LANGUAGE_CODE: Language code for the TTS engine
  • CHLOE_TTS_VOICE_NAME: Voice name for the TTS engine
  • CHLOE_TTS_AUDIO_ENCODING: Audio encoding of the generated file (e.g. MP3, LINEAR16, OGG_OPUS)
  • CHLOE_TTS_SPEAKING_RATE: Speaking rate for the TTS engine
  • CHLOE_TTS_PITCH: Pitch for the TTS engine
  • CHLOE_TTS_VOLUME_GAIN_DB: Volume gain for the TTS engine, in dB
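
As a rough illustration of how these options could be read and validated, the sketch below loads the CHLOE_TTS_* variables into a settings object. The `TTSSettings` class, the `load_tts_settings` helper, and every default value are assumptions made for this example. The numeric ranges follow the API's documented AudioConfig limits as I understand them and should be checked against the current reference before being relied on.

```python
import os
from dataclasses import dataclass

# Assumed valid ranges for the numeric options; verify against the API docs.
_RANGES = {
    "CHLOE_TTS_SPEAKING_RATE": (0.25, 4.0),
    "CHLOE_TTS_PITCH": (-20.0, 20.0),
    "CHLOE_TTS_VOLUME_GAIN_DB": (-96.0, 16.0),
}


@dataclass(frozen=True)
class TTSSettings:
    language_code: str
    voice_name: str
    audio_encoding: str
    speaking_rate: float
    pitch: float
    volume_gain_db: float


def _float_env(env, name, default):
    """Read a float option, falling back to default and checking its range."""
    raw = env.get(name, "").strip()
    value = float(raw) if raw else default  # raises ValueError if not numeric
    low, high = _RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return value


def load_tts_settings(env=None):
    """Build TTSSettings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return TTSSettings(
        # All defaults below are hypothetical placeholders.
        language_code=env.get("CHLOE_TTS_LANGUAGE_CODE", "en-US"),
        voice_name=env.get("CHLOE_TTS_VOICE_NAME", ""),
        audio_encoding=env.get("CHLOE_TTS_AUDIO_ENCODING", "MP3").upper(),
        speaking_rate=_float_env(env, "CHLOE_TTS_SPEAKING_RATE", 1.0),
        pitch=_float_env(env, "CHLOE_TTS_PITCH", 0.0),
        volume_gain_db=_float_env(env, "CHLOE_TTS_VOLUME_GAIN_DB", 0.0),
    )


if __name__ == "__main__":
    print(load_tts_settings())
```

Passing the environment as a plain mapping keeps the loader easy to test without touching the real process environment. GOOGLE_APPLICATION_CREDENTIALS is left out on purpose, since Google's client libraries read it themselves.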

Considerations (Optional)

  • Assess the performance impact of the ReAct: Text-to-Speech action on the AI assistant.
  • Evaluate the costs and usage limits associated with using Google Cloud's Text-to-Speech API for audio generation.
  • Consider the need for additional documentation or user guidance on configuring and using the ReAct: Text-to-Speech action.
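
One usage limit worth planning for is the per-request input size. The sketch below splits long text on sentence boundaries so each request stays under a byte budget. The 5000-byte figure is an assumption based on the API's published quotas and should be confirmed before use; `split_for_tts` is a hypothetical helper name.

```python
import re

# Assumed per-request input limit in bytes; confirm in the API quota docs.
MAX_REQUEST_BYTES = 5000


def split_for_tts(text, max_bytes=MAX_REQUEST_BYTES):
    """Split text into chunks whose UTF-8 size stays within max_bytes."""
    # Break after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        if len(sentence.encode("utf-8")) > max_bytes:
            raise ValueError("a single sentence exceeds the byte limit")
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            # Adding this sentence would overflow, so start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_for_tts("One. Two. Three.", max_bytes=9):
        print(chunk)
```

Each chunk would become its own synthesis request, so the number of chunks also gives a rough handle on request volume and billed characters.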

Additional Resources