The Advanced TTS Service is a Node.js service that converts text into speech with the Azure Cognitive Services Speech SDK. It takes input from the clipboard or from an HTTP endpoint, strips markdown formatting, escapes XML special characters, generates SSML, and plays the synthesized audio through the system speakers. The codebase is split into small modules (listeners, preprocessors, services, utilities) so that individual stages can be extended or replaced.
- Clipboard Monitoring: A listener watches the clipboard and processes any copied text that begins with a configurable trigger word, so copying text is enough to start TTS conversion.
- HTTP Server Integration: An HTTP server accepts text via POST requests, enabling programmatic text-to-speech conversion from other tools and services.
- Markdown Preprocessing: A markdown preprocessor converts markdown-formatted text into plain text before it is sent for speech synthesis.
- XML Character Escaping: An XML escaper replaces characters that are special in XML so that input text can be embedded safely in SSML.
- Queue Management: A TTS queue sequences text inputs so that they are synthesized in order.
- Azure Cognitive Speech Service Integration: Generates SSML and sends it to Azure's TTS service for synthesis.
- Audio Playback: Plays synthesized audio streams through the system speakers, giving immediate audible output of the processed text.
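The queue management idea can be illustrated with a small promise chain. This is a minimal sketch, not the code in `src/services/ttsQueue.js`: the `SequentialQueue` class and the `fakeSpeak` worker are hypothetical names, and the timer stands in for synthesis and playback.

```javascript
// Minimal sequential queue: each job starts only after the previous one settles.
class SequentialQueue {
  constructor(worker) {
    this.worker = worker;          // async function invoked once per queued item
    this.tail = Promise.resolve(); // settles when the most recently queued job is done
  }

  // Adds an item and returns a promise for that item's result.
  enqueue(item) {
    const result = this.tail.then(() => this.worker(item));
    // Swallow errors on the chain only, so one failed job does not block later ones.
    this.tail = result.catch(() => {});
    return result;
  }
}

// Hypothetical stand-in for "synthesize and play"; the real service would call Azure here.
const spoken = [];
async function fakeSpeak(text) {
  await new Promise((resolve) => setTimeout(resolve, 10));
  spoken.push(text);
  return text.length;
}

const queue = new SequentialQueue(fakeSpeak);
queue.enqueue("First sentence.");
queue.enqueue("Second sentence.").then(() => console.log(spoken));
```

Chaining on a single `tail` promise keeps the sketch free of explicit locks or polling; the caller still receives each job's own result or error through the promise returned by `enqueue`.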
.
├── README.md
├── app.js
├── config
│   └── voiceSettings.json
├── package.json
├── pnpm-lock.yaml
└── src
    ├── controllers
    │   └── ttsController.js
    ├── listeners
    │   └── clipboardListener.js
    ├── preprocessors
    │   ├── markdownPreprocessor.js
    │   └── xmlEscaper.js
    ├── services
    │   ├── textExtractor.js
    │   ├── ttsQueue.js
    │   └── ttsServiceAzureAI.js
    └── utils
        ├── logger.js
        ├── notifier.js
        ├── playAudioStream.js
        └── ssmlGenerator.js
- Node.js (v18.5 or newer recommended)
- An Azure account with an active Cognitive Services Speech subscription
- Clone the Repository

      git clone https://github.com/TheGreenJosip/TTS-package
      cd TTS-package

- Install Dependencies

  Using pnpm:

      pnpm install

- Configure Environment Variables

  Populate a .env file in the project root with your Azure subscription key, region, and other configurations:

      SPEECH_KEY=your_subscription_key_here
      SPEECH_REGION=your_region_here
      TRIGGER_WORD=your_trigger_word
      PORT=your_api_port
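As an illustration of how these variables might be consumed, the sketch below validates them at startup. It assumes the `.env` file has already been loaded into `process.env` (for example by a dotenv-style loader). The `loadConfig` function, the returned property names, and the fallback port `3000` are hypothetical and not taken from the project.

```javascript
// Reads service settings from an environment object (process.env by default).
function loadConfig(env = process.env) {
  // The Azure credentials have no sensible default, so fail early if they are absent.
  const missing = ["SPEECH_KEY", "SPEECH_REGION"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  // 3000 is an assumed fallback for this sketch, not a documented default.
  const port = Number(env.PORT ?? 3000);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }

  return {
    speechKey: env.SPEECH_KEY,
    speechRegion: env.SPEECH_REGION,
    triggerWord: env.TRIGGER_WORD || "TTS", // "TTS" is the default trigger word
    port,
  };
}

// Example with explicit values instead of the real environment.
const exampleConfig = loadConfig({
  SPEECH_KEY: "your_subscription_key_here",
  SPEECH_REGION: "your_region_here",
  PORT: "8080",
});
console.log(exampleConfig.port, exampleConfig.triggerWord);
```

Failing at startup with a message that names the missing variable tends to be easier to diagnose than an authentication error surfacing later from the speech SDK.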
Execute the following command to start the TTS service:
node app.js
This initiates the clipboard monitoring and HTTP server, ready to process text for speech synthesis.
To convert text to speech via HTTP, send a POST request:
curl -X POST http://localhost:PORT/tts -H "Content-Type: application/json" -d "{\"text\":\"Hello, world!\"}"
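The same request can be sent from Node.js. The sketch below uses the global `fetch` available in Node.js 18 and later; the helper names `buildTtsRequest` and `requestSpeech` are hypothetical, and because the response format of the `/tts` endpoint is not described here, only the HTTP status is inspected.

```javascript
// Builds the URL and fetch options for a TTS request to a locally running service.
function buildTtsRequest(text, port) {
  return {
    url: `http://localhost:${port}/tts`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text }), // JSON.stringify handles quoting and escaping
    },
  };
}

// Sends the text to the service and resolves with the HTTP status code.
async function requestSpeech(text, port) {
  const { url, options } = buildTtsRequest(text, port);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`TTS request failed with status ${response.status}`);
  }
  return response.status;
}
```

With the service running, `requestSpeech("Hello, world!", port)` would send the same body as the curl command, where `port` is whatever value `PORT` is set to in `.env`.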
Copy any text prefixed with the trigger word (default: "TTS") to the clipboard. The service will automatically detect, process, and convert the text to speech.
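A minimal sketch of how trigger-word detection could work is shown below. The actual matching rules in `clipboardListener.js` are not documented here, so this version makes several assumptions: the match is case-sensitive, the trigger must be followed by whitespace or a colon, and the hypothetical `extractTriggeredText` returns `null` when there is nothing to speak.

```javascript
// Returns the text following the trigger word, or null if the clipboard
// content should be ignored.
function extractTriggeredText(clipboardText, triggerWord = "TTS") {
  if (typeof clipboardText !== "string") return null;

  const trimmed = clipboardText.trimStart();
  if (!trimmed.startsWith(triggerWord)) return null;

  const rest = trimmed.slice(triggerWord.length);
  // Require a separator so that a word like "TTSomething" does not count as a trigger.
  if (rest.length > 0 && !/^[\s:]/.test(rest)) return null;

  const text = rest.replace(/^[\s:]+/, "").trim();
  return text.length > 0 ? text : null;
}

console.log(extractTriggeredText("TTS Read this aloud."));
console.log(extractTriggeredText("Just an ordinary copy."));
```

Returning `null` rather than an empty string lets the caller skip queueing with a single check.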
The config/voiceSettings.json file allows for detailed customization of voice and speech patterns. Adjust settings here to tailor the TTS output to your preferences.
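To show how voice settings and XML escaping might come together, the sketch below builds an SSML document from a settings object. The field names (`voiceName`, `language`, `rate`, `pitch`) are assumptions for illustration and may not match the actual schema of `config/voiceSettings.json`; `escapeXml` and `buildSsml` are likewise hypothetical stand-ins for the project's `xmlEscaper.js` and `ssmlGenerator.js`.

```javascript
// Escapes the five characters that are special in XML text and attribute values.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Assumed settings shape; the real voiceSettings.json may use different keys.
const exampleSettings = {
  voiceName: "en-US-JennyNeural",
  language: "en-US",
  rate: "+10%",
  pitch: "default",
};

// Wraps plain text in speak/voice/prosody elements using the given settings.
function buildSsml(text, settings) {
  const { voiceName, language, rate, pitch } = settings;
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(language)}">` +
    `<voice name="${escapeXml(voiceName)}">` +
    `<prosody rate="${escapeXml(rate)}" pitch="${escapeXml(pitch)}">` +
    escapeXml(text) +
    `</prosody></voice></speak>`
  );
}

console.log(buildSsml("Profit & loss: 5 < 7", exampleSettings));
```

Escaping both the text and the attribute values means a stray `&` or `<` in either the input or the settings file cannot produce malformed SSML.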
Contributions are welcome! Please refer to the contributing guidelines for more details on how to participate in the project's development.
To run the service in the background, consider using pm2:
pm2 start app.js --name tts-service
Manage the service with pm2 stop tts-service and pm2 start tts-service.
This project is licensed under the MIT License. See the LICENSE file for details.