Browser-Bot

Browser-Bot is a web application that uses the Web Speech API to provide speech-to-text (STT) and text-to-speech (TTS) directly in the browser, with no additional apps or extensions to install. It is highly configurable, which makes it suitable for a wide range of use cases. With optional GPT support, it can serve as a voice assistant on both Android and iOS smartphones, effectively turning older devices into AI assistants.

Features

  • Speech-to-Text (STT) support with hotword activation
  • Text-to-Speech (TTS) support
  • Configurable STT language, TTS voices, and parameters
  • Optional OpenAI GPT API integration, or your own REST API server
  • Export and import configuration settings
  • Log management and saving
  • Compatible with Android and iOS mobile browsers

Usage

  1. Open browser-bot.html in a modern web browser that supports the Web Speech API.
  2. Configure the app by clicking the [CONFIG] button and setting the desired options.
  3. Test the STT and TTS features by clicking the corresponding buttons.
  4. To use the OpenAI GPT API, enter your API key and any optional GPT settings in the configuration.

Configuration Options

  • REST URL: The REST API URL to which STT results are sent.
  • STT Language: The language used by the STT service.
  • TTS Voice: The TTS voice settings (up to 4 different voices can be configured).
  • TTS Volume, Rate, and Pitch: Customize the TTS output properties.
  • STT Timeout: Maximum duration for STT listening.
  • STT HotWords, AnswerWords, ConfirmWords, and CancelWords: Configure specific words to trigger actions.
  • GPT OpenAI API Key, System Role, Token Limit, and Reply Language: Configure the GPT API integration (optional).
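
The TTS Volume, Rate, and Pitch options correspond to properties of the Web Speech API's SpeechSynthesisUtterance, which accepts volume in the range 0 to 1, rate in the range 0.1 to 10, and pitch in the range 0 to 2 (each defaulting to 1). The following is a minimal sketch of how configured values could be validated before use. The names TTS_RANGES and normalizeTtsParams are hypothetical and are not taken from Browser-Bot's source.

```javascript
// Valid ranges for SpeechSynthesisUtterance properties per the Web Speech API.
// TTS_RANGES and normalizeTtsParams are illustrative names (assumptions).
const TTS_RANGES = {
  volume: { min: 0, max: 1, fallback: 1 },
  rate: { min: 0.1, max: 10, fallback: 1 },
  pitch: { min: 0, max: 2, fallback: 1 },
};

// Clamp each setting into its valid range; use the default when the
// value is missing, empty, or not numeric.
function normalizeTtsParams(params = {}) {
  const result = {};
  for (const [name, range] of Object.entries(TTS_RANGES)) {
    const raw = params[name];
    const value = raw === '' || raw == null ? NaN : Number(raw);
    result[name] = Number.isFinite(value)
      ? Math.min(range.max, Math.max(range.min, value))
      : range.fallback;
  }
  return result;
}
```

In a browser, the returned values could then be assigned to the volume, rate, and pitch properties of a SpeechSynthesisUtterance before it is passed to speechSynthesis.speak().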

Speak multiple languages

To speak multiple languages with the SPEAK button, prefix each sentence or phrase in the text input with a language code ([en], [zh], [jp]). For example:

INPUT = '[en]Hello, this is an English sentence. [zh]你好,这是一句中文。[jp]こんにちは、これは日本語です。';

This input will generate speech in the following order:

  1. An English sentence: "Hello, this is an English sentence."
  2. A Chinese sentence: "你好,这是一句中文。"
  3. A Japanese sentence: "こんにちは、これは日本語です。"

Make sure to configure the appropriate TTS voices for each language using the Voice1, Voice2, Voice3, and Voice4 settings in the user interface.
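
As an illustration of how tagged input like the example above can be broken into per-language segments, here is a minimal sketch. It is not Browser-Bot's actual parser: the function name splitByLanguageTag, the default language, and the shape of the returned objects are all assumptions.

```javascript
// Split tagged input such as "[en]Hello [zh]你好" into per-language segments.
// The tag syntax follows the README example; the function name, default
// language, and result shape are illustrative assumptions.
function splitByLanguageTag(input, defaultLang = 'en') {
  const segments = [];
  const tagPattern = /\[([a-z]{2})\]/g; // two-letter codes in square brackets
  let currentLang = defaultLang;
  let lastIndex = 0;
  let match;
  while ((match = tagPattern.exec(input)) !== null) {
    // Text before this tag belongs to the previously active language.
    const text = input.slice(lastIndex, match.index).trim();
    if (text) segments.push({ lang: currentLang, text });
    currentLang = match[1];
    lastIndex = tagPattern.lastIndex;
  }
  // Whatever follows the final tag belongs to the last language seen.
  const tail = input.slice(lastIndex).trim();
  if (tail) segments.push({ lang: currentLang, text: tail });
  return segments;
}

const INPUT = '[en]Hello, this is an English sentence. [zh]你好,这是一句中文。[jp]こんにちは、これは日本語です。';
const segments = splitByLanguageTag(INPUT);
```

Each segment could then be spoken in order with the voice configured for its language code.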

Browser Compatibility

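Browser-Bot relies on the Web Speech API, which is not fully supported by all browsers. Speech synthesis (TTS) is widely available, but speech recognition (STT) support is narrower: Chromium-based browsers such as Google Chrome and Microsoft Edge provide it, while Mozilla Firefox does not enable it by default. For the best experience, use a recent version of a browser that supports both.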
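
A page can check at runtime which parts of the Web Speech API are available. The sketch below shows one common way to do so; the function name detectSpeechSupport and the shape of its result are assumptions, while speechSynthesis, SpeechRecognition, and the prefixed webkitSpeechRecognition are the standard global names.

```javascript
// Report which Web Speech API features the given global scope exposes.
// detectSpeechSupport is an illustrative name, not part of Browser-Bot.
function detectSpeechSupport(scope = globalThis) {
  return {
    // Speech synthesis (TTS) is exposed as window.speechSynthesis.
    tts: 'speechSynthesis' in scope,
    // Speech recognition (STT) may only exist under the webkit prefix.
    stt: Boolean(scope.SpeechRecognition || scope.webkitSpeechRecognition),
  };
}
```

In a browser, calling detectSpeechSupport() with no arguments inspects the current window, so the page can disable the STT or TTS controls when the matching feature is missing.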

REST API URL Format

The REST API URL is formed by concatenating the REST URL configuration value with the query value.

To create a valid REST API URL, the REST URL value should be a properly formatted base URL (e.g., https://example.com/api/). The query value will be appended to this base URL, like https://example.com/api/{query}.
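
The concatenation described above can be sketched as follows. The source only states that the query is appended to the base URL; the trailing-slash normalization and the percent-encoding of the query are assumptions added here for safety, and buildRestUrl is a hypothetical name.

```javascript
// Build the request URL by appending the query to the configured REST URL.
// Assumptions (not confirmed by the README): a missing trailing slash is
// added to the base, and the query is percent-encoded.
function buildRestUrl(restUrl, query) {
  const base = restUrl.endsWith('/') ? restUrl : restUrl + '/';
  return base + encodeURIComponent(query);
}

const url = buildRestUrl('https://example.com/api/', 'turn on the light');
```

Encoding the query keeps spaces and punctuation from recognized speech from producing an invalid URL; a server that expects raw text in the path would need the encoding step removed.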

Response Format

The REST API is expected to return a JSON object containing a message property, as shown below:

{
  "message": "Text to be spoken by the TTS system."
}
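
A client consuming this response could extract the text to speak along these lines. This is a minimal sketch under the response format described above; the function name extractMessage and the choice to return null for malformed responses are assumptions.

```javascript
// Extract the "message" string from a REST API response body.
// Returns null when the body is not valid JSON or lacks a string message.
// extractMessage is an illustrative name, not part of Browser-Bot.
function extractMessage(responseText) {
  let data;
  try {
    data = JSON.parse(responseText);
  } catch (err) {
    return null; // body was not valid JSON
  }
  if (data !== null && typeof data === 'object' && typeof data.message === 'string') {
    return data.message;
  }
  return null;
}

const spoken = extractMessage('{"message": "Text to be spoken by the TTS system."}');
```

Returning null rather than throwing lets the caller decide whether to stay silent or speak a fallback phrase when the server replies with something unexpected.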

Contributing

If you have any suggestions, bug reports, or feature requests, feel free to open an issue or submit a pull request on the GitHub repository.

License

This project is licensed under the MIT License.