Talkify is a Swift library designed to streamline the process of integrating speech recognition and synthesis capabilities into iOS and macOS applications. The library harnesses the power of native APIs such as SFSpeechRecognizer and AVSpeechSynthesizer, providing a high-level interface that simplifies their usage and handles common tasks, such as managing audio sessions and checking microphone permissions.
The primary component is the Talkify class. This class provides a comprehensive set of methods for managing speech recognition tasks. It establishes and manages an AVAudioEngine instance for audio operations, handles speech recognition requests and tasks, and provides delegate methods to keep your application informed about the status of speech recognition processes. It also integrates with TalkifyRecordingSession to facilitate the audio recording process.
- Swift 5.0 or higher
- iOS 13.0 or higher
- macOS 10.15 or higher
- Swift Package Manager (SPM)
Supported languages, for both text-to-speech and speech-to-text:
Language | Flag |
---|---|
English (Australia) | 🇦🇺 |
English (United Kingdom) | 🇬🇧 |
English (United States) | 🇺🇸 |
English (Ireland) | 🇮🇪 |
English (South Africa) | 🇿🇦 |
中文 (中国) | 🇨🇳 |
中文(香港) | 🇭🇰 |
中文(台灣) | 🇹🇼 |
Nederlands (België) | 🇧🇪 |
Nederlands (Nederland) | 🇳🇱 |
Français (Canada) | 🇨🇦 |
Français (France) | 🇫🇷 |
Deutsch (Deutschland) | 🇩🇪 |
Deutsch (Österreich) | 🇦🇹 |
Deutsch (Schweiz) | 🇨🇭 |
Italiano (Italia) | 🇮🇹 |
日本語 (日本) | 🇯🇵 |
한국어 (대한민국) | 🇰🇷 |
Norsk (Norge) | 🇳🇴 |
Polski (Polska) | 🇵🇱 |
Português (Brasil) | 🇧🇷 |
Português (Portugal) | 🇵🇹 |
Română (România) | 🇷🇴 |
Русский (Россия) | 🇷🇺 |
Slovenčina (Slovenská republika) | 🇸🇰 |
Español (Argentina) | 🇦🇷 |
Español (México) | 🇲🇽 |
Español (España) | 🇪🇸 |
Español (Estados Unidos) | 🇺🇸 |
Svenska (Sverige) | 🇸🇪 |
ไทย (ประเทศไทย) | 🇹🇭 |
Türkçe (Türkiye) | 🇹🇷 |
Language | Voices |
---|---|
Arabic | Maged |
Bulgarian | Daria |
Catalan | Montserrat |
Czech | Zuzana |
Danish | Sara |
German | Anna |
Greek | Melina |
Australian English | Karen |
British English | Daniel |
Irish English | Moira |
Indian English | Rishi |
US English | Samantha, Whisper, Princess, Bells, Organ, BadNews, Bubbles, Junior, Bahh, Deranged, Boing, GoodNews, Zarvox, Ralph, Cellos, Kathy, Fred |
South African English | Tessa, Trinoids, Albert, Hysterical |
Spanish | Monica (Neutral), Paulina (Mexican) |
Finnish | Satu |
French | Amelie (Canadian), Thomas |
Hebrew | Carmit |
Hindi | Lekha |
Croatian | Lana |
Hungarian | Mariska |
Indonesian | Damayanti |
Italian | Alice |
Japanese | Kyoko |
Korean | Yuna |
Malay | Amira |
Norwegian | Nora |
Dutch | Ellen (Belgium), Xander (Netherlands) |
Polish | Zosia |
Portuguese | Luciana (Brazil), Joana (Portugal) |
Romanian | Ioana |
Russian | Milena |
Slovak | Laura |
Swedish | Alva |
Thai | Kanya |
Turkish | Yelda |
Ukrainian | Lesya |
Vietnamese | Linh |
Chinese | Tingting (China), Sinji (Hong Kong), Meijia (Taiwan) |
- Text-to-speech in multiple languages with different voice models.
- Speech-to-text: listens to your voice and returns recognized text, based on your setup.
- Retrieve the full list of available voices programmatically.
- An ergonomic, chainable API.
- Dedicated delegates to track recording/speaking/reading states on your side.
- RxSwift, Combine, and TCA support (planned; see the roadmap below).
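Under the hood, the voice tables above reflect what Apple's speech frameworks provide. If you want to see exactly what the current OS offers, you can enumerate the native voices directly; this sketch uses Apple's `AVSpeechSynthesisVoice` API rather than a Talkify call:

```swift
import AVFoundation

// Print every speech voice installed on the current device/OS,
// sorted by language code.
let voices = AVSpeechSynthesisVoice.speechVoices()
for voice in voices.sorted(by: { $0.language < $1.language }) {
  print("\(voice.language): \(voice.name) (\(voice.identifier))")
}
```

The set of voices varies by OS version and by which voices the user has downloaded, so the output will differ between devices.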
Talkify is available through the Swift Package Manager.
To integrate Talkify into your project using SPM, add the package dependency to your `Package.swift` file:

```swift
dependencies: [
  .package(url: "https://github.com/tornikegomareli/Talkify.git", .upToNextMajor(from: "0.1.0"))
]
```
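You also need to add `Talkify` to the dependencies of the target that uses it. A minimal sketch, where the target name `MyApp` is a placeholder for your own target:

```swift
targets: [
  .target(
    name: "MyApp",
    dependencies: ["Talkify"]
  )
]
```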
Before you start using Talkify, there are a few setup steps you need to ensure:
To use Talkify's recording features, you need to request microphone access. Additionally, for speech recognition, you must request speech recognition authorization. Add the following keys to your `Info.plist`:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>We need access to the microphone to record your voice.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>We need access to speech recognition to convert your voice into text.</string>
```
For macOS users:
Open your Xcode project. Navigate to the "Signing & Capabilities" tab. In the "Resource Access" section, ensure that "Audio Input" is selected. This allows recording of audio using the built-in microphone and grants access to audio inputs using any Core Audio API that supports audio input. This step is not required for iOS.
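If you manage your entitlements file by hand instead, selecting "Audio Input" corresponds to the App Sandbox audio-input entitlement; a sketch of the relevant fragment of the `.entitlements` plist:

```xml
<key>com.apple.security.device.audio-input</key>
<true/>
```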
The `Talkify` class provides a high-level API for managing speech synthesis, speech recognition, and reading text with different voices. Here's a guide on how to use it.

To start, initialize a `Talkify` instance:

```swift
let talkify = Talkify()
```
Set up the delegates:

```swift
talkify.recordingDelegate = self
talkify.speakingDelegate = self
```

Your class should then conform to the `TalkifyRecordingDelegate` and `TalkifySpeakingDelegate` protocols and implement their respective methods.
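A conforming type might look like the following sketch. Note that apart from `recordingDidFinishWithResults(text:)`, which this README documents, the callback names below are illustrative assumptions and may not match the real protocol requirements:

```swift
// Illustrative sketch only: except for recordingDidFinishWithResults(text:),
// these method names are assumptions, not the library's documented API.
final class SpeechCoordinator: TalkifyRecordingDelegate, TalkifySpeakingDelegate {
  let talkify = Talkify()

  init() {
    talkify.recordingDelegate = self
    talkify.speakingDelegate = self
  }

  // TalkifyRecordingDelegate: called with the recognized text.
  func recordingDidFinishWithResults(text: String) {
    print("Recognized: \(text)")
  }

  // TalkifySpeakingDelegate (hypothetical callback names).
  func speakingDidStart() { print("Started speaking") }
  func speakingDidFinish() { print("Finished speaking") }
}
```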
Before starting a recording, set up the recorder:

```swift
talkify
  .setupRecording()
  .startRecording()
```
You can stop recording programmatically with:

```swift
talkify
  .stopRecording()
```

The recognized text will be delivered through the `recordingDidFinishWithResults(text:)` delegate method.
To start speaking text, you need to set up a speaker. Initialize a `TalkifySpeaker`:

```swift
let speaker = TalkifySpeaker()
```
Customizing the voice:

```swift
speaker.withVoice(customVoice: .kyoko) // Sets the voice to Kyoko (Japanese female voice)
```
Customizing the voice rate: this adjusts the speed at which the text is spoken. The value typically ranges from 0.0 (slowest) to 1.0 (fastest), with 0.5 being the default rate.

```swift
speaker.withVoiceRate(value: 0.7) // Sets a faster speaking rate
```
Customizing the pitch multiplier: this adjusts the pitch of the synthesized voice. A value of 1.0 is the regular pitch; values above or below raise or lower the pitch, respectively.

```swift
speaker.withMultiplier(value: 1.2) // Raises the pitch slightly
```
Customizing the volume: this adjusts the volume of the synthesized voice, with 1.0 being the loudest and 0.0 being muted.

```swift
speaker.withVolume(value: 0.8) // Slightly quieter than the default volume
```
Set the speaker on the Talkify instance:

```swift
talkify.setSpaker(wih: speaker) // Pass the speaker instance created above
```
Start speaking:

```swift
talkify.speak(text: "Hello, this is Talkify!")
```
You can pause or resume speech synthesis using:

```swift
talkify.pauseSpeaking()
talkify.continueSpeaking()
```

Remember to implement the `TalkifySpeakingDelegate` methods to receive callbacks about the speech synthesis status.
With Talkify, you can choose a particular voice for speech synthesis. Here's how to set a voice:

```swift
let voice = TalkifyVoice(voice: .samantha, quality: .default)
talkify.voice = voice
```

Replace `.samantha` with the desired voice identifier from the `TalkifyVoiceIdentifier` enum. The `quality` parameter sets the voice's quality; you can choose between `.default` and other available options.
To set a specific language for speech recognition and synthesis, use the `TalkifyLanguage` enum:

```swift
let language: TalkifyLanguage = .englishUS
talkify.recognitionLanguage = language
talkify.synthesisLanguage = language
```
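Putting the pieces together, a minimal end-to-end sketch using only the calls shown in this README (delegate conformance elided):

```swift
// Configure a Talkify instance with a voice, language, and speaker,
// then speak a sentence.
let talkify = Talkify()
talkify.voice = TalkifyVoice(voice: .samantha, quality: .default)
talkify.synthesisLanguage = .englishUS

let speaker = TalkifySpeaker()
speaker.withVoiceRate(value: 0.5)  // default rate
speaker.withVolume(value: 0.8)
talkify.setSpaker(wih: speaker)    // method name as documented above

talkify.speak(text: "Hello, this is Talkify!")
```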
For detailed usage and advanced functionalities, refer to the inline documentation provided within the Talkify class and its extensions.
Contributions are welcome! Whether you're fixing bugs, improving the documentation, or enhancing features, I'd love to have your help. Here's how you can contribute:
- Fork the repository: Start by forking the Talkify repository.
- Clone your fork:
  ```shell
  git clone https://github.com/YOUR_USERNAME/Talkify.git
  ```
- Create a branch:
  ```shell
  git checkout -b your-branch-name
  ```
- Make your changes: Improve the codebase, add features, fix bugs, or enhance the documentation.
- Commit your changes:
  ```shell
  git commit -m "Your descriptive commit message"
  ```
- Push to your fork:
  ```shell
  git push origin your-branch-name
  ```
- Submit a pull request: Go to the Talkify repository and create a new pull request. Describe your changes in detail and make sure it's directed from your branch to the main Talkify branch.
Encountered a bug or unexpected behavior? I appreciate your feedback. Just open a new issue on the GitHub repository, providing as much detail as you can. This helps me address and fix issues faster.
Because this repository is primarily for educational purposes, I will happily add new functionality step by step:
- watchOS Support: Extend Talkify's capabilities to watchOS, allowing for seamless integration with Apple Watch applications.
- Rx and Combine Listeners: In addition to the delegate pattern, I'm planning on introducing listeners using popular reactive frameworks like RxSwift and Combine.
- Unit Tests: To ensure the robustness and reliability of Talkify, unit tests are on the way. This will boost confidence in the library's functionality and make future changes safer.
- Third-party integrations: I have an idea to add some third-party APIs (for example, an ergonomic wrapper around a ChatGPT speech recognition API), but I still need to think about whether it would be worth it at all.
Mostly, this project exists to beat my procrastination 😄 But it genuinely aims to be a comprehensive solution for developers looking to incorporate speech recognition and synthesis into their apps, abstracting away the complexity of the underlying APIs.
Talkify is licensed under the MIT License. See LICENSE for more information.
If you've found the README helpful or you like the project idea, please give it a ⭐️ (star) on GitHub.