Głosik (pronounced "gwoh-sheek") is an example app that showcases the F5-TTS text-to-speech model running on MLX Swift. The name comes from the Polish word "głos" (voice) plus the diminutive suffix "-ik".
The F5-TTS Swift implementation that Głosik builds on comes from the original repository: https://github.com/lucasnewman/f5-tts-swift
Demo video: F5TTS_demo.mp4 (watch it to see Głosik in action).
Requirements:
- macOS 14.0 or later
- iOS 16.0 or later
- visionOS 1.0 or later
- Xcode 15.0 or later
- Swift 5.9 or later
Installation:
- Clone the repository
- Open Glosik.xcodeproj in Xcode
- Build and run the project
Usage:
- Enter the text you want to convert to speech
- (Optional) Record or select a reference audio sample:
  - Go to the "Reference" tab
  - Record a new audio sample and enter its transcript as the reference text
  - Save it as a reference sample
  - Select it from the reference picker in the "Generate" tab
- Click "Generate Speech" to create the audio
- Use the playback controls to listen to the generated speech
- Save the generated audio as a WAV file
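Saving audio as WAV amounts to writing a small RIFF header followed by PCM samples. The sketch below assumes the generated speech is available as Float samples in [-1, 1] at 24 kHz and encodes it as 16-bit mono PCM. `makeWAV` is a hypothetical helper for illustration, not Głosik's actual saving code, which may use AVFoundation instead.

```swift
import Foundation

/// Hypothetical helper (not part of Głosik): encodes mono Float samples,
/// expected in [-1, 1], as a 16-bit PCM WAV file in memory.
func makeWAV(samples: [Float], sampleRate: Int = 24_000) -> Data {
    let channels = 1
    let bitsPerSample = 16
    let blockAlign = channels * bitsPerSample / 8
    let byteRate = sampleRate * blockAlign
    let dataSize = samples.count * blockAlign

    var data = Data()
    // Appends an integer in little-endian byte order, as WAV requires.
    func append<T: FixedWidthInteger>(_ value: T) {
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }

    // RIFF header: the size field counts every byte after itself.
    data.append(contentsOf: Array("RIFF".utf8))
    append(UInt32(36 + dataSize))
    data.append(contentsOf: Array("WAVE".utf8))

    // "fmt " chunk describing uncompressed PCM audio.
    data.append(contentsOf: Array("fmt ".utf8))
    append(UInt32(16))               // fmt chunk size for PCM
    append(UInt16(1))                // audio format 1 = PCM
    append(UInt16(channels))
    append(UInt32(sampleRate))
    append(UInt32(byteRate))
    append(UInt16(blockAlign))
    append(UInt16(bitsPerSample))

    // "data" chunk holding the samples.
    data.append(contentsOf: Array("data".utf8))
    append(UInt32(dataSize))
    for sample in samples {
        // Clamp to [-1, 1] and scale to the Int16 range.
        let clamped = max(-1, min(1, sample))
        append(Int16(clamped * Float(Int16.max)))
    }
    return data
}
```

The resulting Data could then be written to disk with `try data.write(to: url)`. Writing the header by hand keeps the sketch dependency-free; an app targeting Apple platforms could equally rely on AVAudioFile.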
Features:
- Speech generation
  - High-quality speech synthesis using the F5-TTS model
  - Real-time generation progress tracking
  - Generation timing statistics
  - GPU memory usage monitoring
- Reference audio
  - Record new reference samples with accompanying text
  - Manage saved reference samples
  - Select reference samples for speech generation
  - Play back reference samples
  - Support for mono, 24 kHz WAV audio
- User interface
  - Native SwiftUI interface
  - Split-view navigation
  - Dark mode support
  - Cross-platform support (macOS, iOS, visionOS)
  - Accessibility features
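To check that a reference file actually is mono, 24 kHz WAV before using it, one could walk its RIFF chunks and read the "fmt " chunk. This is a minimal sketch assuming the standard RIFF/WAVE layout; `WAVFormat`, `readWAVFormat`, and `isMono24kHz` are hypothetical names, not part of Głosik or F5-TTS Swift.

```swift
import Foundation

/// Hypothetical type: basic format fields from a WAV file's "fmt " chunk.
struct WAVFormat: Equatable {
    let channels: Int
    let sampleRate: Int
    let bitsPerSample: Int
}

/// Hypothetical helper: walks the RIFF chunks in `data` and returns the
/// format, or nil if the data is not a readable WAV file.
func readWAVFormat(_ data: Data) -> WAVFormat? {
    let bytes = [UInt8](data)
    // Reads a little-endian unsigned integer of `count` bytes at `offset`.
    func le(_ offset: Int, _ count: Int) -> Int {
        var value = 0
        for i in 0..<count { value |= Int(bytes[offset + i]) << (8 * i) }
        return value
    }
    // Reads a four-character chunk tag such as "RIFF" or "fmt ".
    func tag(_ offset: Int) -> String {
        String(decoding: bytes[offset..<offset + 4], as: UTF8.self)
    }
    guard bytes.count >= 12, tag(0) == "RIFF", tag(8) == "WAVE" else { return nil }

    var offset = 12
    while offset + 8 <= bytes.count {
        let chunkID = tag(offset)
        let chunkSize = le(offset + 4, 4)
        if chunkID == "fmt ", chunkSize >= 16, offset + 8 + 16 <= bytes.count {
            return WAVFormat(channels: le(offset + 10, 2),
                             sampleRate: le(offset + 12, 4),
                             bitsPerSample: le(offset + 22, 2))
        }
        // Skip this chunk; chunk bodies are padded to an even byte count.
        offset += 8 + chunkSize + (chunkSize % 2)
    }
    return nil
}

/// True when the data is a WAV file with one channel at 24 kHz.
func isMono24kHz(_ data: Data) -> Bool {
    guard let format = readWAVFormat(data) else { return false }
    return format.channels == 1 && format.sampleRate == 24_000
}
```

Walking chunks rather than assuming a fixed 44-byte header keeps the check working for files that carry extra chunks (for example "LIST" metadata) before "fmt ".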
The project is split into two main parts:
- Glosik: Main application
- GlosikUI: Reusable SwiftUI components package
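A reusable components package like GlosikUI is typically described by a Swift Package Manager manifest. The fragment below is a hypothetical Package.swift, assuming a single library target named GlosikUI and the platform minimums from the requirements list; the project's real manifest may declare different targets, resources, or dependencies.

```swift
// swift-tools-version: 5.9
// Hypothetical manifest for the GlosikUI package; the real file may differ.
import PackageDescription

let package = Package(
    name: "GlosikUI",
    // Platform minimums taken from the README's requirements list.
    platforms: [.macOS(.v14), .iOS(.v16), .visionOS(.v1)],
    products: [
        // Exposes the SwiftUI components to the Glosik app target.
        .library(name: "GlosikUI", targets: ["GlosikUI"])
    ],
    targets: [
        .target(name: "GlosikUI")
    ]
)
```

Tools version 5.9 is the minimum that accepts the `.visionOS` platform entry.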
This project is licensed under the MIT License. See the LICENSE file for details.