
🎤 The easiest way to transcribe audio in Swift

Primary language: Swift. License: MIT.

SwiftWhisper

The easiest way to use Whisper in Swift

Easily add transcription to your app or package. Powered by whisper.cpp.

Install

Swift Package Manager

Add SwiftWhisper as a dependency in your Package.swift file:

let package = Package(
  ...
  dependencies: [
    // Add the package to your dependencies
    .package(url: "https://github.com/exPHAT/SwiftWhisper.git", branch: "master"),
  ],
  ...
  targets: [
    // Add SwiftWhisper as a dependency on any target you want to use it in
    .target(name: "MyTarget",
            dependencies: [.byName(name: "SwiftWhisper")])
  ]
  ...
)

Xcode

Add https://github.com/exPHAT/SwiftWhisper.git in the "Swift Package Manager" tab.

Usage

API Documentation.

import SwiftWhisper

let whisper = Whisper(fromFileURL: /* Model file URL */)
let segments = try await whisper.transcribe(audioFrames: /* 16kHz PCM audio frames */)

print("Transcribed audio:", segments.map(\.text).joined())
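The placeholders in the snippet above could be filled in along these lines. This is only a sketch: the helper name `transcribe(frames:)` and the bundled model file name `ggml-tiny.bin` are assumptions made for illustration, and only `Whisper(fromFileURL:)`, `transcribe(audioFrames:)` and `Segment.text` come from the usage shown above.

```swift
import Foundation
import SwiftWhisper

// Hypothetical helper: loads a model shipped in the app bundle and transcribes
// 16kHz PCM frames. "ggml-tiny.bin" is an assumed file name, not one the
// library requires.
func transcribe(frames: [Float]) async throws -> String {
    guard let modelURL = Bundle.main.url(forResource: "ggml-tiny", withExtension: "bin") else {
        // The model file was not found in the bundle
        throw CocoaError(.fileNoSuchFile)
    }

    let whisper = Whisper(fromFileURL: modelURL)
    let segments = try await whisper.transcribe(audioFrames: frames)

    // Join the text of every segment into a single string
    return segments.map(\.text).joined()
}
```

Wrapping the call in a throwing async function keeps the missing-model case and transcription errors on the same error path for the caller.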

Delegate methods

You can subscribe to segments, transcription progress, and errors by implementing WhisperDelegate and setting whisper.delegate = ...

protocol WhisperDelegate {
  // Progress updates as a fraction from 0 to 1
  func whisper(_ aWhisper: Whisper, didUpdateProgress progress: Double)

  // Called whenever new segments of text have been transcribed
  func whisper(_ aWhisper: Whisper, didProcessNewSegments segments: [Segment], atIndex index: Int)
  
  // Finished transcribing, includes all transcribed segments of text
  func whisper(_ aWhisper: Whisper, didCompleteWithSegments segments: [Segment])

  // Error with transcription
  func whisper(_ aWhisper: Whisper, didErrorWith error: Error)
}
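A minimal sketch of a delegate that conforms to the protocol above and logs each event. The class name `TranscriptionLogger` and its `percentString` helper are hypothetical and exist only for this example; the four method signatures are copied from the protocol as listed.

```swift
import Foundation
import SwiftWhisper

// Hypothetical delegate that prints transcription events to the console.
final class TranscriptionLogger: WhisperDelegate {
    // Formats a 0...1 progress fraction as a whole-number percentage.
    static func percentString(_ progress: Double) -> String {
        let clamped = min(max(progress, 0), 1)
        return "\(Int((clamped * 100).rounded()))%"
    }

    func whisper(_ aWhisper: Whisper, didUpdateProgress progress: Double) {
        print("Progress:", Self.percentString(progress))
    }

    func whisper(_ aWhisper: Whisper, didProcessNewSegments segments: [Segment], atIndex index: Int) {
        for segment in segments {
            print("New segment (batch index \(index)):", segment.text)
        }
    }

    func whisper(_ aWhisper: Whisper, didCompleteWithSegments segments: [Segment]) {
        print("Finished with \(segments.count) segments")
    }

    func whisper(_ aWhisper: Whisper, didErrorWith error: Error) {
        print("Transcription failed:", error)
    }
}
```

To use it, create an instance, assign it with `whisper.delegate = logger`, and keep your own reference to `logger` for as long as the transcription runs, in case the library does not retain its delegate.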

Misc

Downloading Models 📥

You can download the pre-trained models here.

Converting audio to 16kHz PCM 🔧

The easiest way to get audio frames into SwiftWhisper is to use AudioKit. The following example takes an input audio file, converts and resamples it, and returns an array of 16kHz PCM floats.

import AudioKit

func convertAudioFileToPCMArray(fileURL: URL, completionHandler: @escaping (Result<[Float], Error>) -> Void) {
    var options = FormatConverter.Options()
    options.format = .wav
    options.sampleRate = 16000
    options.bitDepth = 16
    options.channels = 1
    options.isInterleaved = false

    let tempURL = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent(UUID().uuidString)
    let converter = FormatConverter(inputURL: fileURL, outputURL: tempURL, options: options)
    converter.start { error in
        if let error {
            completionHandler(.failure(error))
            return
        }

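        // Remove the temporary file when this closure exits, whether or not decoding succeeds
        defer { try? FileManager.default.removeItem(at: tempURL) }

        do {
            let data = try Data(contentsOf: tempURL)

            // Skip the 44-byte WAV header, then decode each little-endian 16-bit sample.
            // Stopping at data.count - 1 avoids reading past the end if a byte is left over.
            let floats: [Float] = stride(from: 44, to: data.count - 1, by: 2).map {
                return data[$0..<$0 + 2].withUnsafeBytes {
                    let short = Int16(littleEndian: $0.load(as: Int16.self))
                    // Normalize to the -1.0...1.0 range
                    return max(-1.0, min(Float(short) / 32767.0, 1.0))
                }
            }

            completionHandler(.success(floats))
        } catch {
            completionHandler(.failure(error))
        }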
    }
}
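The sample-decoding step in the function above can be looked at on its own. The sketch below uses a hypothetical helper, `floats(fromInt16PCM:)`, that takes raw little-endian 16-bit PCM bytes with no header and applies the same scaling and clamping. Note that the fixed offset of 44 in the AudioKit example assumes the converter writes a canonical 44-byte WAV header.

```swift
import Foundation

// Hypothetical helper: converts raw little-endian 16-bit PCM bytes (no header)
// into floats in the -1.0...1.0 range.
func floats(fromInt16PCM data: Data) -> [Float] {
    let bytes = [UInt8](data)
    // Integer division drops a trailing odd byte, if there is one
    let sampleCount = bytes.count / 2

    return (0..<sampleCount).map { i -> Float in
        // Combine the low and high bytes into one signed 16-bit sample
        let raw = UInt16(bytes[2 * i]) | (UInt16(bytes[2 * i + 1]) << 8)
        let sample = Int16(bitPattern: raw)
        // Scale by the largest positive Int16 value and clamp, so -32768 maps to -1.0
        return max(-1.0, min(Float(sample) / 32767.0, 1.0))
    }
}

// Three samples: 0, 32767 (maximum) and -32768 (minimum)
let pcm = Data([0x00, 0x00, 0xFF, 0x7F, 0x00, 0x80])
print(floats(fromInt16PCM: pcm)) // [0.0, 1.0, -1.0]
```

Assembling each sample from individual bytes avoids any dependence on memory alignment, at the cost of copying the data into an array first.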

Speed boost 🚀

Transcription can be slow when your app is compiled with the Debug build configuration. This is because the compiler doesn't fully optimize SwiftWhisper unless the build configuration is set to Release.

You can get around this by installing a version of SwiftWhisper that uses .unsafeFlags(["-O3"]) to force maximum optimization. The easiest way to do this is to use the latest commit on the fast branch. Alternatively, you can configure your scheme to build in the Release configuration.

  ...
  dependencies: [
    // Using latest commit hash for `fast` branch:
    .package(url: "https://github.com/exPHAT/SwiftWhisper.git", revision: "d7c0925045e671624db31488c6ffdc7207dd23fa"),
  ],
  ...
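For context, this is roughly what forcing optimization with `.unsafeFlags` looks like in a package manifest. It is an illustrative sketch, not the actual manifest of the `fast` branch: the package name, target name and tools version below are assumptions.

```swift
// swift-tools-version:5.5
import PackageDescription

let package = Package(
    name: "ExamplePackage", // hypothetical package name
    targets: [
        .target(
            name: "ExampleCTarget", // hypothetical target containing C/C++ sources
            // Pass -O3 to the C compiler regardless of the build configuration
            cSettings: [.unsafeFlags(["-O3"])]
        )
    ]
)
```

Swift Package Manager does not allow targets with unsafe flags in dependencies resolved by version number, which is why the dependency above is pinned with `revision:` rather than a version requirement.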