TLSphinx is a Swift wrapper around Pocketsphinx, a portable library based on CMU Sphinx, that allows an application to perform speech recognition without the audio ever leaving the device.
This repository has two main parts. The first is a synthesized version of the pocketsphinx and sphinx base repositories with a module map to access the library as a Clang module. This module is accessed under the name Sphinx and has two submodules, Pocket and Base, in reference to pocketsphinx and sphinx base.
The second part is TLSphinx, a Swift framework that uses the Sphinx Clang module and exposes a Swift-like API that talks to pocketsphinx.
Note: I wrote a blog post about TLSphinx on the Tryolabs Blog. Check it out for a short history of why I wrote this.
The framework provides three types:
- Config describes the configuration needed to recognize speech.
- Decoder is the main class that provides the API to perform all decoding.
- Hypotesis is the result of a decode attempt. It has a text and a score property.
Config represents the cmd_ln_t opaque structure in Sphinx. The default constructor takes an array of tuples of the form (param name, param value), where "param name" is the name of one of the parameters recognized by Sphinx. For example, you can pass the acoustic model, the language model, and the dictionary. For a complete list of recognized parameters, check the Sphinx docs.
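To make the (param name, param value) form concrete, here is a minimal, self-contained sketch of how such pairs map onto the flat, command-line style argument list that the Pocketsphinx tools accept. The function flattenArguments is hypothetical and is not part of TLSphinx, the paths are placeholders, and the snippet uses current Swift syntax (print rather than println).

```swift
// Hypothetical helper, not part of TLSphinx: flattens (name, value) pairs
// into a command-line style list such as ["-hmm", "/path", "-lm", "/path"].
func flattenArguments(_ args: (String, String)...) -> [String] {
    return args.flatMap { [$0.0, $0.1] }
}

// Placeholder paths, for illustration only.
let flat = flattenArguments(("-hmm", "/models/en-us"),
                            ("-lm", "/models/en-us.lm"),
                            ("-dict", "/models/en-us.dict"))
print(flat.joined(separator: " "))
```

Each tuple contributes exactly two entries, so a parameter name can never end up without its value.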
The class has a public property to turn the debug info from Sphinx on or off:
public var showDebugInfo: Bool
Decoder represents the ps_decoder_t opaque struct in Sphinx. The default constructor takes a Config object as a parameter.
It has the functions to decode from a file or from the mic. The result is returned in an optional Hypotesis object, following the naming convention of the Pocketsphinx API. The functions are:
To decode speech from a file:
public func decodeSpeechAtPath (filePath: String, complete: (Hypotesis?) -> ())
The audio pointed to by filePath must have the following characteristics:
- single-channel (monaural)
- little-endian
- unheadered
- 16-bit signed
- PCM
- sampled at 16000 Hz
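Because the audio is unheadered, nothing in the file records its format, so sizes and durations follow directly from the list above: a 16-bit mono sample takes 2 bytes, which makes one second at 16000 Hz exactly 32000 bytes. The sketch below works through that arithmetic and shows how samples are laid out as little-endian bytes. All names are illustrative and not part of TLSphinx, and the code uses current Swift syntax.

```swift
// Format constants taken from the list above.
let sampleRate = 16_000          // samples per second
let bytesPerSample = 2           // 16-bit signed PCM
let channels = 1                 // monaural

// Bytes of raw audio per second of speech.
let bytesPerSecond = sampleRate * bytesPerSample * channels

// Duration, in seconds, of a raw file with the given size in bytes.
func duration(ofRawAudioBytes byteCount: Int) -> Double {
    return Double(byteCount) / Double(bytesPerSecond)
}

// Encodes 16-bit samples as little-endian bytes (low byte first).
func littleEndianBytes(_ samples: [Int16]) -> [UInt8] {
    return samples.flatMap { sample -> [UInt8] in
        let bits = UInt16(bitPattern: sample)
        return [UInt8(bits & 0xFF), UInt8(bits >> 8)]
    }
}

print(bytesPerSecond)                      // 32000
print(duration(ofRawAudioBytes: 96_000))   // 3.0
print(littleEndianBytes([1, -2]))          // [1, 0, 254, 255]
```

A quick check like this can help catch a file that still carries a header or uses a different sample rate, since its size will not match the expected duration.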
To control the size of the buffer used to read the file, the Decoder class has a public property:
public var bufferSize: Int
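As a rough illustration of what a read buffer size controls, the sketch below splits a block of bytes into chunks of at most a given size, the way a reader consumes a file piece by piece. This is a generic pattern and not TLSphinx's actual implementation; the function chunks(of:size:) is hypothetical, and the code uses current Swift syntax.

```swift
// Hypothetical helper: splits bytes into consecutive chunks of at most `size` bytes.
func chunks(of bytes: [UInt8], size: Int) -> [[UInt8]] {
    precondition(size > 0, "size must be positive")
    return stride(from: 0, to: bytes.count, by: size).map { start in
        Array(bytes[start..<min(start + size, bytes.count)])
    }
}

// 10 bytes read with a 4-byte buffer arrive as chunks of 4, 4 and 2 bytes.
let pieces = chunks(of: [UInt8](0..<10), size: 4)
print(pieces.map { $0.count })   // [4, 4, 2]
```

In general, a smaller buffer means more, smaller reads, while a larger one holds more audio in memory at a time.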
To decode a live audio stream from the mic:
public func startDecodingSpeech (utteranceComplete: (Hypotesis?) -> ())
public func stopDecodingSpeech ()
You can use the same Decoder instance many times.
Hypotesis is a struct that represents the result of a decode attempt. It has a text property with the best-scored text and a score property with the score value. The struct implements Printable, so you can print it with println(hypotesis_value).
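For readers on newer Swift versions: Printable was renamed CustomStringConvertible in Swift 2, and println became print. The stand-in struct below sketches the same idea. Its name, the Int type of score, and the format of its description are assumptions for illustration, not the framework's actual definitions.

```swift
// Stand-in type for illustration only; not TLSphinx's Hypotesis struct.
struct ExampleHypothesis: CustomStringConvertible {
    let text: String
    let score: Int   // assumed type, for illustration

    // Used by print() and by string interpolation.
    var description: String {
        return "Text: \(text) - Score: \(score)"
    }
}

let example = ExampleHypothesis(text: "hello world", score: -2500)
print(example)
```

Conforming to the protocol means the value can be passed straight to print or embedded in a string without formatting it by hand at every call site.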
As an example, let's see how to decode the speech in an audio file. To do so, you first need to create a Config object and pass it to the Decoder constructor. With the decoder you can perform automatic speech recognition from an audio file like so:
import TLSphinx

let hmm = ... // Path to the acoustic model
let lm = ... // Path to the language model
let dict = ... // Path to the language dictionary

if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
    if let decoder = Decoder(config:config) {
        let audioFile = ... // Path to an audio file
        decoder.decodeSpeechAtPath(audioFile) {
            if let hyp: Hypotesis = $0 {
                // Print the decoded text and score
                println("Text: \(hyp.text) - Score: \(hyp.score)")
            } else {
                // Can't decode any speech because of an error
            }
        }
    } else {
        // Handle Decoder() failure
    }
} else {
    // Handle Config() failure
}
The decoding is performed by the decodeSpeechAtPath function in the background. Once the process finishes, the complete closure is called on the main thread.
Decoding live speech from the mic follows the same pattern:
import TLSphinx

let hmm = ... // Path to the acoustic model
let lm = ... // Path to the language model
let dict = ... // Path to the language dictionary

if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
    if let decoder = Decoder(config:config) {
        decoder.startDecodingSpeech {
            if let hyp: Hypotesis = $0 {
                println(hyp)
            } else {
                // Can't decode any speech because of an error
            }
        }
    } else {
        // Handle Decoder() failure
    }
} else {
    // Handle Config() failure
}

// At some point in the future, stop listening to the mic.
// The decoder must still be reachable here, e.g. kept in a property.
decoder.stopDecodingSpeech()
The easiest way to integrate TLSphinx is to use Carthage or a similar method to get the framework bundle. This lets you integrate the framework and the Sphinx module without magic.
In your Cartfile, add a reference to the latest version of TLSphinx:
github "Tryolabs/TLSphinx" ~> 0.0.4
Then run carthage update. This should fetch and build the latest version of TLSphinx. Once it's done, drag the TLSphinx.framework bundle to Linked Frameworks and Libraries in Xcode. You must tell Xcode where to find the Sphinx module, which is located in the Carthage checkout. To do so:
- add $(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/include to Header Search Paths (recursive)
- add $(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/lib to Library Search Paths (recursive)
- in Swift Compiler - Search Paths, add $(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/include to Import Paths
Download the project from this repository and drag the TLSphinx project to your Xcode project. If you encounter any errors about missing headers and/or libraries for Sphinx, add the Sphinx/include directory to your header search paths and Sphinx/lib to the library search paths, and mark them as recursive.
BrunoBerisso, bruno@tryolabs.com
TLSphinx is available under the MIT license. See the LICENSE file for more info.