The Watson Speech SDK for Android provides easy, lightweight interaction with IBM's Watson Speech-To-Text (STT) and Text-To-Speech (TTS) services in Bluemix. The SDK supports recording and streaming audio to the STT service in real time while receiving a transcript of the audio as you speak. This project includes an example application that showcases interaction with both the STT and TTS Watson services in the cloud.
The current version of the SDK uses a minSdkVersion of 9, while the example application uses a minSdkVersion of 16.
Using the library
- Download the speech-android-wrapper.aar
- Once unzipped, drag the speech-android-wrapper.aar file into the libs folder in your Android Studio project view.
- Open your app's build.gradle file and set the dependencies as shown below:
dependencies {
    compile fileTree(dir: 'libs', include: ['*.jar'])
    compile (name:'speech-android-wrapper',ext:'aar')
    compile 'com.android.support:appcompat-v7:22.0.0'
}
repositories {
    flatDir {
        dirs 'libs'
    }
}
- Clean and run the Android Studio project
- Create an account on Bluemix if you have not already.
- Follow the instructions at "Service credentials for Watson services" to obtain your service credentials.
To get started, you can also take a look at a quick start guide created by @KeyOnTech.
Implement the delegate interfaces to receive callbacks when a response arrives from the server (ISpeechDelegate) or when the recorder sends back audio data (SpeechRecorderDelegate). SpeechRecorderDelegate is optional.
public class MainActivity extends Activity implements ISpeechDelegate{}
Or, to also implement SpeechRecorderDelegate:
public class MainActivity extends Activity implements ISpeechDelegate, SpeechRecorderDelegate{}
SpeechToText.sharedInstance().initWithContext(new URI("wss://stream.watsonplatform.net/speech-to-text/api"), this.getApplicationContext(), new SpeechConfiguration());
Enabling audio compression
By default, audio sent to the server is uncompressed PCM-encoded data. Compressed audio using the Opus codec can be enabled:
SpeechToText.sharedInstance().initWithContext(this.getHost(STT_URL), this.getApplicationContext(), new SpeechConfiguration(SpeechConfiguration.AUDIO_FORMAT_OGGOPUS));
Or this way:
// Configuration
SpeechConfiguration sConfig = new SpeechConfiguration(SpeechConfiguration.AUDIO_FORMAT_OGGOPUS);
// STT
SpeechToText.sharedInstance().initWithContext(this.getHost(STT_URL), this.getApplicationContext(), sConfig);
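The snippets above call this.getHost(STT_URL), which is not defined in this section. A minimal sketch of such a helper follows, assuming it only converts the URL string into a java.net.URI. The class name HostUris and the choice to return null on a malformed URL are assumptions made for illustration, not part of the SDK.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper class; the name and null-on-failure behaviour are assumptions. */
final class HostUris {
    private HostUris() {}

    /** Parses a service URL into a URI, returning null if the string is malformed. */
    static URI getHost(String url) {
        try {
            return new URI(url);
        } catch (URISyntaxException e) {
            // Malformed URL: report it and let the caller decide what to do.
            System.err.println("Invalid service URL: " + e.getMessage());
            return null;
        }
    }
}
```

Callers should check the result for null before passing it to initWithContext.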
Set the Credentials and the delegate
SpeechToText.sharedInstance().setCredentials(this.USERNAME,this.PASSWORD);
SpeechToText.sharedInstance().setDelegate(this);
Alternatively, pass a token factory object that the SDK uses to retrieve authentication tokens for the STT service:
SpeechToText.sharedInstance().setTokenProvider(new MyTokenProvider(this.strSTTTokenFactoryURL));
SpeechToText.sharedInstance().setDelegate(this);
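The MyTokenProvider class used above is not shown in this section. The sketch below illustrates one way a token provider could be structured, assuming the SDK's interface exposes a single method that returns the token as a string. The TokenSource interface, the CachingTokenSource class, and the time-to-live parameter are all hypothetical; the actual fetch (for example an HTTP GET against the token factory URL) is passed in as a Supplier so that it can be swapped out.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the SDK's token provider interface (assumed shape). */
interface TokenSource {
    String getToken();
}

/** Caches a token for a fixed time so the token factory is not contacted on every call. */
final class CachingTokenSource implements TokenSource {
    private final Supplier<String> fetcher; // e.g. an HTTP GET against the token factory URL
    private final long ttlMillis;           // how long a fetched token is reused
    private String cachedToken;
    private long fetchedAtMillis;

    CachingTokenSource(Supplier<String> fetcher, long ttlMillis) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
    }

    @Override
    public synchronized String getToken() {
        long now = System.currentTimeMillis();
        // Fetch on first use, or again once the cached token is older than the TTL.
        if (cachedToken == null || now - fetchedAtMillis >= ttlMillis) {
            cachedToken = fetcher.get();
            fetchedAtMillis = now;
        }
        return cachedToken;
    }
}
```

Caching is a design choice rather than a requirement: it avoids a network round trip per request, at the cost of having to pick a time-to-live shorter than the token's real lifetime.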
JSONObject models = getModels();
JSONObject model = getModelInfo("en-US_BroadbandModel");
SpeechToText.sharedInstance().setModel("en-US_BroadbandModel");
SpeechToText.sharedInstance().recognize();
If you implemented SpeechRecorderDelegate and need to process the recorded audio data, set the recorder delegate:
SpeechToText.sharedInstance().recognize();
SpeechToText.sharedInstance().setRecorderDelegate(this);
Delegate methods to receive messages from the SDK
public void onOpen() {
    // the connection to the STT service is successfully opened
}
public void onError(String error) {
    // error interacting with the STT service
}
public void onClose(int code, String reason, boolean remote) {
    // the connection with the STT service was just closed
}
public void onMessage(String message) {
    // a message comes from the STT service with recognition results
}
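The onMessage callback receives the recognition results as a string. The sketch below shows one way to pull the transcript out of that string, assuming the message is JSON containing a "transcript" field and a "final" flag. The message shape and the TranscriptExtractor class are assumptions made for illustration. On Android a JSON parser such as org.json would be the usual choice; a regular expression is used here only to keep the example dependency-free.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; assumes messages carry "transcript" and "final" JSON fields. */
final class TranscriptExtractor {
    // Matches "transcript": "<text>", allowing escaped characters inside the string.
    private static final Pattern TRANSCRIPT =
            Pattern.compile("\"transcript\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
    private static final Pattern FINAL_FLAG =
            Pattern.compile("\"final\"\\s*:\\s*true");

    private TranscriptExtractor() {}

    /** Returns the first transcript in the message (escapes left as-is), or null if none. */
    static String firstTranscript(String message) {
        Matcher m = TRANSCRIPT.matcher(message);
        return m.find() ? m.group(1) : null;
    }

    /** Returns true if the message marks a result as final rather than interim. */
    static boolean isFinal(String message) {
        return FINAL_FLAG.matcher(message).find();
    }
}
```

Inside onMessage, a caller might display firstTranscript(message) as interim text and commit it once isFinal(message) returns true.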
SpeechToText.sharedInstance().stopRecording();
The amplitude is calculated from the audio data buffer, and the volume (in dB) is calculated based on it.
@Override
public void onAmplitude(double amplitude, double volume) {
    // your code here
}
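To illustrate how an amplitude and a volume in dB can be derived from an audio buffer, here is a minimal sketch. It assumes 16-bit PCM samples, takes the peak absolute sample as the amplitude, and expresses volume in decibels relative to full scale. The SDK's actual formula may differ; the AudioLevel class and both method names are hypothetical.

```java
/** Hypothetical helper; the SDK's actual amplitude and volume formulas may differ. */
final class AudioLevel {
    private static final double FULL_SCALE = 32768.0; // magnitude of Short.MIN_VALUE

    private AudioLevel() {}

    /** Returns the peak absolute sample value in a 16-bit PCM buffer (0 if empty). */
    static double peakAmplitude(short[] samples) {
        int peak = 0;
        for (short s : samples) {
            // Widen to int first so that Math.abs(Short.MIN_VALUE) does not overflow.
            peak = Math.max(peak, Math.abs((int) s));
        }
        return peak;
    }

    /** Converts an amplitude to dB relative to full scale; silence maps to -Infinity. */
    static double toDecibels(double amplitude) {
        return 20.0 * Math.log10(amplitude / FULL_SCALE);
    }
}
```

With this convention 0 dB is the loudest possible sample and quieter audio yields negative values, so a UI meter would typically clamp the result to a floor such as -60 dB.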
TextToSpeech.sharedInstance().initWithContext(this.getHost(TTS_URL));
Set the Credentials
TextToSpeech.sharedInstance().setCredentials(this.USERNAME,this.PASSWORD);
Alternatively, pass a token factory object that the SDK uses to retrieve authentication tokens for the TTS service:
TextToSpeech.sharedInstance().setTokenProvider(new MyTokenProvider(this.strTTSTokenFactoryURL));
TextToSpeech.sharedInstance().voices();
TextToSpeech.sharedInstance().setVoice("en-US_MichaelVoice");
TextToSpeech.sharedInstance().synthesize(ttsText);