This package is part of the ALIZÉ project: http://alize.univ-avignon.fr
ALIZÉ is a software platform for automatic speaker recognition. It can be used for carrying out research in this field, as well as for incorporating speaker recognition into applications.
This repository hosts material to allow people to compile and use ALIZÉ (alize-core and LIA_RAL) on the Android platform:
- a project for Android Studio to help compile ALIZÉ into an Android library (in AAR format) for inclusion in applications
- a JNI layer to facilitate access to ALIZÉ (which is developed in C++) from Java sources.
The repository does not contain the sources of ALIZÉ itself (see below for how to download and install them).
The JNI layer does not, at this time, offer complete access to all of the classes in ALIZÉ. For now, it focuses only on the high-level APIs that make it possible to run a speaker detection system (for speaker verification/identification) on the Android platform. It uses the class SimpleSpkDetSystem and provides the same API as the C++ version of this class.
Through this API, a user can feed audio data to the system, use it to train speaker models, and run speaker verification and identification tasks. The host application is responsible for capturing audio from the microphone or any other device; the ALIZÉ library does not provide this function.
You need to get the source for alize-core and LIA_RAL from their respective repositories, and put the two source folders at this path:
{project_root}/alize/src/main/cpp/
In order to extract features from the audio signal, ALIZÉ relies on the free speech signal processing toolkit SPro, developed by Guillaume Gravier at IRISA: https://gforge.inria.fr/projects/spro/
You need to download the source code package for SPro (please read the warning below), uncompress/unarchive it, and put the resulting folder at the same path as the other two ({project_root}/alize/src/main/cpp/). The folder is expected to be named spro, with no version number. There is no need to compile SPro using the configure script and makefile provided in the package; only access to the source code is required.
Once all three source folders (alize-core, LIA_RAL, spro) are in place, open the project with Android Studio.
In the Build menu, select Build APK. It will generate an Android archive for ALIZÉ, which you can then import as a module in your application projects.
This library does not handle audio recording. Audio is passed to the speaker recognition system using the various addAudio methods.
The sampling frequency of the audio signal must be specified in the configuration file, using the parameter SPRO_sampleRate.
The default format for the audio samples is 16-bit, signed integer linear PCM.
The parameter SPRO_format may be used in the configuration file in order to specify a different format (refer to spro.h for the list of formats supported by SPro). However, it is unlikely to be useful in the context of ALIZÉ for Android.
If a different sample/file format is specified this way, it will be used to process audio data sent to the system using the methods addAudio(String filename), addAudio(InputStream audioDataStream) and addAudio(byte[] audioData).
However, the method addAudio(short[] linearPCMSamples), as its signature and parameter name imply, always expects 16-bit, signed integer linear PCM, ignoring the setting for SPRO_format.
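When audio arrives as raw bytes but the configured SPRO_format should be bypassed, the bytes can be converted to samples before calling addAudio(short[] linearPCMSamples). The following is a minimal sketch, assuming the bytes hold little-endian 16-bit signed PCM (an assumption about the capture source, not something ALIZÉ mandates); the class and method names are hypothetical and not part of the library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of ALIZÉ.
final class PcmBytes {
    private PcmBytes() {}

    /**
     * Converts raw 16-bit signed PCM bytes to samples.
     * Assumption: the bytes are little-endian (low byte first).
     */
    static short[] toShorts(byte[] littleEndianPcm) {
        if (littleEndianPcm.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM needs an even number of bytes");
        }
        short[] samples = new short[littleEndianPcm.length / 2];
        ByteBuffer.wrap(littleEndianPcm)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(samples);
        return samples;
    }
}
```

The resulting array can then be handed to addAudio(short[] linearPCMSamples), which ignores SPRO_format as described above.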
In Android Studio, import the AAR archive into your application project as a new module (File ▸ New ▸ New Module…, then Import .JAR/.AAR Package). Remember to update the app module's build.gradle file to include the library in the dependencies.
The procedure is detailed on this webpage: https://developer.android.com/studio/projects/android-library.html
The Java classes providing access to ALIZÉ are then available in the AlizeSpkRec package:
import AlizeSpkRec.*;
In order to create a new speaker recognition system, two things are needed:
- a configuration file
- a path to a directory where the system can store files (speaker models + temporary files)
The configuration can be provided by passing the constructor either a file name, or an input stream. The latter is particularly useful in the common case where the configuration file is packaged as an application asset, as illustrated below.
InputStream configAsset = getApplicationContext().getAssets().open("MyConfig.cfg");
SimpleSpkDetSystem alizeSystem = new SimpleSpkDetSystem(configAsset, getApplicationContext().getFilesDir().getPath());
configAsset.close();
We then load the background model, also from the application assets.
InputStream backgroundModelAsset = getApplicationContext().getAssets().open("gmm/world.gmm");
alizeSystem.loadBackgroundModel(backgroundModelAsset);
backgroundModelAsset.close();
System.out.println("System status:");
System.out.println(" # of features: " + alizeSystem.featureCount()); // at this point, 0
System.out.println(" # of models: " + alizeSystem.speakerCount()); // at this point, 0
System.out.println(" UBM is loaded: " + alizeSystem.isUBMLoaded()); // true
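The asset-loading snippets above close each stream explicitly, so the stream stays open if the call in between throws. One possible way to guarantee the close is try-with-resources, sketched here as a small generic helper. The names StreamLoading, StreamOpener and StreamConsumer are hypothetical and not part of ALIZÉ; the sketch also assumes the project is built with Java 8 language features enabled if it is called with lambdas.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of ALIZÉ.
final class StreamLoading {
    private StreamLoading() {}

    /** Opens a stream, e.g. an application asset. */
    interface StreamOpener {
        InputStream open() throws IOException;
    }

    /** Consumes the stream, e.g. by passing it to a loader method. */
    interface StreamConsumer {
        void accept(InputStream in) throws Exception;
    }

    /** Opens the stream, hands it to the consumer, and always closes it. */
    static void load(StreamOpener opener, StreamConsumer consumer) throws Exception {
        try (InputStream in = opener.open()) {
            consumer.accept(in);
        }
    }
}

// Possible use with the calls shown above (inside an Android component):
//   StreamLoading.load(
//       () -> getApplicationContext().getAssets().open("gmm/world.gmm"),
//       in -> alizeSystem.loadBackgroundModel(in));
```

The same helper would apply to any other stream-based call, since the consumer only receives an InputStream.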
// Record audio
// The system takes 16-bit, signed integer linear PCM, at the frequency specified in the configuration file.
short[] audio = …
// Send audio to the system
alizeSystem.addAudio(audio);
// Train a model with the audio
alizeSystem.createSpeakerModel("Somebody");
After this, alizeSystem.speakerCount() returns 1.
alizeSystem.featureCount() is greater than 0 and corresponds to the number of feature vectors extracted from the audio signal.
alizeSystem.resetAudio();
alizeSystem.resetFeatures();
// Record some more audio
short[] moreAudio = …
// Send the new audio to the system
alizeSystem.addAudio(moreAudio);
// Perform speaker verification against the model we created earlier
SpkRecResult verificationResult = alizeSystem.verifySpeaker("Somebody");
verificationResult.match is a boolean indicating the resulting decision: if true, the signal matches the speaker model.
verificationResult.score gives the score on which the decision is based.
InputStream modelAsset = getApplicationContext().getAssets().open("gmm/somebody_else.gmm");
alizeSystem.loadSpeakerModel("Somebody else", modelAsset);
modelAsset.close();
At this point, alizeSystem.speakerCount() == 2.
With two speaker models, we can try speaker identification.
We will use the same audio signal as previously.
Since we have not unloaded it yet (through alizeSystem.resetAudio() and alizeSystem.resetFeatures()), there is no need to resend it.
SpkRecResult identificationResult = alizeSystem.identifySpeaker();
identificationResult.match is a boolean indicating the resulting decision: if true, the signal matches one of the speaker models in the database.
identificationResult.speakerId, of type String, gives the ID of the best matching speaker.
identificationResult.score gives the score obtained for the best matching speaker model.
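To make the relationship between the three fields concrete, here is a conceptual sketch of how an identification decision could be derived from per-speaker scores: keep the best-scoring speaker and compare that score to a threshold. This is an illustration only, not ALIZÉ's implementation; the Result class is a stand-in for SpkRecResult, and the threshold parameter and the use of double for scores are assumptions.

```java
import java.util.Map;

// Conceptual illustration only; not ALIZÉ code.
final class IdentificationSketch {
    private IdentificationSketch() {}

    /** Stand-in mirroring the fields described above (match, speakerId, score). */
    static final class Result {
        final boolean match;
        final String speakerId;
        final double score;

        Result(boolean match, String speakerId, double score) {
            this.match = match;
            this.speakerId = speakerId;
            this.score = score;
        }
    }

    /**
     * Picks the best-scoring speaker; the decision is positive only if
     * that score exceeds the (hypothetical) threshold.
     */
    static Result identify(Map<String, Double> scoresBySpeaker, double threshold) {
        if (scoresBySpeaker.isEmpty()) {
            throw new IllegalArgumentException("no speaker models to compare against");
        }
        String bestId = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : scoresBySpeaker.entrySet()) {
            if (bestId == null || entry.getValue() > bestScore) {
                bestId = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return new Result(bestScore > threshold, bestId, bestScore);
    }
}
```

Under this reading, speakerId and score are always populated with the best candidate, while match tells the application whether that candidate is good enough to accept.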
All the methods of class SimpleSpkDetSystem throw exceptions of type AlizeException in case of a problem encountered by the underlying C++ library.