ALIZE for the Android platform.

Primary language: C++. License: GNU Lesser General Public License v3.0 (LGPL-3.0).

ALIZÉ for Android

This package is part of the ALIZÉ project: http://alize.univ-avignon.fr

Welcome to ALIZÉ

ALIZÉ is a software platform for automatic speaker recognition. It can be used for carrying out research in this field, as well as for incorporating speaker recognition into applications.

http://alize.univ-avignon.fr

ALIZÉ for Android

This repository hosts material to allow people to compile and use ALIZÉ (alize-core and LIA_RAL) on the Android platform:

  • a project for Android Studio to help compile ALIZÉ into an Android library (in AAR format) for inclusion in applications
  • a JNI layer to facilitate access to ALIZÉ (which is developed in C++) from Java sources.

The repository does not contain the sources of ALIZÉ itself (see below for how to download and install them).

The JNI layer does not, at this time, offer complete access to all of the classes in ALIZÉ. For now, it focuses on the high-level APIs needed to run a speaker detection system (for speaker verification/identification) on the Android platform. It uses the class SimpleSpkDetSystem and provides the same API as the C++ version of this class.

Through this API, a user can feed audio data to the system, use it to train speaker models, and run speaker verification and identification tasks. It is the responsibility of the host application to capture audio from the microphone or any other device; the ALIZÉ library does not provide this function.

What is needed to build ALIZÉ for Android?

You need to get the source for alize-core and LIA_RAL from their respective repositories, and put the two source folders at this path: {project_root}/alize/src/main/cpp/

In order to extract features from the audio signal, ALIZÉ relies on the free speech signal processing toolkit SPro, developed by Guillaume Gravier at IRISA: https://gforge.inria.fr/projects/spro/

You need to download the source code package for SPro (please read the warning below), uncompress/unarchive it, and put the resulting folder at the same path as the other two ({project_root}/alize/src/main/cpp/). The name for the folder is expected to be spro, with no version number. There is no need to compile SPro using the configure script and makefile provided in the package; only access to the source code is required.

⚠️ Warning: Only revisions 155 and later of SPro are fully compatible with 64-bit CPUs and ALIZÉ. However, at the time of this writing, these revisions are only available through Subversion, and the direct download link given on the website above points to an older revision of SPro 5, which contains a bug that corrupts feature files when compiled for 64-bit systems. To be sure to get the right version of SPro for use with ALIZÉ, you can download it from ALIZÉ's website: http://alize.univ-avignon.fr/spro-5.0-157.tar.gz.

How to compile

Once all three source folders (alize-core, LIA_RAL, spro) are in place, open the project with Android Studio. In the Build menu, select Build APK. It will generate an Android archive for ALIZÉ, which you can then import as a module in your application projects.
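Before opening the project, the folder layout described above can be checked programmatically. The following is only a minimal sketch: the class SourceLayoutCheck and its method missingFolders are hypothetical names introduced for illustration and are not part of this repository. The folder names and the path alize/src/main/cpp/ are the ones given in this README.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of this repository: checks the expected source layout. */
final class SourceLayoutCheck {
    // Folder names as listed in this README.
    private static final String[] REQUIRED = {"alize-core", "LIA_RAL", "spro"};

    private SourceLayoutCheck() {}

    /** Returns the required folders that are missing from {projectRoot}/alize/src/main/cpp/. */
    static List<String> missingFolders(Path projectRoot) {
        Path cppDir = projectRoot.resolve("alize").resolve("src").resolve("main").resolve("cpp");
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            // Only an existing directory counts; a plain file with the same name does not.
            if (!Files.isDirectory(cppDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

An empty list means all three folders were found. The check only looks at folder names; it does not verify which SPro revision is installed.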

How to use it

Audio input

This library does not handle audio recording. Audio is passed to the speaker recognition system using the various addAudio methods.

The sampling rate of the audio signal must be specified in the configuration file, using the parameter SPRO_sampleRate. The default format for the audio samples is 16-bit, signed integer linear PCM.

The parameter SPRO_format may be used in the configuration file in order to specify a different format (refer to spro.h for the list of formats supported by SPro). However, it is unlikely to be useful in the context of ALIZÉ for Android. If a different sample/file format is specified this way, it will be used to process audio data sent to the system using the methods addAudio(String filename), addAudio(InputStream audioDataStream) and addAudio(byte[] audioData). But the method addAudio(short[] linearPCMSamples), as its signature and parameter name imply, always expects 16-bit, signed integer linear PCM, ignoring the setting for SPRO_format.
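When the host application holds its audio as raw bytes but wants to call addAudio(short[] linearPCMSamples), the bytes first have to be regrouped into 16-bit samples. The sketch below shows one way to do this in plain Java. The class PcmConversion and its method toSamples are hypothetical helpers, not part of ALIZÉ, and the byte order of the input is an assumption the caller has to supply (raw PCM taken from WAV data is typically little-endian).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper, not part of ALIZÉ: converts raw PCM bytes to 16-bit samples. */
final class PcmConversion {
    private PcmConversion() {}

    /**
     * Interprets rawPcm as consecutive 16-bit signed samples in the given byte order.
     * A trailing odd byte, if any, is ignored.
     */
    static short[] toSamples(byte[] rawPcm, ByteOrder order) {
        short[] samples = new short[rawPcm.length / 2];
        // View the bytes as shorts in the requested byte order and copy them out.
        ByteBuffer.wrap(rawPcm).order(order).asShortBuffer().get(samples);
        return samples;
    }
}
```

The resulting array could then be passed to the system with alizeSystem.addAudio(PcmConversion.toSamples(bytes, ByteOrder.LITTLE_ENDIAN)), provided the samples were recorded at the rate given by SPRO_sampleRate.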

Import the library

In Android Studio, import the AAR archive into your application project as a new module (File > New > New Module…, then Import .JAR/.AAR Package). Remember to update the app module's build.gradle file to include the library in the dependencies.

The procedure is detailed on this webpage: https://developer.android.com/studio/projects/android-library.html

The Java classes providing access to ALIZÉ are then available in the AlizeSpkRec package:

import AlizeSpkRec.*;

Create an instance of a speaker recognition system

In order to create a new speaker recognition system, two things are needed:

  • a configuration file
  • a path to a directory where the system can store files (speaker models + temporary files)

The configuration can be provided by passing the constructor either a file name, or an input stream. The latter is particularly useful in the common case where the configuration file is packaged as an application asset, as illustrated below.

InputStream configAsset = getApplicationContext().getAssets().open("MyConfig.cfg");
SimpleSpkDetSystem alizeSystem = new SimpleSpkDetSystem(configAsset, getApplicationContext().getFilesDir().getPath());
configAsset.close();

We then load the background model, also from the application assets.

InputStream backgroundModelAsset = getApplicationContext().getAssets().open("gmm/world.gmm");
alizeSystem.loadBackgroundModel(backgroundModelAsset);
backgroundModelAsset.close();
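In the open/use/close sequences above, the stream stays open if the call in the middle throws. Java's try-with-resources statement closes the stream in both cases. The sketch below illustrates the idiom on its own, without any ALIZÉ calls; the class AssetStreams and its nested interfaces are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper, not part of ALIZÉ: runs an action on a stream and always closes it. */
final class AssetStreams {
    /** An action that reads from a stream and may fail. */
    interface StreamAction {
        void accept(InputStream in) throws Exception;
    }

    /** Opens the stream, for example from the application assets. */
    interface StreamSource {
        InputStream open() throws IOException;
    }

    private AssetStreams() {}

    static void use(StreamSource source, StreamAction action) throws Exception {
        // try-with-resources closes the stream whether or not the action throws.
        try (InputStream in = source.open()) {
            action.accept(in);
        }
    }
}
```

Under these assumptions, loading the background model might be written as AssetStreams.use(() -> getApplicationContext().getAssets().open("gmm/world.gmm"), alizeSystem::loadBackgroundModel). Writing the try-with-resources block inline at each call site works just as well.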

Check system status

System.out.println("System status:");
System.out.println("  # of features: " + alizeSystem.featureCount());   // at this point, 0
System.out.println("  # of models: " + alizeSystem.speakerCount());     // at this point, 0
System.out.println("  UBM is loaded: " + alizeSystem.isUBMLoaded());    // true

Train a speaker model

// Record audio
// The system takes 16-bit, signed integer linear PCM, at the frequency specified in the configuration file.
short[] audio = …

// Send audio to the system
alizeSystem.addAudio(audio);

// Train a model with the audio
alizeSystem.createSpeakerModel("Somebody");

After this, alizeSystem.speakerCount() returns 1, and alizeSystem.featureCount() returns a value greater than 0: the number of feature vectors extracted from the audio signal.

Reset input before sending another signal

alizeSystem.resetAudio();
alizeSystem.resetFeatures();

Perform speaker verification

// Record some more audio
short[] moreAudio = …

// Send the new audio to the system
alizeSystem.addAudio(moreAudio);

// Perform speaker verification against the model we created earlier
SpkRecResult verificationResult = alizeSystem.verifySpeaker("Somebody");

verificationResult.match is a boolean indicating the resulting decision: if true, the signal matches the speaker model. verificationResult.score gives the score on which the decision is based.

Load a pre-trained speaker model packaged with the application

InputStream modelAsset = getApplicationContext().getAssets().open("gmm/somebody_else.gmm");
alizeSystem.loadSpeakerModel("Somebody else", modelAsset);
modelAsset.close();

At this point, alizeSystem.speakerCount() == 2.

Perform speaker identification

With two speaker models, we can try speaker identification. We will use the same audio signal as previously. Since we have not unloaded it yet (through alizeSystem.resetAudio() and alizeSystem.resetFeatures()), there is no need to resend it.

SpkRecResult identificationResult = alizeSystem.identifySpeaker();

identificationResult.match is a boolean indicating the resulting decision: if true, the signal matches one of the speaker models in the database. identificationResult.speakerId, of type String, gives the ID of the best matching speaker. identificationResult.score gives the score obtained for the best matching speaker model.

Exceptions

All methods of the SimpleSpkDetSystem class throw an exception of type AlizeException when the underlying C++ library encounters a problem.