Person identification is the problem of determining the identity of a person from a closed set of candidates. Conventional biometric systems use person-specific models tested on the same modalities on which they are trained. The modality can be audio, visual or a fusion of audio and visual (bimodal). Recently there has been significant interest in multi-modal human computer interfaces, especially audio-visual (AV) systems for applications in areas such as banking, and security systems.
For each speech frame of around i.e. 30 ms with overlap, a set of Mel-Frequency Cepstrum Coefficients (MFCC) is computed. These are result of a cosine transform of the logarithm of the short-term power spectrum expressed on a mel-frequency scale. This set of coefficients is called an acoustic vector. Therefore each input utterance is transformed into a sequence of acoustic vectors. Extracting the MFCC coefficients shows how those acoustic vectors can be used to represent and recognize the voice characteristic of the speaker.
A set of visual features were computed, in order to have a particular signature that allows the identification of a person. It is composed by different soft biometric cues, extracted from the depth data acquired using a RGB-D sensor, i.e. Kinect 2. In total 12 visual features were extracted.
This set of features could be defined as cues based on the combination of distances among joints, distances between the floor plane and all the possible joints.
The surface-based features refers to geodesic distances on the mesh surface computed from different joint pairs.
- Tatiana Lopez-Guevara
- Dina Youakim
- Ahmed Karam Eldaly
-
Barbosa, IgorBarros; Cristani, Marco; Del Bue, Alessio; Bazzani, Loris; Murino, Vittorio, “Re-identification with RGB-D Sensors”, Computer Vision – ECCV 2012. Workshops and Demonstrations, Lecture Notes in Computer Science, 2012.
-
Mogelmose, A; Bahnsen, C.; Moeslund, T.B.; Clapes, A; Escalera, S., "Tri-modal Person Re-identification with RGB, Depth and Thermal Features," Computer Vision and Pattern Recognition Workshops (CVPRW), 2013 IEEE Conference on , vol., no., pp.301,307, 23-28 June 2013.
-
Logan, B., 2000, October. Mel frequency cepstral coefficients for music modeling. In ISMIR (Vol. 270, pp. 1-11).