SpeechRecognition

#Requirements:

Record the meeting discussion voice
Convert it to text
Recognise the speaking person’s voice (identify the speaker)
Assuming speakers talk in turn; if not, the assistant shall prompt for speakers to take turns
When the recognition confidence is low, prompt the speaker to repeat or confirm what was recorded

Data (Record Audio)

Process Flow:

- Convert to text
- 6 minutes clustering
  - Time sampling
  - Data sampling
  - Feature extraction
  - Unsupervised trained model
- Start listening
- Identify cluster number (e.g. cluster 3)
- Identify overlap
  - More than one cluster identified in a given time
  - Identification score low → ask to repeat

Algorithm I

1. Record Audio
2. Extract Features
3. Train an unsupervised model
4. Record Audio
   a. Extract features
   b. Predict speaker
   c. Convert speech to text
      i. For a poor detection, repeat speech. Repeat 4.
      ii. For a good detection, write to file [Person Name, Text]. Repeat 4.
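
The loop in step 4 of Algorithm I can be sketched in Python as follows. This is only an illustration under stated assumptions, not the repository's MATLAB code: `nearest_cluster`, `run_meeting`, `record_turn`, and `min_confidence` are hypothetical names, a nearest-centroid lookup stands in for the trained unsupervised model, and the confidence score is one simple choice among many.

```python
import csv
import math


def nearest_cluster(features, centroids):
    """Return (cluster index, confidence in [0, 1]) for one feature vector.

    Stand-in for the trained unsupervised model (assumption): the speaker is
    the nearest centroid. Needs at least two centroids.
    """
    dists = [math.dist(features, c) for c in centroids]
    order = sorted(range(len(dists)), key=dists.__getitem__)
    d1, d2 = dists[order[0]], dists[order[1]]
    # 0 when the two nearest clusters are equally close, 1 when the match is exact.
    confidence = 0.0 if d1 + d2 == 0 else (d2 - d1) / (d1 + d2)
    return order[0], confidence


def run_meeting(record_turn, centroids, names, out_path,
                min_confidence=0.3, max_turns=100):
    """Record, identify, and transcribe turns; write [Person Name, Text] rows.

    record_turn is a hypothetical callable returning (features, text) for one
    recorded utterance, or None when there is no more audio.
    """
    rows = []
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        for _ in range(max_turns):
            turn = record_turn()
            if turn is None:
                break
            features, text = turn
            speaker, confidence = nearest_cluster(features, centroids)
            if confidence < min_confidence:
                # Poor detection: ask for a repeat and go back to recording.
                print("Low identification score, please repeat.")
                continue
            # Good detection: write the row and go back to recording.
            row = [names[speaker], text]
            writer.writerow(row)
            rows.append(row)
    return rows
```

A low score here also covers the overlap case from the process flow, since an utterance that sits between two clusters gets a confidence near zero and triggers the request to repeat.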

Observations:

The Google API can recognize a very large range of speech, and it can be explored further for other applications and problems.

The recording duration used when the Speech-to-Text function is called is not robust. It is fixed, to give the Google API time to match the current phrase. In real-time use, such as a conference, phrase durations vary and cannot be limited to a short sentence.

The sampling rate of the signals could also vary for people who speak different dialects of a language.

A network designed for speech identification must be robust and not limited to a fixed number of people in a meeting. There is a lot of room for classification techniques that handle such cases, for example a meeting that grows in participants without prior notice.

Overlapping speech results in a poor detection score, which can be corrected by asking the speaker to repeat. This can in turn lead to the machine being well trained for identification.

Running the Code in Matlab:

Run the files in the following sequence, one for each stage of the flow:

file8_speech2textWithGoogle
file8_1_speech2feats.m
file8_som.m
file9_test2retrainAndIdentify.m
file10_write2file.m
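
The contents of `file8_som.m` are not shown here. Assuming its name refers to a self-organizing map used as the unsupervised model, the training stage might look roughly like the Python sketch below. It is not a port of the MATLAB code: `train_som` and `best_matching_unit` are hypothetical names, the map is one-dimensional, and the nodes are initialised deterministically from evenly spaced samples (a simplification; random initialisation is more common).

```python
import math


def best_matching_unit(x, nodes):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(nodes)), key=lambda j: math.dist(x, nodes[j]))


def train_som(samples, n_nodes=2, epochs=50, lr0=0.5, radius0=1.0):
    """Train a 1-D self-organizing map and return its node weight vectors.

    samples is a list of equal-length feature vectors; each node ends up
    representing one cluster (for example, one speaker).
    """
    if n_nodes < 2 or len(samples) < n_nodes:
        raise ValueError("need at least 2 nodes and as many samples as nodes")
    # Deterministic initialisation: copy evenly spaced samples (assumption).
    last = len(samples) - 1
    nodes = [list(samples[i * last // (n_nodes - 1)]) for i in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)          # learning rate decays linearly
        radius = radius0 * (1.0 - frac)  # neighbourhood shrinks linearly
        for x in samples:
            bmu = best_matching_unit(x, nodes)
            for j, w in enumerate(nodes):
                # Gaussian neighbourhood measured on the 1-D node grid.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return nodes
```

After training, `best_matching_unit(features, nodes)` gives the cluster number for a new utterance, which corresponds to the "identify cluster number" step of the flow.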

