The Google Cloud Speech-to-Text API provides speaker diarization alongside its transcriptions. I've put together this easy-to-follow notebook that sends audio files through GCP's `long_running_recognize` operation to perform asynchronous speech recognition (among other things) and returns structured .csv files with the API's output.
To get started, there are a few things you need to do:
- Clone this repo
- Install the Google Cloud Speech-to-Text client library for Python:
  pip install --upgrade google-cloud-speech
- Create a new project on GCP
- Create a billing account. You need to enter payment info to use the services
- Once you have a billing account, you can enable the API you want to use
- Create credentials for that API service.
- You'll get a JSON file containing your credentials
- Export the path to that credentials file in your .bashrc with:
  export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"
There are two ways to use this: follow along in the Jupyter notebook, or simply run the following on your command line:
python google-diarization.py -i <input .wav file>
This will output a .csv file to the data/outputs/ directory.
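For reference, turning the API's word-level diarization output into .csv rows can be sketched with plain standard-library code. The input below is a list of dicts that mimics the `words` list on the final result of a diarization response; the field names and the CSV header layout are illustrative assumptions, not necessarily what `google-diarization.py` emits:

```python
import csv
import io
from itertools import groupby

def words_to_csv(words):
    """Group word-level diarization output into speaker turns and
    render them as CSV text. Each entry in `words` carries a speaker
    tag, a start time in seconds, and the word itself (field names
    here are hypothetical stand-ins for the API's word_info fields)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["speaker", "start_time", "utterance"])
    # Consecutive words with the same speaker tag form one turn.
    for speaker, turn in groupby(words, key=lambda w: w["speaker_tag"]):
        turn = list(turn)
        writer.writerow([
            speaker,
            turn[0]["start_time"],
            " ".join(w["word"] for w in turn),
        ])
    return buf.getvalue()
```

Grouping on consecutive speaker tags keeps the output readable as a dialog rather than one row per word.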
I created this pipeline as a tool to supplement Spectrum with the ability to identify multi-speaker dialog in a lecture. Those inside the Penn State network who want to learn more about the tool can visit https://dstoolbox.tlt.psu.edu/