Thin service to allow using of unmodified Google Speech client SDKs to access Microsoft Cognitive Services speech APIs.
- Richer client support
Microsoft Cognitive Services speech APIs only provide JavaScript SDK for their latest open API and C#/iOS/Android for the legacy, closed API. Otherwise, you must rely on community built or build your own. For example, if you are building a C/C++ based application, you can use libmsspeech open source client library.
On the other hande, Google provides an extensive set of client libraries for multiple platform. At the time of this writing, they support C#, Go, Java, Node.js, PHP, Python and Ruby.
With msspeech-gbridge
service, you can directly use unmodified Google client libraries to access Microsoft APIs.
- Get union of features
Use the APIs interchangeably to get the features you need. For example, language support, text normalization, etc.
- Speech provider agnostic
You can build your applications using Google client SDKs but still be able to choose between Google Speech APIs and Microsoft Cognitive Services Speech APIs depending on your scenario without any code changes.
This is the very first version of the service. It only works.
- Microsoft Speech APIs do not support word level timing.
msspeech-gbridge
will not error when it is requested, but it will not return them. - Microsoft subscription key has to be supplied at the service, not by the client. WiP to supply it by clients.
- Microsoft Speech APIs support
conversational
mode, which is not available in Google APIs. - Differences in underlying speech recognition parametesr should be evaluated. For example, timeouts and segmentation.
- Support text to speech APIs.
- Implement the two remaining Google Speech APIs:
LongRunningRecognize
. - Support Google speech audio codecs that are not supported by Microsoft Speech APIs:
FLAC
,MULAW
,AMR_WB
,OGG_OPUS
,SPEEX_WITH_HEADER_BYTE
. - Better command line arguments.
- Support TLS.
- Lots of documentation!
Example usage using various Google SDKs can be found here:
Rest of platform SDKs should work but have not been tested.
docker run --rm -t -p 8080:8080 technicianted/msspeech-gbridge:experimental /run.sh <your microsoft speech subscription key>
If you have an Azure subscription, you can quickly bring up a container running msspeech-gbridge
as a service:
az group create --name gbridge --location eastus
az container create --resource-group gbridge --name msspeech-gbridge --image technicianted/msspeech-gbridge:experimental --ip-address public --ports 8080 --command-line '/run.sh <your subscription key>'
Then you can use the assigned public IP address as an endpoint when using the clients.
To use, you will need to build from source and run:
./msspeech-gbridge <your microsoft speech API subscription key>
There are some implications when using msspeech-gbridge
for your speech applications.
Depending on where/how you run msspeech-gbridge
, some latency might be incurred espcially for shorter audio. With each request, msspeech-gbridge
attempts to reuse existing connection. If there isn't any, it connects to Microsoft Speech Service and caches the new connection. In the latter case, the time it takes to establish the connection is added to the leading latency.
However, in most cases, Microsoft Speech Service would catch up and make up for the leading latency. In such cases, no trailing latency is added.
- Running locally with client (same local network or same box):
No latency as client to msspeech-gbridge
connection establishment is very fast.
- Running in same region as Microsoft Speech Service:
When running in Azure as a container, in most cases msspeech-gbridge
would hit a Microsoft Speech Service instanced that is located within the same datacenter. In which case, minimal or no latency is added.
- Running remotely:
Latency might be incurred due to connection establishment time if audio is very short. Otherwise, Microsoft Speech Service would catch up.
The application runs as a service. Which means you can either run it as a container or on a dedicated VM, both requiring continuous cost per seconds.
Currently you have to build from source.
- Update Google Cloud APIs:
git submodule update
- Generete Google Cloud APIs:
cd googleapis
make LANGUAGE=c++ OUTPUT=output/
- Build
msspeech-gbridge
:
mkdir build
cd build
cmake ../
make