[PYTHON] Streaming speech recognition with Google Cloud Speech API

Note: Information as of September 14, 2016.

In this post I try streaming speech recognition from microphone input with the Google Cloud Speech API.

Previously I recognized recorded files with the REST version of the API, so this time I will try streaming recognition with the gRPC version.

Procedure

Follow the procedure in the README of the official Google sample.

This time I will try the streaming recognition script, transcribe_streaming.py.

The procedure is the same as for the REST version up to the point where you obtain the Service Account JSON file.

1. Sign up for Google Cloud Platform.
2. Create a project in the Developer Console, enable the Speech API, and download the Service Account JSON file for authentication.
3. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the downloaded JSON file.
4. Run the sample script.
   - Install PortAudio.
   - Install the required pip modules (virtualenv recommended).
   - Set transcribe_streaming.py to recognize Japanese.
     - Change the language_code of recognition_config from en-US to ja-JP.
     - Adjust the sampling rate etc. to suit your environment.
       - Device-related settings are in record_audio, which calls pyaudio.
   - Run the sample with $ python transcribe_streaming.py and speak into the microphone.
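
Before running the sample, it can help to confirm that GOOGLE_APPLICATION_CREDENTIALS really points at a Service Account key. The following is a minimal sketch of such a check, not part of the official sample; the function name check_credentials is hypothetical.

```python
import json
import os

ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


def check_credentials(environ=os.environ):
    """Return the key file path from the environment, or raise RuntimeError.

    Hypothetical helper for illustration; the official sample does not ship it.
    """
    path = environ.get(ENV_VAR)
    if not path:
        raise RuntimeError("{} is not set".format(ENV_VAR))
    if not os.path.isfile(path):
        raise RuntimeError("{} points to a missing file: {}".format(ENV_VAR, path))
    with open(path, encoding="utf-8") as key_file:
        info = json.load(key_file)
    # Service Account key files carry "type": "service_account".
    if info.get("type") != "service_account":
        raise RuntimeError("{} is not a Service Account key".format(path))
    return path
```

Calling check_credentials() at the top of a script turns a confusing authentication failure into a readable message.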

Once started, recognition continues for as long as service.StreamingRecognize keeps returning values to listen_print_loop. (It ends with a timeout once DEADLINE_SECS seconds have elapsed.)
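
The observable behavior (consume responses until a deadline passes) can be mimicked in plain Python. This is only a sketch of the idea: it is not how gRPC enforces the deadline internally, and take_until_deadline and its clock parameter are hypothetical names.

```python
import time


def take_until_deadline(responses, deadline_secs, clock=time.monotonic):
    """Yield items from `responses` until `deadline_secs` seconds have elapsed.

    Illustration only: the real sample hands the deadline to the gRPC call.
    """
    started = clock()
    for response in responses:
        # The deadline is only checked when a response arrives.
        if clock() - started >= deadline_secs:
            break
        yield response
```

The clock is injectable so that the cut-off can be tried out with a fake clock instead of waiting in real time.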

This sample finishes processing when an utterance contains the word exit or quit (see the latter half of listen_print_loop). If you change these to other words, such as words meaning "stop" or "end", you can do the same thing in Japanese.
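
As a rough sketch of that change, the check below looks for stop words in a transcript. The function name, the plain substring match, and the Japanese words chosen are all my assumptions, not taken from the sample.

```python
# Hypothetical Japanese stop words ("stop" and "end"); pick whatever suits you.
JAPANESE_STOP_WORDS = ("ストップ", "終了")


def contains_stop_word(transcript, stop_words=("exit", "quit")):
    """Return True if any stop word occurs anywhere in the transcript.

    Uses a case-insensitive substring match, because Japanese text has no
    spaces to split words on.
    """
    text = transcript.lower()
    return any(word.lower() in text for word in stop_words)
```

Note that a substring match also fires on English words such as "quite", so for English a word-boundary match would be stricter.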

Recognition behavior

- Until there is silence for a certain period of time, speech is recognized as one continuous utterance, even if there are short pauses in it.
- Once an utterance has been recognized, is_final = True and confidence are returned together with the resulting text.
- If you specify interim_results = True in streaming_config, you can also get recognition results during the utterance.

Interim recognition seems to be done at the word level, and it is surprisingly fast for something that goes over the network. However, interim results may be wrong, so unless you are in a hurry, it is better to wait for the final result.
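
Waiting for the final result amounts to skipping everything that does not have is_final set. Here is a minimal sketch of that filtering. FakeResult is a hypothetical, flattened stand-in for a streaming result, not the real API class, which nests the text and confidence more deeply.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in for one streaming result (not the real API type)."""
    transcript: str
    is_final: bool = False
    confidence: float = 0.0


def final_transcripts(results):
    """Yield (transcript, confidence) for finalized results, skipping interim ones."""
    for result in results:
        if result.is_final:
            yield result.transcript, result.confidence
```

For example, a stream of two interim guesses followed by one final result yields only the final text and its confidence.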

See the gRPC API Manual (https://cloud.google.com/speech/reference/rpc/google.cloud.speech.v1beta1#google.cloud.speech.v1beta1.Speech.StreamingRecognize) for other options.

The GitHub code is updated quite often, so it is worth checking it regularly.

Bug

I tried it on both a Mac and Linux, with a laptop's built-in microphone and with an external USB microphone. In every case, recognition stopped after about 3-10 utterances or 15-30 seconds, without any error. This needs investigation.

Impressions

Since it is v1beta1, it seems to be still in the testing stage. It also seems difficult to use correctly unless you are used to gRPC (and to handling it from Python).