[PYTHON] Speech transcription procedure using Google Cloud Speech API

Purpose

I attend English lectures and conferences and wanted a way to store the recorded audio as text data. An article on transcribing speech with the Google Cloud Speech API (items/659bde4cdc8ce5c78e29) was helpful, so I am reorganizing the procedure below as a memo.

Advance preparation

Procedure

1. Enter the console screen

Go to the Google Cloud Platform site (https://cloud.google.com/?hl=ja) and press [Open Console] to enter the console screen.

Google Cloud Platform console login screen: スクリーンショット 2017-09-16 11.55.30.png

Console screen: スクリーンショット 2017-09-16 11.56.12.png

2. Enable the Google Cloud Speech API

Select [Tools and Services] > [APIs and Services] > [Library] at the top left of the console screen, choose Speech API from the list of APIs, and press [Enable] to enable the Google Cloud Speech API.


API list screen: スクリーンショット 2017-09-16 12.02.19.png

Enable API ([Disable] is displayed because it is already enabled): スクリーンショット 2017-09-16 12.02.58.png

You can confirm that the Google Cloud Speech API is enabled under [APIs and Services] > [Dashboard]: スクリーンショット 2017-09-16 12.19.01.png

3. Create API credentials (create service account key)

Select [APIs and Services] > [Credentials] > [Create Credentials] > [Service Account Key] on the left, enter a suitable [Service Account Name] (arkbbb is used here), and click the [Create] button to download the JSON file.


Service account key creation screen: スクリーンショット 2017-09-16 12.09.51.png

4. API authentication with Google Cloud Shell (service account key JSON upload & environment variable registration)

Start Google Cloud Shell with the Google Cloud Shell button at the top right of the Google Cloud Platform console screen, upload the JSON file obtained in step 3, and register it in an environment variable.

Google Cloud Shell Button: スクリーンショット 2017-09-16 12.21.39.png

JSON upload: スクリーンショット 2017-09-16 12.25.45.png

Command to set the environment variable:

$ export GOOGLE_APPLICATION_CREDENTIALS=[name of the JSON file obtained in step 3].json
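
Before running anything against the API, it can help to confirm that the variable really points at a service account key. The following is only a minimal sketch using the standard library; the function name check_credentials is hypothetical, and it relies on the "type" and "client_email" fields that service account key files downloaded from the console contain.

```python
import json
import os


def check_credentials(env=None):
    """Return the service account e-mail if the key file looks usable.

    check_credentials is a hypothetical helper, not part of any Google library.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError("key file not found: {}".format(path))
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    # Service account key files identify themselves with this "type" value.
    if key.get("type") != "service_account":
        raise RuntimeError("not a service account key: {}".format(path))
    return key.get("client_email")
```

Calling check_credentials() in Cloud Shell after the export command should return the e-mail address of the service account created in step 3, or raise an error that says what is missing.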

5. Upload voice data

Upload the prepared audio data to Google Cloud Storage. First, select [Tools and Services] > [Storage] > [Browser] at the top left of the screen and create a bucket with [Create Bucket]. Then double-click the created bucket and click [Upload File] to upload the audio data.

Go to Google Cloud Storage screen: スクリーンショット 2017-09-16 12.28.57.png

Creating a bucket (choose the bucket name and other settings as you like): スクリーンショット 2017-09-16 12.30.19.png

Uploading files into your bucket: スクリーンショット 2017-09-16 12.36.32.png
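
The transcription script in step 6 declares the audio as FLAC at 16000 Hz, so it may be worth checking the file before uploading it. The sketch below reads the STREAMINFO block at the start of a FLAC file using only the standard library. The function names are hypothetical, and it assumes a plain FLAC file whose first bytes are the "fLaC" marker (no tags prepended).

```python
def parse_flac_streaminfo(header):
    """Parse the first 42 bytes of a FLAC file into its basic properties."""
    if len(header) < 42 or header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    # Byte 4 holds the last-block flag (1 bit) and the block type (7 bits);
    # type 0 is STREAMINFO, which must come first.
    if header[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # Bytes 18..25 pack: 20 bits sample rate, 3 bits channels - 1,
    # 5 bits bits-per-sample - 1, 36 bits total samples.
    packed = int.from_bytes(header[18:26], "big")
    return {
        "sample_rate_hertz": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }


def read_flac_streaminfo(path):
    """Read and parse the STREAMINFO block of the FLAC file at path."""
    with open(path, "rb") as f:
        return parse_flac_streaminfo(f.read(42))
```

If read_flac_streaminfo reports a sample rate other than 16000, either convert the audio or change sample_rate_hertz in the script to match.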

6. Create the Python script that runs the transcription

Create a Python script for transcription execution on Google Cloud Shell.

Command to edit the Python file (use any editor you like):


$ nano transcribe.py

Python script for transcription (for English voice):

transcribe.py


#!/usr/bin/env python
# coding: utf-8
"""Transcribe an audio file stored in Google Cloud Storage."""
import argparse
import codecs
import datetime


def transcribe_gcs(gcs_uri):
    # The enums/types modules used here belong to google-cloud-speech
    # releases before 2.0.
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    # Reference the audio by its gs:// URI; the config must match the file.
    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        sample_rate_hertz=16000,
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        language_code='en-US')

    # Start asynchronous recognition and block until it finishes.
    operation = client.long_running_recognize(config=config, audio=audio)

    print('Waiting for operation to complete...')
    operation_result = operation.result()

    # Write to a timestamped file such as output20170916-120000.txt.
    d = datetime.datetime.today()
    today = d.strftime("%Y%m%d-%H%M%S")
    with codecs.open('output{}.txt'.format(today), 'a', 'shift_jis') as fout:
        # One line per alternative of each recognized segment.
        for result in operation_result.results:
            for alternative in result.alternatives:
                fout.write(u'{}\n'.format(alternative.transcript))


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument(
        'path', help='GCS path for audio file to be recognized')
    args = parser.parse_args()
    transcribe_gcs(args.path)

To transcribe Japanese, change the following line:

language_code='en-US')

to:

language_code='ja-JP')
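
Instead of editing the script each time, the language could be taken from the command line. This is a sketch only: the --language option and the build_parser function are hypothetical additions, not part of the original script, and transcribe_gcs would need an extra parameter that it passes on as language_code.

```python
import argparse


def build_parser():
    """Build the command-line parser with an optional language switch."""
    parser = argparse.ArgumentParser(
        description="Transcribe an audio file stored in Google Cloud Storage.")
    parser.add_argument(
        "path", help="GCS path for audio file to be recognized")
    # Hypothetical option: defaults to the value hard-coded in the script.
    parser.add_argument(
        "--language", default="en-US",
        help="language code passed to the API, e.g. en-US or ja-JP")
    return parser
```

With this parser, args.language holds 'en-US' unless the script is invoked with, for example, --language ja-JP.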

7. Performing voice transcription

Run the transcription with the following command on Google Cloud Shell.

$ python transcribe.py gs://[bucket name]/[audio file name].flac
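
A mistyped path only surfaces after the request has been sent, so a small check up front can save a round trip. The helper below is a hypothetical sketch (parse_gcs_uri is not part of any Google library) that splits a gs:// URI into bucket and object name and rejects obviously malformed input.

```python
def parse_gcs_uri(uri):
    """Split 'gs://bucket/object' into (bucket, object), or raise ValueError."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError("GCS URI must start with gs://: {!r}".format(uri))
    # Everything up to the first slash is the bucket; the rest is the object.
    bucket, _, obj = uri[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError("GCS URI needs a bucket and an object: {!r}".format(uri))
    return bucket, obj
```

It could be called on args.path before transcribe_gcs so that a missing bucket or file name is reported immediately.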

8. Execution result

After execution, run the ls command on Google Cloud Shell to check the generated files. A text file named output*.txt will have been created, so open it to check the result. The result for the first one to two minutes of audio is shown below. If you listen to the sound source alongside it, you will find some mistakes, but most of the speech is transcribed correctly.

and not.
 We have just attended this big Tatum Outlet
 and we held a pydata event it was actually the first I did it
 and some of these slides are actually problem, says talk to and so at strata we saw many people talking about the Duke talking about Big Data there were looking at using Java in a management
 and there was a whole lot of our versus Python language rewards on Facebook
 the Travis and I were not content with the state of things we saw that python to play a very significant role Travis made the slide that's from The Little Prince that shows a snake swallowing the open
 he was also talking about using compilers make python faster
 it was also not that pilot event that we were very fortunate to have weido been awesome stopping by and we talked to him about things like the matrix multiplication operator we talked about coding expressions and things like that
 and so this actually his picture show does Travis and West McKinney who's the greater pandas and Guido van Rossum
 add
 and we ask we don't fix the packaging problem he told us that we should do it ourselves
 and so we did and that's how it came up with Honda and Anaconda which I think quite elegantly solves the difficult packaging problems for the Scientific Games
 so we accepted the challenge and so for those who don't know what Anaconda is very quickly I'll give you it is basically a very simple way and very reliable way to get final versions of many very popular typical to build packages in libraries in the python ecosystem

By the way, the actual result data is here
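
When the script has been run several times, the newest result can be picked out programmatically. The sketch below assumes the file naming (output plus a %Y%m%d-%H%M%S timestamp) and the shift_jis encoding used by transcribe.py above; read_latest_output is a hypothetical helper name.

```python
import codecs
import glob
import os


def read_latest_output(directory="."):
    """Return (path, lines) for the most recent output*.txt in directory."""
    # The timestamp in the file name sorts lexicographically in time order.
    paths = sorted(glob.glob(os.path.join(directory, "output*.txt")))
    if not paths:
        raise FileNotFoundError("no output*.txt files in {}".format(directory))
    latest = paths[-1]
    # Decode with the same encoding the transcription script wrote.
    with codecs.open(latest, "r", "shift_jis") as f:
        return latest, f.read().splitlines()
```

Each returned line corresponds to one transcript alternative written by the script.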

Important points

References

Impressions
