[PYTHON] Transcribe WAV files with Cloud Speech API

Purpose

**How to transcribe the audio in a WAV file with the Google Cloud Speech-to-Text API.** I used an [article on how to transcribe FLAC files](https://qiita.com/knyrc/items/7aab521edfc9bfb06625) as a reference and adapted it to WAV files. With this method, you can transcribe **without converting to FLAC format**.


[Important] Preparation of WAV file

**The Cloud Speech-to-Text API takes the information it needs for transcription from the header of the WAV file**, so check in advance that the header of the file you want to transcribe is well-formed. The two header fields to check are **the format type, which must be PCM (fmt_wave_format_type), and the sampling frequency (fmt_samples_per_sec)**.

If you want to check the Cloud Speech-to-Text API specification, see [RecognitionConfig](https://cloud.google.com/speech-to-text/docs/reference/rpc/google.cloud.speech.v1#google.cloud.speech.v1.RecognitionConfig), or jump to the definition source in VS Code.

Check WAV file header

Check the header information by running the program from the article on reading the header information of a WAVE file with Python.

Normal WAV file

- fmt_samples_per_sec: 8000-48000 (16000 is best)
- fmt_wave_format_type: 1 (indicates PCM)
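If you do not have the referenced program at hand, the same check could be sketched with only the standard library. This is a minimal sketch, not the program from that article: the function names `read_wav_fmt` and `looks_transcribable` are made up for illustration, and the dictionary keys simply reuse the field names above. It assumes a standard RIFF/WAVE layout and reads only the first fields of the `fmt ` chunk.

```python
import struct


def read_wav_fmt(path):
    """Return the format type and sampling frequency from the 'fmt ' chunk."""
    with open(path, "rb") as f:
        riff_header = f.read(12)
        if len(riff_header) < 12:
            raise ValueError("file is too short to be a WAV file")
        riff, _size, wave_id = struct.unpack("<4sI4s", riff_header)
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                fmt = f.read(8)
                if len(fmt) < 8:
                    raise ValueError("'fmt ' chunk is truncated")
                # First 8 bytes: format type, channel count, sampling frequency.
                format_type, _channels, sample_rate = struct.unpack("<HHI", fmt)
                return {
                    "fmt_wave_format_type": format_type,
                    "fmt_samples_per_sec": sample_rate,
                }
            # Chunks are padded to an even size; skip the body and the pad byte.
            f.seek(chunk_size + (chunk_size & 1), 1)


def looks_transcribable(info):
    """Apply the two checks listed above: PCM and 8000-48000 Hz."""
    return (
        info["fmt_wave_format_type"] == 1
        and 8000 <= info["fmt_samples_per_sec"] <= 48000
    )
```

Calling `looks_transcribable(read_wav_fmt("sample.wav"))` on your own file (the file name here is a placeholder) returns `False` when either value falls outside the ranges above, which is a hint that the file should be re-exported before uploading.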

Bad WAV file example

If the WAV file is in a bad format

Following the referenced guide, I **exported the WAV file again using the Mac's default "Music" app**, and it worked!

**[Caution] WAV files exported from iMovie and WAV files edited with QuickTime Player did not work because their headers are not well-formed!**

Creating a service account key

For the basic steps, refer to the article on how to transcribe FLAC files and create a **JSON key**.

**[Caution] This time the WAV file to be transcribed is uploaded to Google Cloud Storage, so the service account must be granted access to Cloud Storage.**

Add the Storage Object Viewer role to the service account.

If you use a service account that does not have access to Cloud Storage, you will get an error like this:

PermissionDenied: 403 hogehoge does not have storage.objects.get access to the Google Cloud Storage object.

Set the service account key path in an environment variable

Set the path of the JSON file you downloaded earlier in an environment variable.

export GOOGLE_APPLICATION_CREDENTIALS=./hoge.json
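Because a relative path such as `./hoge.json` is resolved against the directory you run the script from, it can help to confirm that the variable points to an existing file before calling the API. The following is a minimal sketch; the helper name `credentials_path_problem` is hypothetical and not part of any Google library.

```python
import os


def credentials_path_problem(env=None):
    """Return a message describing the problem, or None if the key file exists."""
    if env is None:
        env = os.environ
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return "key file not found: {}".format(path)
    return None
```

Calling `credentials_path_problem()` at the start of a script and printing any returned message gives a clearer hint than the authentication error that would otherwise appear later.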

Upload WAV files to Cloud Storage

Refer to the article on how to transcribe FLAC files and upload the WAV file to Cloud Storage. On the object details screen, you can see the path to the resource in Cloud Storage, which starts with gs://.
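The upload can also be done from Python instead of the console. The sketch below assumes the `google-cloud-storage` package is installed and that the credentials in use are allowed to write to the bucket; the Storage Object Viewer role alone is read-only and is not enough for uploading. The function names and arguments are placeholders for illustration.

```python
def to_gcs_uri(bucket_name, blob_name):
    """Build the gs:// path that the transcription script expects."""
    return "gs://{}/{}".format(bucket_name, blob_name)


def upload_wav(local_path, bucket_name, blob_name):
    """Upload a local WAV file and return its gs:// path."""
    # Imported here so that to_gcs_uri works without the library installed.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path)
    return to_gcs_uri(bucket_name, blob_name)
```

The value returned by `upload_wav` has the same form as the path shown on the object details screen.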

Transcription script

I wrote this script based on the article on how to transcribe FLAC files.

transcribe.py


#!/usr/bin/env python
# coding: utf-8
import argparse
import datetime

from google.cloud import speech_v1 as speech
from google.cloud.speech_v1 import types


def transcribe(gcs_uri):
    client = speech.SpeechClient()
    audio = types.RecognitionAudio(uri=gcs_uri)
    # The sampling frequency is read from the WAV header, so it is not specified here.
    config = types.RecognitionConfig(language_code='ja-JP')
    # Keyword arguments are used so the call does not depend on parameter order.
    operation = client.long_running_recognize(config=config, audio=audio)

    # Print before blocking: result() waits until the transcription has finished.
    print('Waiting for operation to complete...')
    operation_result = operation.result()
    now = datetime.datetime.now()

    # Echo each result to stdout and write the transcript to a timestamped text file.
    with open('./{}.txt'.format(now.strftime("%Y%m%d-%H%M%S")), mode='w') as f:
        for result in operation_result.results:
            best = result.alternatives[0]
            print("Transcript: {}".format(best.transcript))
            print("Confidence: {}".format(best.confidence))
            f.write('{}\n'.format(best.transcript))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        'path', help='Cloud Storage path starting with gs://')
    args = parser.parse_args()
    transcribe(args.path)

Execution of transcription script

Specify the path to the resource in Cloud Storage, starting with gs://, as an argument and run the script.

python transcribe.py gs://hogehoge.wav
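If you want the script to reject a mistyped argument before it contacts the API, the argument parser could validate the prefix itself. This is an optional sketch and not part of the script above; the names `gcs_uri` and `build_parser` are hypothetical.

```python
import argparse


def gcs_uri(value):
    """argparse type: accept only values that start with gs://."""
    prefix = "gs://"
    if not value.startswith(prefix) or len(value) == len(prefix):
        raise argparse.ArgumentTypeError(
            "path must start with gs:// (got {!r})".format(value))
    return value


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "path", type=gcs_uri, help="Cloud Storage path starting with gs://")
    return parser
```

With this in place, passing a local file name by mistake makes argparse print a usage error and exit instead of sending a request.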

Result

The result is printed to standard output and written to a text file.

Transcript: If you can register
Confidence: 0.8765763640403748
Transcript: I think it's better to be there
Confidence: 0.8419854640960693

20201022-010101.txt


If you can register
I think it's better to be there

Reference

- Article on how to transcribe FLAC files
- Transcribing long audio files
