Speech file recognition by Google Speech API v2 using Python

Use Google's speech recognition service to transcribe audio files.

Environment

- Python 3.5 (64-bit) via Anaconda
- Windows 10
- The audio file must be WAV. Other formats just need to be converted separately with sox.
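The sox conversion mentioned above could be scripted roughly as follows. This is only a sketch: the file names are hypothetical, and the 16 kHz / mono / 16-bit settings are my assumptions rather than values from this post.

```python
import shutil
import subprocess


def build_sox_command(src, dst, rate=16000, channels=1, bits=16):
    """Build a sox command line that converts src into a WAV file at dst.

    The rate/channels/bits defaults are assumptions, not values from the post.
    sox applies format options to the file name that follows them, so they
    are placed just before the output file.
    """
    return ["sox", src, "-r", str(rate), "-c", str(channels), "-b", str(bits), dst]


def convert_to_wav(src, dst):
    """Run sox; raises if sox is not on PATH or the conversion fails."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox was not found on PATH")
    subprocess.run(build_sox_command(src, dst), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_sox_command("input.aiff", "filename.wav")))
```

Calling convert_to_wav("input.aiff", "filename.wav") would then produce the WAV file that the rest of this post works with.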

Packages to install

- SpeechRecognition (https://github.com/Uberi/speech_recognition): a package that makes it easy to use various speech recognition cloud services. It is feature-rich.
- pyaudio: apparently required for SpeechRecognition to work.
- google-api-python-client: used when adapting the SpeechRecognition sample source, so install it.
- pydub: used to split audio files at silent sections. Install it with pip install pydub.
- FFMPEG: I am not sure why it has to be installed; I am simply following http://chachay.hatenablog.com/entry/2016/10/03/215841.
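To confirm that everything in the list above is in place before running the recognition script, a small check like the following may help. It is a sketch that only tests whether each module can be found and whether an ffmpeg executable is on PATH; it does not verify versions.

```python
import importlib.util
import shutil

# Package name as listed above -> module name used in import statements.
REQUIRED_MODULES = {
    "SpeechRecognition": "speech_recognition",
    "pyaudio": "pyaudio",
    "google-api-python-client": "googleapiclient",
    "pydub": "pydub",
}


def check_environment():
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    status = {
        package: importlib.util.find_spec(module) is not None
        for package, module in REQUIRED_MODULES.items()
    }
    # FFMPEG is an external program, so look for it on PATH instead.
    status["ffmpeg"] = shutil.which("ffmpeg") is not None
    return status


if __name__ == "__main__":
    for name, found in sorted(check_environment().items()):
        print("{0}: {1}".format(name, "OK" if found else "MISSING"))
```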

To use Google's Speech API v2

The procedure is roughly as described in http://qiita.com/lethe2211/items/7c9b1b82c7eda40dafa9. Annoyingly, the API does not show up unless you join the mailing list.

Important point

If the audio file is too long, the request fails with an error; I do not know the cause (as of January 11, 2017). In my environment, a result comes back for an audio file of about 10 seconds, but at 20 seconds an error occurs.

When using the SpeechRecognition sample, try limiting the length with duration, as in `audio = r.record(source, duration=10)`, and check the result. With a long duration you should see the error.
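To see in advance whether a file is likely to hit this limit, its length can be measured with the standard wave module. The sketch below also wraps the duration trick in a hypothetical helper, record_first_seconds; that helper needs SpeechRecognition installed and is imported lazily so the duration check works on its own.

```python
import wave


def wav_duration_seconds(wav_path):
    """Return the length of a WAV file in seconds (frames / frame rate)."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def record_first_seconds(wav_path, seconds=10):
    """Read only the first `seconds` of the file into an AudioData object.

    Hypothetical helper; requires the SpeechRecognition package.
    """
    import speech_recognition as sr  # imported here so the function above has no dependency

    r = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        return r.record(source, duration=seconds)


if __name__ == "__main__":
    # './filename.wav' is a placeholder path.
    print("{0:.1f} seconds".format(wav_duration_seconds("./filename.wav")))
```

If the reported length is well above 10 seconds, the file probably needs to be split before it is sent.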

Splitting the file at silent sections

Basically, this follows http://chachay.hatenablog.com/entry/2016/10/03/215841.

Speech recognition with Google Speech API v2 fails with an error when the file is large (I do not know the cause), so this is an attempt to split the file at silent sections and recognize each piece.
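Splitting at silence does not guarantee short pieces: a stretch of continuous speech can still exceed the length that worked for me. As a fallback, an overlong piece could be cut at fixed intervals. The helper below is a hypothetical sketch, and the 10-second default is only my observation from above, not a documented limit. It relies on the fact that a pydub AudioSegment reports its length in milliseconds through len() and can be sliced by milliseconds.

```python
def split_long(chunk, max_ms=10000):
    """Cut chunk into consecutive pieces of at most max_ms milliseconds.

    Works on any object that supports len() and slicing, which includes
    pydub's AudioSegment (length and slice indices are in milliseconds).
    """
    if max_ms <= 0:
        raise ValueError("max_ms must be positive")
    return [chunk[start:start + max_ms] for start in range(0, len(chunk), max_ms)]


if __name__ == "__main__":
    # A plain list stands in for an AudioSegment here, one element per millisecond.
    fake_chunk = list(range(25))
    print([len(piece) for piece in split_long(fake_chunk, max_ms=10)])
```

A fixed cut may land in the middle of a word, so this is best used only on the pieces that are still too long after the silence-based split.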

Source for speech recognition

Because I use several libraries and pass data between them through a WAV file, I suspect there is a lot of waste, but here is the source.

Some of the imports are probably unnecessary, so omit them as appropriate.

import speech_recognition as sr
from os import path
from googleapiclient import discovery
import httplib2
import base64, json
import urllib
import os
from pydub import AudioSegment
from pydub.silence import split_on_silence


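if __name__ == '__main__':
    r = sr.Recognizer()
    audio_data = []
    # Load the WAV file and split it wherever at least 1500 ms stay below -30 dBFS,
    # keeping 500 ms of silence at each edge of a chunk.
    sound = AudioSegment.from_file('./filename.wav', format='wav')
    chunks = split_on_silence(sound, min_silence_len=1500, silence_thresh=-30, keep_silence=500)

    # Use one absolute path for both writing and reading the temporary file,
    # so the script does not depend on the current working directory.
    AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "temp.wav")

    for chunk in chunks:
        # Pass each chunk from pydub to SpeechRecognition through a WAV file.
        chunk.export(AUDIO_FILE, format='wav')

        with sr.AudioFile(AUDIO_FILE) as source:
            audio = r.record(source)
            audio_data.append(audio)

    # Send each chunk to Google Speech Recognition separately.
    for audio in audio_data:
        try:
            print("Google Speech Recognition thinks you said " + r.recognize_google(audio, key='your API key', language='ja'))
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service; {0}".format(e))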
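The round trip through temp.wav is the waste I mentioned earlier. One way it might be avoided is to build the SpeechRecognition AudioData object directly from the raw samples of each pydub chunk. The two helpers below are hypothetical and I have not run them against the library versions used in this post; the conversion to one channel is there because AudioData expects mono PCM.

```python
def pcm_params(chunk):
    """Return (raw_bytes, sample_rate, sample_width) for a mono copy of a pydub chunk."""
    mono = chunk.set_channels(1)  # AudioData expects single-channel PCM
    return mono.raw_data, mono.frame_rate, mono.sample_width


def chunk_to_audio_data(chunk):
    """Convert a pydub AudioSegment to a speech_recognition AudioData in memory."""
    import speech_recognition as sr  # requires the SpeechRecognition package

    raw, rate, width = pcm_params(chunk)
    return sr.AudioData(raw, rate, width)
```

With these helpers, the export and record steps inside the first loop could be replaced by audio_data.append(chunk_to_audio_data(chunk)).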

Reference sites

http://chachay.hatenablog.com/entry/2016/10/03/215841
https://pypi.python.org/pypi/SpeechRecognition/
