This section describes how to use the speech recognition API on macOS Catalina (version 10.15.4). I used Speech to Text from Azure Cognitive Services to recognize Japanese. I ran everything in zsh, so the steps may not work as-is in bash.
First, as a prerequisite, create an Azure account. Accounts are free to create. I recommend it because a new account comes with a $200 credit and lets you use various APIs for free for one year.
- Once you have created an account, click **Create Resource** on the portal site.
- Search for **Speech** in the search bar. The API option appears as **Voice** or **Speech**.
- Click the Speech option, then click **Create**.
- A creation form appears with the following items:
  - Name: the name of the resource (anything is fine)
  - Subscription: Free Trial (displayed by default)
  - Location: East Japan (if you want a region in Japan)
  - Price level: F0
  - Resource group: click **New** and choose a resource group name. Anything is fine.
- Once the resource has been created, it should appear in the dashboard, so click it.
- In the overview there is an item called **Key Management**, so click it. The resource name, endpoint, and **two subscription keys** are shown there. Keep in mind that you will use a subscription key later.
**Make sure your subscription key is never seen by others.** This completes the creation of the speech recognition resource.
Next, set up your local machine. First, install the Speech SDK.
.zsh
python3 -m pip install --upgrade pip
python3 -m pip install azure-cognitiveservices-speech
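If several Python installations coexist on the machine, the package may end up in a different interpreter than the one that later runs the script. The following is a minimal sketch, using only the standard library, for checking whether a module is importable from the current interpreter. The function name `sdk_installed` is a hypothetical helper introduced for this example.

```python
import importlib.util
import sys


def sdk_installed(module_name="azure.cognitiveservices.speech"):
    """Return True if module_name can be found by the current interpreter."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example "azure") is missing.
        return False


# Show which interpreter is being checked and whether the SDK is visible to it.
print(sys.executable)
print("Speech SDK found:", sdk_installed())
```

Running this with the same command you use for quickstart.py shows which interpreter is in use and whether it can see the SDK.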
Next, Microsoft publishes sample code for speech recognition on GitHub. Create a quickstart.py file locally and paste the sample into it. The repository contains **quickstart.py**, a Jupyter version (Quickstart.ipynb), and README.md, so copy the contents of **quickstart.py**. (The code is here.) The code looks like the following. After copying it, there is one place to change and one line to add.
quickstart.py
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
# <code>
import azure.cognitiveservices.speech as speechsdk
# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and service region (e.g., "westus").
'''
Change the following:
Subscription key: one of the two keys shown in the Key Management page of the resource you checked earlier
Location: 'japaneast' for East Japan, 'japanwest' for West Japan
'''
speech_key, service_region = "Subscription key", "place"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
'''
Added line below:
Setting for recognizing Japanese. Without it, only English is recognized by default.
'''
speech_config.speech_recognition_language="ja-JP"
# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
print("Say something...")
# Starts speech recognition, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of 15
# seconds of audio is processed. The task returns the recognition text as result.
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query.
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = speech_recognizer.recognize_once()
# Checks result.
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
# </code>
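The sample writes the subscription key directly into the source file, which makes it easy to leak when the file is shared. One possible alternative, shown here as a minimal sketch, is to read the key and region from environment variables. The variable names `SPEECH_KEY` and `SPEECH_REGION` and the helper `load_speech_settings` are assumptions made for this example and are not part of the Speech SDK.

```python
import os


def load_speech_settings(env=os.environ):
    """Return (subscription_key, region) read from environment variables.

    SPEECH_KEY and SPEECH_REGION are hypothetical names chosen for this sketch.
    The region falls back to 'japaneast' when SPEECH_REGION is not set.
    """
    key = env.get("SPEECH_KEY")
    region = env.get("SPEECH_REGION", "japaneast")
    if not key:
        raise RuntimeError("Set SPEECH_KEY to your subscription key first.")
    return key, region
```

After running `export SPEECH_KEY=...` in the terminal, the assignment in quickstart.py could then become `speech_key, service_region = load_speech_settings()`, so the key never appears in the file itself.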
Now that you're ready, run the following from your terminal:
.zsh
python3 quickstart.py
In my case, speech was not recognized when I ran the script from VS Code, so if that happens to you, run it in the terminal instead. If you know how to configure VS Code to make it work, please let me know. When you run the script,
Say something...
is displayed, so say something. The recognition result should then be printed. As written, the code recognizes only a single utterance, but it can be changed to recognize speech continuously.
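As the comments in the sample note, `start_continuous_recognition()` is the call to use for multi-utterance recognition. The following is a rough sketch of how that could be wired up, not a definitive implementation. It assumes `recognizer` is the `speechsdk.SpeechRecognizer` created in quickstart.py. The helper name `recognize_continuously` and its `on_text` and `timeout` parameters are hypothetical and were introduced for this example.

```python
import threading


def recognize_continuously(recognizer, on_text=print, timeout=None):
    """Recognize utterances until the session stops, is canceled, or times out.

    recognizer: assumed to be a speechsdk.SpeechRecognizer.
    on_text:    called with the text of each recognized utterance.
    timeout:    maximum number of seconds to listen (None waits indefinitely).
    """
    done = threading.Event()

    def handle_recognized(evt):
        # evt.result.text holds the final text of one utterance;
        # skip empty results such as silence.
        if evt.result.text:
            on_text(evt.result.text)

    def handle_stop(evt):
        # Signal the waiting thread that recognition has ended.
        done.set()

    recognizer.recognized.connect(handle_recognized)
    recognizer.session_stopped.connect(handle_stop)
    recognizer.canceled.connect(handle_stop)

    recognizer.start_continuous_recognition()
    try:
        done.wait(timeout)
    finally:
        recognizer.stop_continuous_recognition()
```

In quickstart.py, replacing the `recognize_once()` call and the result check with `recognize_continuously(speech_recognizer, timeout=30)` should print each utterance as it is recognized for about 30 seconds.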
That's it.