[PYTHON] Convert speech to text with the Azure Speech SDK

Introduction

Let's convert speech to text using the Azure Speech SDK.

Development environment

Recognize speech from the microphone

  1. Sign in to the Azure portal and create a Speech service resource.

  2. Open the resource you created and copy its key and location (region).

  3. Create a Python 3.6 environment.

conda create -n py36 python=3.6
conda activate py36

  4. Install the library.

pip install azure-cognitiveservices-speech

  5. Create the program.

This program takes voice input once and prints the recognition result. Paste the key you copied earlier into "YourSubscriptionKey" and the location you copied into "YourServiceRegion". I want to recognize Japanese, so the language is set to "ja-JP".

import azure.cognitiveservices.speech as speechsdk

# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and service region (e.g., "westus").
speech_key, service_region, language = "YourSubscriptionKey", "YourServiceRegion", "ja-JP"
# Pass the language explicitly; otherwise the SDK falls back to its default language (en-US).
speech_config = speechsdk.SpeechConfig(
    subscription=speech_key, region=service_region, speech_recognition_language=language)

# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Say something...")


# Starts speech recognition, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of 15
# seconds of audio is processed.  The task returns the recognition text as result.
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query.
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = speech_recognizer.recognize_once()

# Checks result.
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
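
Hard-coding the key in the script makes it easy to leak by accident. As a minimal sketch, the three settings could instead be read from environment variables. The variable names SPEECH_KEY, SPEECH_REGION, and SPEECH_LANGUAGE and the helper load_speech_settings are assumptions made for this illustration; the SDK does not require them.

```python
import os


def load_speech_settings(env=None):
    """Return (key, region, language) read from environment variables.

    The variable names are hypothetical; any names would work.
    """
    env = os.environ if env is None else env
    key = env.get("SPEECH_KEY", "")
    region = env.get("SPEECH_REGION", "")
    # Fall back to the language used in this article when none is set.
    language = env.get("SPEECH_LANGUAGE", "ja-JP")
    if not key or not region:
        raise ValueError("Set SPEECH_KEY and SPEECH_REGION before running.")
    return key, region, language
```

The returned tuple can be unpacked into speech_key, service_region, and language and passed to speechsdk.SpeechConfig exactly as in the program above.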

This program listens continuously and prints each recognition result as it arrives. As before, set the key, location, and language.

import azure.cognitiveservices.speech as speechsdk
import time

# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and service region (e.g., "westus").
speech_key, service_region, language = "YourSubscriptionKey", "YourServiceRegion", "ja-JP"
speech_config = speechsdk.SpeechConfig(
    subscription=speech_key, region=service_region, speech_recognition_language=language)

# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Say something...")

def recognized(evt):
    print('「{}」'.format(evt.result.text))
    # do something

def start(evt):
    print('SESSION STARTED: {}'.format(evt))

def stop(evt):
    print('SESSION STOPPED {}'.format(evt))

speech_recognizer.recognized.connect(recognized)
speech_recognizer.session_started.connect(start)
speech_recognizer.session_stopped.connect(stop)

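try:
    speech_recognizer.start_continuous_recognition()
    time.sleep(60)  # listen for up to 60 seconds
except KeyboardInterrupt:
    print("bye.")
finally:
    # Stop recognition and detach the handlers, whether we timed out or were interrupted.
    speech_recognizer.stop_continuous_recognition()
    speech_recognizer.recognized.disconnect_all()
    speech_recognizer.session_started.disconnect_all()
    speech_recognizer.session_stopped.disconnect_all()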

  6. Save the program as stt.py, then run the following command and speak into the microphone.
python stt.py

The recognition result is displayed in the console (screenshot: image.png).
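
The recognized() handler in the continuous program only prints each result and leaves a "do something" placeholder. One possible way to fill it in is to accumulate the utterances so they can be saved or processed later. This is a sketch only: TranscriptCollector is a hypothetical helper, and it relies solely on evt.result.text, the same attribute the handler above already reads.

```python
class TranscriptCollector:
    """Accumulates the text of each recognized utterance."""

    def __init__(self):
        self.lines = []

    def on_recognized(self, evt):
        # evt.result.text is the attribute printed by recognized() above.
        text = evt.result.text
        if text:  # skip empty results, for example silence
            self.lines.append(text)

    def transcript(self):
        """Return all collected utterances, one per line."""
        return "\n".join(self.lines)
```

To use it, create a collector and connect its method in place of the plain function, for example speech_recognizer.recognized.connect(collector.on_recognized), then call collector.transcript() after recognition has stopped.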

Recognize speech from an audio file (.wav)

  1. The setup is the same as above.

  2. Create the program.

This program reads a .wav file and prints the speech recognition result. Set the key and location.

import azure.cognitiveservices.speech as speechsdk

# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and region identifier from here: https://aka.ms/speech/sdkregion
speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Creates an audio configuration that points to an audio file.
# Replace with your own audio filename.
audio_filename = "aboutSpeechSdk.wav"
audio_input = speechsdk.audio.AudioConfig(filename=audio_filename)

# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)

print("Recognizing first result...")

# Starts speech recognition, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of 15
# seconds of audio is processed.  The task returns the recognition text as result.
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query.
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = speech_recognizer.recognize_once()

# Checks result.
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))

For the audio file, use sampledata\audiofiles\aboutSpeechSdk.wav from the cognitive-services-speech-sdk repository.

  3. Save the program as stt_from_file.py, then run the following command and check the result.

python stt_from_file.py

If the key or location is incorrect, you get the following error.

(py36) C:\Users\good_\Documents\PythonProjects\AzureSpeech>python stt_from_file.py
Recognizing first result...
Speech Recognition canceled: CancellationReason.Error
Error details: Connection failed (no connection to the remote host). Internal error: 1. Error details: 11001. Please check network connection, firewall setting, and the region name used to create speech factory. SessionId: 77ad7686a9d94b7882398ae8b855d903
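
The code 11001 in this output is the Windows Sockets "host not found" error (WSAHOST_NOT_FOUND), which points to a region name that does not map to a reachable host. As a rough pre-check, the region could be tested with a DNS lookup before the recognizer is created. The host name pattern below is an assumption for illustration, and speech_host and can_resolve are hypothetical helpers, not part of the SDK.

```python
import socket


def speech_host(region):
    """Build the host name assumed for the regional speech-to-text endpoint."""
    return "{}.stt.speech.microsoft.com".format(region.strip().lower())


def can_resolve(host):
    """Return True if DNS can resolve the host name, False otherwise."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except (OSError, UnicodeError):  # socket.gaierror is a subclass of OSError
        return False
```

Calling can_resolve(speech_host(service_region)) before building the SpeechConfig gives an early hint that the region is mistyped. It says nothing about whether the key is valid.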

With the correct key and location, the result is as follows (screenshot: image.png).

The file is 52 seconds long, but recognition appears to stop once the first utterance has been recognized.
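
This matches the comment in the program: recognize_once() returns after a single utterance, so a longer file is only partly transcribed. To check a file's length and format before choosing a recognition mode, the standard library's wave module can read the header. A minimal sketch: wav_info is a hypothetical helper, and wave only handles uncompressed PCM files.

```python
import wave


def wav_info(path):
    """Return (duration_seconds, channels, sample_rate_hz, bits_per_sample)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes
    return frames / rate, channels, rate, bits
```

For example, wav_info("aboutSpeechSdk.wav") returns the duration as its first element, which can be compared with the 15-second limit mentioned in the code comment.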

  4. To read the whole file and recognize the speech continuously, do the following.

import azure.cognitiveservices.speech as speechsdk
import time

# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and region identifier from here: https://aka.ms/speech/sdkregion
speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Creates an audio configuration that points to an audio file.
# Replace with your own audio filename.
audio_filename = "aboutSpeechSdk.wav"
audio_input = speechsdk.audio.AudioConfig(filename=audio_filename)

# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)

print("Recognizing...")

def recognized(evt):
    print('「{}」'.format(evt.result.text))
    # do something

def start(evt):
    print('SESSION STARTED: {}'.format(evt))

def stop(evt):
    print('SESSION STOPPED {}'.format(evt))

speech_recognizer.recognized.connect(recognized)
speech_recognizer.session_started.connect(start)
speech_recognizer.session_stopped.connect(stop)

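try:
    speech_recognizer.start_continuous_recognition()
    time.sleep(60)  # long enough for the 52-second sample file
except KeyboardInterrupt:
    print("bye.")
finally:
    # Stop recognition and detach the handlers, whether we timed out or were interrupted.
    speech_recognizer.stop_continuous_recognition()
    speech_recognizer.recognized.disconnect_all()
    speech_recognizer.session_started.disconnect_all()
    speech_recognizer.session_stopped.disconnect_all()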

  5. Run it again.

Continuous speech recognition now appears to work (screenshot: image.png).
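
The fixed time.sleep(60) works for this 52-second file but would cut off a longer one and waits needlessly on a shorter one. An alternative is to wait until the recognizer reports that the session has stopped or been canceled. The sketch below assumes only the recognizer members session_stopped, canceled, start_continuous_recognition(), and stop_continuous_recognition(); the function name recognize_until_stopped is hypothetical.

```python
import threading


def recognize_until_stopped(recognizer, timeout=None):
    """Run continuous recognition until the session stops or is canceled.

    Returns True if a stop or cancel event arrived, False if the timeout expired.
    """
    done = threading.Event()

    def on_done(evt):
        # Called from the SDK's callback thread; just signal the waiting thread.
        done.set()

    recognizer.session_stopped.connect(on_done)
    recognizer.canceled.connect(on_done)

    recognizer.start_continuous_recognition()
    try:
        finished = done.wait(timeout)
    finally:
        recognizer.stop_continuous_recognition()
    return finished
```

With the recognized handler connected as in the program above, recognize_until_stopped(speech_recognizer, timeout=300) would replace the try block. The timeout is a safety net in case neither event arrives.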

That's all. Thank you for following along.
