In this post, I will walk through an actual test of speaker identification using the "Speaker Recognition API". (Please let me know if anything looks off!)
Identifying a speaker requires the following three steps, so I created a separate script for each step to keep things easy to follow.
1. Create a profile (Create Profile)
2. Enroll the user's voice (Create Enrollment)
3. Identify the speaker (Identification)
First, create a profile for the user you want to identify. The API operation used is "Create Profile" under "Identification Profile"; it creates a profile for the user and returns the user's profile ID. (No name is registered with the profile, so the name-to-ID mapping has to be managed separately.)
In my test script, the user name is passed as an argument, and the user name and profile ID are written as a pair to the file "Profile_List.csv".
CreateProfile.py
########### module #############
import sys       # library for command-line arguments
import requests  # library for HTTP communication
import json      # library for handling JSON data
import base64
import csv
########### Args & variable #########################
args = sys.argv
Profile_Name = args[1]
Profile_List = 'Profile_List.csv'
########### Create Profile #########################
with open(Profile_List) as fp:
    lst = list(csv.reader(fp))
for i in lst:
    if Profile_Name in i:
        print('The specified user is already registered.')
        sys.exit()
ApiPath = 'https://speaker-recognitionapi.cognitiveservices.azure.com/spid/v1.0/identificationProfiles'
headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': '<subscription key>',  # your subscription key
}
body = {
    'locale': 'en-us',
}
r = requests.post(
    ApiPath,            # URL
    headers = headers,  # headers
    json = body         # body
)
try:
    ProfileId = r.json()['identificationProfileId']
except Exception:
    print('Error:{}'.format(r.status_code))
    print(r.json()['error'])
    sys.exit()
print(ProfileId)
f = open(Profile_List, 'a')
writer = csv.writer(f, lineterminator='\n')
writer.writerow([Profile_Name, ProfileId])
f.close()
####################################
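Since the API itself does not store user names, the later scripts repeatedly look the profile ID up in Profile_List.csv. As a minimal sketch (my own addition, not part of the original scripts), that lookup could be factored into a small helper:
import csv

def load_profiles(path='Profile_List.csv'):
    """Return a {user name: profile ID} dict from the CSV written above."""
    with open(path, newline='') as fp:
        return {name: profile_id for name, profile_id in csv.reader(fp)}

# Example (hypothetical user name):
# profiles = load_profiles()
# profile_id = profiles['Alice']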
Next, we register a voice sample for the user created above. (Unlike speaker verification, no specific phrase is required, so any speech is fine.)
The API operations used here are "Create Enrollment" and "Get Operation Status".
I personally got tripped up here as well: there are some fairly strict restrictions on the audio files that can be used (a small check script follows the table below).
Property | Required value |
---|---|
Container | WAV |
Encoding | PCM |
Sample rate | 16 kHz |
Sample format | 16-bit |
Channels | Mono |
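Before uploading anything, it may be worth checking a recording against these requirements. The following is a minimal sketch using only the standard-library wave module (my own addition, not part of the original scripts); note that wave only reads PCM WAV containers, so a successful open already covers the container and encoding rows:
import sys
import wave

def check_wav(path):
    """Check a WAV file against the table above (PCM WAV, 16 kHz, 16-bit, mono)."""
    with wave.open(path, 'rb') as w:       # wave only opens PCM WAV containers
        ok = (w.getframerate() == 16000    # 16 kHz sample rate
              and w.getsampwidth() == 2    # 16-bit samples
              and w.getnchannels() == 1)   # mono
        print(f'{path}: rate={w.getframerate()} Hz, '
              f'bits={w.getsampwidth() * 8}, channels={w.getnchannels()} '
              f'-> {"OK" if ok else "NG"}')
        return ok

if __name__ == '__main__':
    check_wav(sys.argv[1])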
At first I couldn't produce audio that met these conditions, but I managed to record it with the free software "Audacity". (Very convenient.)
The script takes the user name as an argument. (It assumes the audio file is named after the user, so it is worth verifying that the file actually exists, as sketched below.)
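For example, a pre-check like this (my own addition; the file-naming convention is the one described above) would catch a missing recording before any API call is made:
import os
import sys

Profile_Name = sys.argv[1]
WavFile = f'{Profile_Name}.wav'   # assumed naming convention: <user name>.wav
if not os.path.isfile(WavFile):
    print(f'Audio file not found: {WavFile}')
    sys.exit(1)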
CreateEnrollment.py
########### module #############
import sys       # library for command-line arguments
import requests  # library for HTTP communication
import json      # library for handling JSON data
import base64
import csv
import time
########### Args & variable #########################
args = sys.argv
Profile_Name = args[1]
Profile_List = 'Profile_List.csv'
WavFile = f'{Profile_Name}.wav'
with open(Profile_List) as fp:
    lst = list(csv.reader(fp))
for i in lst:
    if Profile_Name in i:
        break
j = lst.index(i)
ProfileId = lst[j][1]
########### Create Enrollment #########################
ApiPath = f'https://speaker-recognitionapi.cognitiveservices.azure.com/spid/v1.0/identificationProfiles/{ProfileId}/enroll?shortAudio=true'
headers = {
    # Request headers
    'Content-Type': 'application/octet-stream',
    'Ocp-Apim-Subscription-Key': '<subscription key>',  # your subscription key
}
with open(WavFile, 'rb') as f:
    body = f.read()
r = requests.post(
    ApiPath,            # URL
    headers = headers,  # headers
    data = body         # body
)
try:
    response = r
    print('response:', response.status_code)
    if response.status_code == 202:
        print(response.headers['Operation-Location'])
        operation_url = response.headers['Operation-Location']
    else:
        print(response.json()['error'])
        sys.exit()
except Exception:
    print(r.json()['error'])
    sys.exit()
####################################
########### Get Operation Status #########################
url = operation_url
headers = {
    # Request headers
    'Ocp-Apim-Subscription-Key': '<subscription key>',  # your subscription key
}
status = ''
while status != 'succeeded':
    r = requests.get(
        url,                # URL
        headers = headers,  # headers
    )
    try:
        response = r
        print('response:', response.status_code)
        if response.status_code == 200:
            status = response.json()['status']
            print(f'current status; {status}')
            if status == 'failed':
                message = response.json()['message']
                print(f'error:{message}')
                sys.exit()
            elif status != 'succeeded':
                time.sleep(3)
        else:
            print(r.json()['error'])
            sys.exit()
    except Exception:
        print(r.json()['error'])
        sys.exit()
enrollmentStatus = response.json()['processingResult']['enrollmentStatus']
remainingEnrollmentSpeechTime = response.json()['processingResult']['remainingEnrollmentSpeechTime']
speechTime = response.json()['processingResult']['speechTime']
if enrollmentStatus == 'enrolling':
    status = 'The profile is currently enrolling and is not ready for identification.'
elif enrollmentStatus == 'training':
    status = 'The profile is currently training and is not ready for identification.'
else:
    status = 'The profile is enrolled and ready for identification.'
print(f'\nenrollment status; {enrollmentStatus}')
print(f'current status; {status}')
print(f'total valid audio time (seconds): {speechTime}')
print(f'remaining audio time (seconds) required for successful enrollment: {remainingEnrollmentSpeechTime}')
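Since both this script and Identification.py below poll the Operation-Location URL in exactly the same way, that loop could be factored out. A minimal sketch (my own addition; the subscription key is passed in as a parameter, and the timeout is an arbitrary choice) might look like this:
import time
import requests

def wait_for_operation(operation_url, subscription_key, timeout=60, interval=3):
    """Poll an Operation-Location URL until the operation succeeds or fails."""
    headers = {'Ocp-Apim-Subscription-Key': subscription_key}
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = requests.get(operation_url, headers=headers).json()
        status = result.get('status')
        if status == 'succeeded':
            return result['processingResult']
        if status == 'failed':
            raise RuntimeError(result.get('message', 'operation failed'))
        time.sleep(interval)   # notstarted / running: wait and poll again
    raise TimeoutError('operation did not complete within the timeout')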
Finally, the main process. The API operations used here are "Identification" and "Get Operation Status".
In this test, the audio file to identify is passed as an argument. Incidentally, speaker identification currently supports checking against at most 10 users (profiles) per request. The flow is: POST the audio to be identified together with the candidate profile IDs via "Identification", then call "Get Operation Status" against the URL returned in the `Operation-Location` header to check the identification status and result. (In my test it took up to about 9 seconds for the identification to complete.) Also, since the result is returned as a profile ID, it has to be mapped back to the user name separately. A confidence value is returned as well, apparently in three levels: low, medium, and high.
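Because only up to 10 profile IDs can be passed per request, identifying against a larger user list would mean splitting the IDs into batches and sending one Identification request per batch. A minimal sketch of that batching (my own addition; the IDs below are made up):
def batch_profile_ids(profile_ids, batch_size=10):
    """Yield comma-separated strings of at most `batch_size` profile IDs each."""
    for i in range(0, len(profile_ids), batch_size):
        yield ','.join(profile_ids[i:i + batch_size])

# Hypothetical example: 23 IDs become three identificationProfileIds values
ids = [f'dummy-id-{n}' for n in range(23)]
for batch in batch_profile_ids(ids):
    print(batch)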
Identification.py
########### module #############
import sys       # library for command-line arguments
import requests  # library for HTTP communication
import json      # library for handling JSON data
import base64
import csv
import time
########### Args & variable #########################
args = sys.argv
WavFile = args[1]
Profile_List = 'Profile_List.csv'
with open(Profile_List) as fp:
    lst = list(csv.reader(fp))
########### Identification #########################
ProfileIds = ''
for a, b in lst:
    ProfileIds += b + ','
ProfileIds = ProfileIds[:-1]
url = 'https://speaker-recognitionapi.cognitiveservices.azure.com/spid/v1.0/identify'
params = {
    'identificationProfileIds': ProfileIds,
    'shortAudio': True,
}
headers = {
    # Request headers
    'Content-Type': 'application/octet-stream',
    'Ocp-Apim-Subscription-Key': '<subscription key>',  # your subscription key
}
with open(WavFile, 'rb') as f:
    body = f.read()
r = requests.post(
    url,                # URL
    params = params,
    headers = headers,  # headers
    data = body         # body
)
try:
    response = r
    print('response:', response.status_code)
    if response.status_code == 202:
        print(response.headers['Operation-Location'])
        operation_url = response.headers['Operation-Location']
    else:
        print(response.json()['error'])
        sys.exit()
except Exception:
    print(r.json()['error'])
    sys.exit()
####################################
########### Get Operation Status #########################
url = operation_url
#url = 'https://speaker-recognitionapi.cognitiveservices.azure.com/spid/v1.0/operations/ea1edc22-32f4-4fb9-81d6-d597a0072c76'
headers = {
    # Request headers
    'Ocp-Apim-Subscription-Key': '<subscription key>',  # your subscription key
}
status = ''
while status != 'succeeded':
    r = requests.get(
        url,                # URL
        headers = headers,  # headers
    )
    try:
        response = r
        print('response:', response.status_code)
        if response.status_code == 200:
            status = response.json()['status']
            print(f'current status; {status}')
            if status == 'failed':
                message = response.json()['message']
                print(f'error:{message}')
                sys.exit()
            elif status != 'succeeded':
                time.sleep(3)
        else:
            print(r.json()['error'])
            sys.exit()
    except Exception:
        print(r.json()['error'])
        sys.exit()
identifiedProfileId = response.json()['processingResult']['identifiedProfileId']
confidence = response.json()['processingResult']['confidence']
for i in lst:
    if identifiedProfileId in i:
        break
j = lst.index(i)
Profile_Name = lst[j][0]
print(f'\nspeaker; {Profile_Name}')
print(f'confidence; {confidence}')
####################################
So that was my test of the "Speaker Recognition API". It is said not to support Japanese, but I personally found the speaker identification to be quite accurate. It seems you could do all sorts of things with it if you use it well!