In this post, I will walk through an actual test of speaker identification using the "Speaker Recognition API". (Please let me know if anything looks off!)
Identifying a speaker requires the following three steps, so I created a separate script for each step to keep things easy to follow.
1. Create a profile (Create Profile)
2. Enroll the user's voice (Create Enrollment)
3. Identify the speaker (Identification)
First, create a profile for the user you want to identify. The API operation used is "Create Profile" under "Identification Profile"; it creates a profile for the user and returns the user's profile ID. (No name is registered with the profile, so the name-to-ID mapping has to be managed separately.)
In my test script, the user name is passed as an argument, and the user name and profile ID are written as a pair to the file "Profile_List.csv".
CreateProfile.py
########### module #############
import sys       # library for command-line arguments
import requests  # library for HTTP communication
import json      # library for handling JSON data
import base64
import csv
########### Args & variable #########################
args = sys.argv
Profile_Name = args[1]
Profile_List = 'Profile_List.csv'
########### Create Profile #########################
with open(Profile_List) as fp:
    lst = list(csv.reader(fp))
for i in lst:
    if Profile_Name in i:
        print('The specified user is already registered.')
        sys.exit()
ApiPath = 'https://speaker-recognitionapi.cognitiveservices.azure.com/spid/v1.0/identificationProfiles'
headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': '<subscription key>',  # your subscription key
}
body = {
    'locale': 'en-us',
}
r = requests.post(
    ApiPath,            # URL
    headers = headers,  # headers
    json = body         # body
)
try:
    ProfileId = r.json()['identificationProfileId']
except Exception:
    print('Error:{}'.format(r.status_code))
    print(r.json()['error'])
    sys.exit()
print(ProfileId)
f = open(Profile_List, 'a')
writer = csv.writer(f, lineterminator='\n')
writer.writerow([Profile_Name, ProfileId])
f.close()
####################################
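Since the API itself does not store user names, the later scripts repeatedly look the profile ID up in Profile_List.csv. As a minimal sketch (my own addition, not part of the original scripts), that lookup could be factored into a small helper:
import csv

def load_profiles(path='Profile_List.csv'):
    """Return a {user name: profile ID} dict from the CSV written above."""
    with open(path, newline='') as fp:
        return {name: profile_id for name, profile_id in csv.reader(fp)}

# Example (hypothetical user name):
# profiles = load_profiles()
# profile_id = profiles['Alice']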
Next, we register a voice sample for the user created above. (Unlike speaker verification, no specific phrase is required, so any speech is fine.)
The API operations used here are "Create Enrollment" and "Get Operation Status".
I personally got tripped up here as well: there are some fairly strict restrictions on the audio files that can be used (a small check script follows the table below).
Property | Required value |
---|---|
Container | WAV |
Encoding | PCM |
Sample rate | 16 kHz |
Sample format | 16-bit |
Channels | Mono |
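Before uploading anything, it may be worth checking a recording against these requirements. The following is a minimal sketch using only the standard-library wave module (my own addition, not part of the original scripts); note that wave only reads PCM WAV containers, so a successful open already covers the container and encoding rows:
import sys
import wave

def check_wav(path):
    """Check a WAV file against the table above (PCM WAV, 16 kHz, 16-bit, mono)."""
    with wave.open(path, 'rb') as w:       # wave only opens PCM WAV containers
        ok = (w.getframerate() == 16000    # 16 kHz sample rate
              and w.getsampwidth() == 2    # 16-bit samples
              and w.getnchannels() == 1)   # mono
        print(f'{path}: rate={w.getframerate()} Hz, '
              f'bits={w.getsampwidth() * 8}, channels={w.getnchannels()} '
              f'-> {"OK" if ok else "NG"}')
        return ok

if __name__ == '__main__':
    check_wav(sys.argv[1])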
At first I couldn't produce audio that met these conditions, but I managed to record it with the free software "Audacity". (Very convenient.)
The script takes the user name as an argument. (It assumes the audio file is named after the user, so it is worth verifying that the file actually exists, as sketched below.)
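For example, a pre-check like this (my own addition; the file-naming convention is the one described above) would catch a missing recording before any API call is made:
import os
import sys

Profile_Name = sys.argv[1]
WavFile = f'{Profile_Name}.wav'   # assumed naming convention: <user name>.wav
if not os.path.isfile(WavFile):
    print(f'Audio file not found: {WavFile}')
    sys.exit(1)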
CreateEnrollment.py
########### module #############
import sys       # library for command-line arguments
import requests  # library for HTTP communication
import json      # library for handling JSON data
import base64
import csv
import time
########### Args & variable #########################
args = sys.argv
Profile_Name = args[1]
Profile_List = 'Profile_List.csv'
WavFile = f'{Profile_Name}.wav'
with open(Profile_List) as fp:
    lst = list(csv.reader(fp))
for i in lst:
    if Profile_Name in i:
        break
j = lst.index(i)
ProfileId = lst[j][1]
########### Create Enrollment #########################
ApiPath = f'https://speaker-recognitionapi.cognitiveservices.azure.com/spid/v1.0/identificationProfiles/{ProfileId}/enroll?shortAudio=true'
headers = {
    # Request headers
    'Content-Type': 'application/octet-stream',
    'Ocp-Apim-Subscription-Key': '<subscription key>',  # your subscription key
}
with open(WavFile, 'rb') as f:
    body = f.read()
r = requests.post(
    ApiPath,            # URL
    headers = headers,  # headers
    data = body         # body
)
try:
    response = r
    print('response:', response.status_code)
    if response.status_code == 202:
        print(response.headers['Operation-Location'])
        operation_url = response.headers['Operation-Location']
    else:
        print(response.json()['error'])
        sys.exit()
except Exception:
    print(r.json()['error'])
    sys.exit()
####################################
########### Get Operation Status #########################
url = operation_url
headers = {
    # Request headers
    'Ocp-Apim-Subscription-Key': '<subscription key>',  # your subscription key
}
status = ''
while status != 'succeeded':
    r = requests.get(
        url,                # URL
        headers = headers,  # headers
    )
    try:
        response = r
        print('response:', response.status_code)
        if response.status_code == 200:
            status = response.json()['status']
            print(f'current status; {status}')
            if status == 'failed':
                message = response.json()['message']
                print(f'error:{message}')
                sys.exit()
            elif status != 'succeeded':
                time.sleep(3)
        else:
            print(r.json()['error'])
            sys.exit()
    except Exception:
        print(r.json()['error'])
        sys.exit()
enrollmentStatus = response.json()['processingResult']['enrollmentStatus']
remainingEnrollmentSpeechTime = response.json()['processingResult']['remainingEnrollmentSpeechTime']
speechTime = response.json()['processingResult']['speechTime']
if enrollmentStatus == 'enrolling':
    status = 'The profile is currently enrolling and is not ready for identification.'
elif enrollmentStatus == 'training':
    status = 'The profile is currently training and is not ready for identification.'
else:
    status = 'The profile is enrolled and ready for identification.'
print(f'\nenrollment status; {enrollmentStatus}')
print(f'current status; {status}')
print(f'total valid audio time (seconds): {speechTime}')
print(f'remaining audio time (seconds) required for successful enrollment: {remainingEnrollmentSpeechTime}')
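Since both this script and Identification.py below poll the Operation-Location URL in exactly the same way, that loop could be factored out. A minimal sketch (my own addition; the subscription key is passed in as a parameter, and the timeout is an arbitrary choice) might look like this:
import time
import requests

def wait_for_operation(operation_url, subscription_key, timeout=60, interval=3):
    """Poll an Operation-Location URL until the operation succeeds or fails."""
    headers = {'Ocp-Apim-Subscription-Key': subscription_key}
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = requests.get(operation_url, headers=headers).json()
        status = result.get('status')
        if status == 'succeeded':
            return result['processingResult']
        if status == 'failed':
            raise RuntimeError(result.get('message', 'operation failed'))
        time.sleep(interval)   # notstarted / running: wait and poll again
    raise TimeoutError('operation did not complete within the timeout')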
Finally, the main process. The API operations used here are "Identification" and "Get Operation Status".
In this test, the audio file to identify is passed as an argument. Incidentally, speaker identification currently supports checking against at most 10 users (profiles) per request. The flow is: POST the audio to be identified together with the candidate profile IDs via "Identification", then call "Get Operation Status" against the URL returned in the `Operation-Location` header to check the identification status and result. (In my test it took up to about 9 seconds for the identification to complete.) Also, since the result is returned as a profile ID, it has to be mapped back to the user name separately. A confidence value is returned as well, apparently in three levels: low, medium, and high.
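Because only up to 10 profile IDs can be passed per request, identifying against a larger user list would mean splitting the IDs into batches and sending one Identification request per batch. A minimal sketch of that batching (my own addition; the IDs below are made up):
def batch_profile_ids(profile_ids, batch_size=10):
    """Yield comma-separated strings of at most `batch_size` profile IDs each."""
    for i in range(0, len(profile_ids), batch_size):
        yield ','.join(profile_ids[i:i + batch_size])

# Hypothetical example: 23 IDs become three identificationProfileIds values
ids = [f'dummy-id-{n}' for n in range(23)]
for batch in batch_profile_ids(ids):
    print(batch)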
Identification.py
########### module #############
import sys       # library for command-line arguments
import requests  # library for HTTP communication
import json      # library for handling JSON data
import base64
import csv
import time
########### Args & variable #########################
args = sys.argv
WavFile = args[1]
Profile_List = 'Profile_List.csv'
with open(Profile_List) as fp:
    lst = list(csv.reader(fp))
########### Identification #########################
ProfileIds = ''
for a, b in lst:
    ProfileIds += b + ','
ProfileIds = ProfileIds[:-1]
url = 'https://speaker-recognitionapi.cognitiveservices.azure.com/spid/v1.0/identify'
params = {
    'identificationProfileIds': ProfileIds,
    'shortAudio': True,
}
headers = {
    # Request headers
    'Content-Type': 'application/octet-stream',
    'Ocp-Apim-Subscription-Key': '<subscription key>',  # your subscription key
}
with open(WavFile, 'rb') as f:
    body = f.read()
r = requests.post(
    url,                # URL
    params = params,
    headers = headers,  # headers
    data = body         # body
)
try:
    response = r
    print('response:', response.status_code)
    if response.status_code == 202:
        print(response.headers['Operation-Location'])
        operation_url = response.headers['Operation-Location']
    else:
        print(response.json()['error'])
        sys.exit()
except Exception:
    print(r.json()['error'])
    sys.exit()
####################################
########### Get Operation Status #########################
url = operation_url
#url = 'https://speaker-recognitionapi.cognitiveservices.azure.com/spid/v1.0/operations/ea1edc22-32f4-4fb9-81d6-d597a0072c76'
headers = {
    # Request headers
    'Ocp-Apim-Subscription-Key': '<subscription key>',  # your subscription key
}
status = ''
while status != 'succeeded':
    r = requests.get(
        url,                # URL
        headers = headers,  # headers
    )
    try:
        response = r
        print('response:', response.status_code)
        if response.status_code == 200:
            status = response.json()['status']
            print(f'current status; {status}')
            if status == 'failed':
                message = response.json()['message']
                print(f'error:{message}')
                sys.exit()
            elif status != 'succeeded':
                time.sleep(3)
        else:
            print(r.json()['error'])
            sys.exit()
    except Exception:
        print(r.json()['error'])
        sys.exit()
identifiedProfileId = response.json()['processingResult']['identifiedProfileId']
confidence = response.json()['processingResult']['confidence']
for i in lst:
    if identifiedProfileId in i:
        break
j = lst.index(i)
Profile_Name = lst[j][0]
print(f'\nspeaker; {Profile_Name}')
print(f'confidence; {confidence}')
####################################
So that was my test of the "Speaker Recognition API". It is said not to support Japanese, but I personally found the speaker identification to be quite accurate. It seems you could do all sorts of things with it if you use it well!