Operate the Speech Signal Processing Toolkit via Python

Speech Signal Processing Toolkit (SPTK) is a C library for speech analysis, speech synthesis, vector quantization, data processing, and more. I thought it might also be useful for other kinds of signal processing, such as vibration analysis, so I decided to give it a try.

What I want to achieve with SPTK this time is as follows.

Python already has librosa, a capable signal processing library, and pysptk, a wrapper for SPTK published by volunteers. However, pysptk did not seem to support the SPTK commands I wanted to use, so I had to work something out myself.

I have little background in signal processing (and my programming is shaky too), so some of the terminology may be wrong. Thank you for your understanding.

1. Installing SPTK

For Windows

I referred to the following website.

I built it with the Visual Studio 2019 x64 Native Tools command prompt. Installation was easier than I expected, but in my environment the build of "pitch.exe" failed. I worked around this by deleting every reference to "pitch.exe" from the bin/Makefile.mak file before building.

For Ubuntu

I referred to the following website.

SPTK can be installed with `apt`, but the version installed that way seemed to have some options missing from certain commands (this may be specific to my environment). To avoid getting stuck on unexpected problems when using the commands, I think it is better to simply build from source.

$ tar xvzf SPTK-3.11.tar.gz
$ cd SPTK-3.11
$ ./configure
$ make
$ sudo make install
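
After installation, it can be worth confirming that the commands used in the rest of this article are reachable from Python. The following is a minimal sketch using only the standard library; the helper name `missing_commands` is made up for illustration, and the command list is simply the set of SPTK tools this article calls.

```python
import shutil
from typing import List

# SPTK commands used later in this article.
SPTK_COMMANDS = ["sin", "x2x", "dmp", "window", "spec"]

def missing_commands(names: List[str]) -> List[str]:
    """Return the commands from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]

missing = missing_commands(SPTK_COMMANDS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All SPTK commands used here were found.")
```

If anything is reported as missing, the install prefix (often /usr/local/bin for a source build) may not be on PATH.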

1. How to use SPTK

First, I learned how to use SPTK. There is an excellent website that helped me a lot; its author explains everything in great detail, and I learned a great deal from it. Thank you very much.

SPTK command operation

SPTK is basically a set of tools that you run as commands from the console. As an example, let's create sine-wave data with SPTK's sin command and save it under the file name sin.data.

Open a console and enter the following command. A sine wave with period 16 and length 48 (3 cycles) is saved as a byte string under the file name sin.data.

$ sin -l 48 -p 16 > sin.data

To check the contents of the file, enter the SPTK command as follows:

$ x2x +f < sin.data | dmp +f

The output looks like the following, so you can check the contents of the file. The number on the left is the index. Keep in mind that the indices are added only for display; the data file itself contains only the values on the right.

0       0
1       0.382683
2       0.707107
3       0.92388
4       1
5       0.92388
…

Text data can apparently be read as well. In that case, prepare a text file (sin.txt in the example below) in which the numerical values are separated by spaces, and read it with the following command.

$ x2x +af < sin.txt | dmp +f

When reading text data, the option must specify ASCII input, as in `+af`. (Because I did not understand this basic point, I could not get the analysis results I expected and wasted about half a day...)
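
If the data originates in Python, a space-separated text file of the kind read by `x2x +af` can be written with `numpy.savetxt`. This is a small sketch: the file name `sin.txt` follows the example above, and the values are an arbitrary sine wave chosen for illustration.

```python
import numpy as np

# A period-16 sine wave, 48 samples long.
values = np.sin(2 * np.pi * np.arange(48) / 16)

# Write all values on a single line, separated by spaces.
np.savetxt("sin.txt", values[np.newaxis, :], fmt="%.6f", delimiter=" ")

# Reading the file back in Python confirms it holds plain numbers.
restored = np.loadtxt("sin.txt")
print(restored[:5])
```

Note that the text file stores only six decimal places here, so it is slightly less precise than the binary float32 file.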

Reading data in Python

Now, let's read the byte string data saved earlier in sin.data with Python.

import numpy as np

with open('sin.data', mode='rb') as f:
    data = np.frombuffer(f.read(), dtype='float32')
    print(data)

result

[ 0.0000000e+00  3.8268343e-01  7.0710677e-01  9.2387950e-01
  1.0000000e+00  9.2387950e-01  7.0710677e-01  3.8268343e-01
  1.2246469e-16 -3.8268343e-01 -7.0710677e-01 -9.2387950e-01
 -1.0000000e+00 -9.2387950e-01 -7.0710677e-01 -3.8268343e-01
 -2.4492937e-16  3.8268343e-01  7.0710677e-01  9.2387950e-01
  1.0000000e+00  9.2387950e-01  7.0710677e-01  3.8268343e-01
  3.6739403e-16 -3.8268343e-01 -7.0710677e-01 -9.2387950e-01
 -1.0000000e+00 -9.2387950e-01 -7.0710677e-01 -3.8268343e-01
 -4.8985874e-16  3.8268343e-01  7.0710677e-01  9.2387950e-01
  1.0000000e+00  9.2387950e-01  7.0710677e-01  3.8268343e-01
  6.1232340e-16 -3.8268343e-01 -7.0710677e-01 -9.2387950e-01
 -1.0000000e+00 -9.2387950e-01 -7.0710677e-01 -3.8268343e-01]
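
As a cross-check, the same sequence can be generated directly in NumPy. This sketch assumes, based on the values printed above, that `sin -l 48 -p 16` outputs sin(2πn/16) for n = 0 to 47 as float32; it does not read `sin.data` itself.

```python
import numpy as np

length, period = 48, 16
n = np.arange(length)

# Same waveform, computed in double precision and then stored as float32.
expected = np.sin(2 * np.pi * n / period).astype(np.float32)

print(expected[:5])
print(expected.nbytes)  # 48 samples x 4 bytes each
```

If the file was read with the right dtype, `np.allclose(data, expected, atol=1e-6)` should hold, and the file size should equal `expected.nbytes`.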

Creating data in Python

Next, let's use Python to create byte string data to pass to SPTK. Specifying the type correctly is quite important. (I got stuck here too.)

arr = np.array(range(0,5))  # Make a simple sequence

with open('test.data', mode='wb') as f:
    arr = arr.astype(np.float32)  # Convert to float32
    barr = bytearray(arr.tobytes())  # Convert to a bytearray
    f.write(barr)

Read the file with SPTK and check it.

$ x2x +f < test.data | dmp +f
0       0
1       1
2       2
3       3
4       4
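
Because the type matters so much, it can help to wrap the write and read steps in two small functions. This is only a sketch: the names `save_float32` and `load_float32` are made up, and it assumes SPTK reads 4-byte floats in the machine's native byte order, as in the `x2x +f` example.

```python
import numpy as np

def save_float32(path: str, values) -> None:
    """Write values to `path` as raw float32 with no header."""
    np.asarray(values, dtype=np.float32).tofile(path)

def load_float32(path: str) -> np.ndarray:
    """Read a raw float32 file back into a 1-D array."""
    return np.fromfile(path, dtype=np.float32)

save_float32("test.data", range(5))
print(load_float32("test.data"))
```

Without the cast, `np.arange(5).tobytes()` would typically write 8-byte integers, which `x2x +f` would misinterpret.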

Cooperation between Python and SPTK

If you save a numpy.ndarray created in Python to a file as a byte string in this way and pass the file to a command, SPTK can process the data. Let's try it with sin.data.

import subprocess

# Command that reads the data and applies a window function
cmd = 'x2x +f < sin.data | window -l 16'

p = subprocess.check_output(cmd, shell=True)
out = np.frombuffer(p, dtype='float32')
print(out)

result

[-0.0000000e+00  3.0001572e-03  2.5496081e-02  8.6776853e-02
  1.8433140e-01  2.7229854e-01  2.8093100e-01  1.7583697e-01
  5.6270582e-17 -1.5203877e-01 -2.0840828e-01 -1.7030001e-01
 -9.3926586e-02 -3.3312235e-02 -5.5435672e-03  2.4845590e-18
  1.5901955e-33  3.0001572e-03  2.5496081e-02  8.6776853e-02
  1.8433140e-01  2.7229854e-01  2.8093100e-01  1.7583697e-01
  1.6881173e-16 -1.5203877e-01 -2.0840828e-01 -1.7030001e-01
 -9.3926586e-02 -3.3312235e-02 -5.5435672e-03  2.4845590e-18
  3.1803911e-33  3.0001572e-03  2.5496081e-02  8.6776853e-02
  1.8433140e-01  2.7229854e-01  2.8093100e-01  1.7583697e-01
  2.8135290e-16 -1.5203877e-01 -2.0840828e-01 -1.7030001e-01
 -9.3926586e-02 -3.3312235e-02 -5.5435672e-03  2.4845590e-18]
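
The numbers above can be reproduced in NumPy, which helps in understanding what `window` does. This sketch assumes that `window -l 16` with no other options applies a Blackman window scaled so that the sum of its squared coefficients is 1. That assumption is inferred from the printed values, not taken from the SPTK source.

```python
import numpy as np

frame_len = 16
signal = np.sin(2 * np.pi * np.arange(48) / 16)

# Blackman window, scaled so that sum(w**2) == 1 (assumed default).
w = np.blackman(frame_len)
w = w / np.sqrt(np.sum(w ** 2))

# Apply the window to each 16-sample frame, then flatten again.
frames = signal.reshape(-1, frame_len) * w
windowed = frames.ravel().astype(np.float32)
print(windowed[:8])
```

The pattern repeats every 16 samples because each frame is windowed independently.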

More efficient data transfer

I was lamenting how wasteful it is to create a file just to pass data to SPTK, but it turns out there is a handy class called `io.BytesIO`.

In the end, I prepared the following.


import io
import shlex, subprocess
from typing import List

import numpy

def sptk_wrap(in_array : numpy.ndarray, sptk_cmd : str) -> numpy.ndarray:
    '''
    input
        in_array : Waveform data
        sptk_cmd : SPTK command (e.g. 'window -l 16')
    output
        Data after analysis
    '''
    # Convert the numpy.ndarray to a float32 byte string
    arr = in_array.astype(numpy.float32)
    barr = bytearray(arr.tobytes())
    bio = io.BytesIO(barr)

    # Run the SPTK command, feeding the bytes to its standard input
    cmd = shlex.split(sptk_cmd)
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, err = proc.communicate(input=bio.read())

    return numpy.frombuffer(out, dtype='float32')
   
    
def sptk_wrap_pipe(in_array : numpy.ndarray, sptk_cmd_pipe : List[str]) -> numpy.ndarray:
    '''
    input
        in_array : Waveform data
        sptk_cmd_pipe : SPTK commands stored in a list, in the order they should be piped
    (Example)
        cmd_list = [
            'window -l 512 -L 512 -w 2',
            'spec -l 512 -o 0',
        ]
    output
        Data after analysis
    '''
    out_array = numpy.copy(in_array)
    for cmd in sptk_cmd_pipe:
        out_array = sptk_wrap(out_array, cmd)
        
    return out_array


#Spectrum analysis example
def ndarr2sp_ndarr(in_array : numpy.ndarray, length : int, wo : int = 2, oo : int = 0) -> numpy.ndarray:
    '''
    input : Waveform data
    output : Log power spectrum

    option:
        wo : Window function option (0: blackman, 1: hamming, 2: hanning, 3: bartlett)
        oo : Output spectrum format (0: 20 × log |Xk|)

    SPTK command example
        window -l 512 -L 512 -w 2 | spec -l 512 -o 0
    '''
    cmd_list = [
        "window -l {0} -L {0} -w {1} ".format(length, wo),
        "spec -l {0} -o {1}".format(length, oo),
    ]

    return sptk_wrap_pipe(in_array, cmd_list)
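
A variation on `sptk_wrap` is sketched below. It uses `subprocess.run` with `check=True` so that a failing command raises an exception instead of silently returning an empty array, and it passes the bytes directly without `io.BytesIO`. The function name `run_filter` is made up for this sketch. Because SPTK may not be installed where the example runs, a small Python one-liner that echoes stdin to stdout stands in for an SPTK command.

```python
import shlex
import subprocess
import sys
from typing import Sequence, Union

import numpy as np

def run_filter(in_array: np.ndarray, cmd: Union[str, Sequence[str]]) -> np.ndarray:
    """Send float32 data to a command's stdin and parse its stdout as float32."""
    args = shlex.split(cmd) if isinstance(cmd, str) else list(cmd)
    payload = np.asarray(in_array, dtype=np.float32).tobytes()
    # check=True raises CalledProcessError if the command exits with an error.
    done = subprocess.run(args, input=payload, stdout=subprocess.PIPE, check=True)
    return np.frombuffer(done.stdout, dtype=np.float32)

# Stand-in for an SPTK command (hypothetical): echoes stdin to stdout unchanged.
echo_cmd = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
echoed = run_filter(np.arange(5), echo_cmd)
print(echoed)
```

With SPTK installed, `run_filter(wave, 'window -l 16')` would play the same role as `sptk_wrap(wave, 'window -l 16')`.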

2. Example of use

Create some test waveform data and analyze it. Here I created 10 samples, each with a data length of 512, changing the frequency for each one.

import numpy as np
import matplotlib.pyplot as plt

N = 2**9            # Number of waveform samples to analyze (512)
dt = 0.01          # Sampling interval [s]
t = np.arange(0, N*dt, dt)  # Time axis
freq = np.arange(N) / (N*dt)  # Frequency axis (bin spacing is 1/(N*dt))

samples = []
for f in range(1,11):
    # Create 10 waveform samples, changing the frequency from 1 Hz to 10 Hz.
    wave = np.sin(2*np.pi*f*t)
    samples.append(wave)
    
samples = np.asarray(samples)
print(samples.shape)

Output: (10, 512)

When you plot the created data, it looks like this.

1st data (frequency 1Hz)

plt.plot(t, samples[0])

index1.png

10th data (frequency 10Hz)

plt.plot(t, samples[9])

index2.png

Now, let's analyze the spectrum of the 10th data using SPTK.

ps = ndarr2sp_ndarr(samples[9], N)

plt.plot(freq[:N//2+1], ps)
plt.xlabel("frequency [Hz]")
plt.ylabel("Logarithmic Power Spectrum [dB]")

index3.png

You can also analyze multiple waveforms at once. However, the results come back concatenated into a single flat array, so they need to be reshaped.

First, check the shape of the dataset.

samples_shape = samples.shape
print(samples_shape)

Output: (10, 512)

Analyze all 10 samples together with SPTK.

ps_s = ndarr2sp_ndarr(samples, N)
print(ps_s.shape)

Output: (2570,)

Reshape.

ps_s = ps_s.reshape((samples_shape[0],-1))
print(ps_s.shape)

Output: (10, 257)
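
The 257 comes from each 512-sample frame yielding N/2 + 1 spectrum values, so 10 frames give 2570 values in total. The sketch below illustrates the same flatten-then-reshape bookkeeping using `numpy.fft.rfft` as a stand-in for SPTK. It is not the SPTK computation itself, only a demonstration of the shapes.

```python
import numpy as np

N = 512
dt = 0.01
t = np.arange(N) * dt
samples = np.array([np.sin(2 * np.pi * f * t) for f in range(1, 11)])

# Stand-in for the SPTK pipeline: one spectrum of N//2 + 1 bins per row,
# delivered as a single flat array.
flat = np.abs(np.fft.rfft(samples, axis=1)).ravel()
print(flat.shape)

# Recover one row per input waveform.
spectra = flat.reshape(samples.shape[0], -1)
print(spectra.shape)
```

Passing `-1` as the second dimension lets NumPy work out the number of bins per waveform, so the same line works for other frame lengths.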

10th data (frequency 10Hz)

print(np.max(ps_s[9]))
plt.plot(freq[:N//2+1], ps_s[9])
plt.xlabel("frequency [Hz]")
plt.ylabel("Logarithmic Power Spectrum [dB]")

Output: 19.078928

index4.png

3. Supplement

I compared the SPTK result with an analysis I wrote myself. I tried normalizing by the number of data points and multiplying by the window-function correction factor, but the decibel values do not match the result from SPTK.

I don't know the reason... I am probably doing something silly. (If you are familiar with this, please let me know.)

wavedata = samples[9]

# Apply a Hanning window
hanningWindow = np.hanning(len(wavedata))
wavedata = wavedata * hanningWindow

# Calculate the amplitude correction factor for the window
acf = 1/(sum(hanningWindow)/len(wavedata))

# Fourier transform (convert to the frequency domain)
F = np.fft.fft(wavedata)

# Normalize by N and double the AC components (the DC component is not doubled)
F = 2*(F/N)
F[0] = F[0]/2

# Amplitude spectrum
Adft = np.abs(F)

# Apply the window correction factor
Adft = acf * Adft

# Power spectrum
Pdft = Adft ** 2
# Logarithmic power spectrum
PdftLog = 10 * np.log10(Pdft)
# PdftLog = 10 * np.log(Pdft)

print(np.max(PdftLog))

start=0
stop=int(N/2)
plt.plot(freq[start:stop], PdftLog[start:stop])
plt.xlabel("frequency [Hz]")
plt.ylabel("Logarithmic Power Spectrum [dB]")

plt.show()

Output: -0.2237693

index5.png
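
One possible explanation for the gap, offered as a hypothesis rather than a confirmed answer, is that the two calculations normalize differently. If `window` scales the window so that the sum of its squared coefficients is 1 (assumed here to be its default) and `spec` outputs 20 log|Xk| without dividing by N, then a unit-amplitude sine is not expected to peak near 0 dB. The NumPy sketch below reproduces that assumed scaling without calling SPTK.

```python
import numpy as np

N = 512
dt = 0.01
t = np.arange(N) * dt
wave = np.sin(2 * np.pi * 10 * t)

w = np.hanning(N)

# Assumed SPTK-style scaling: power-normalized window, no division by N.
w_power = w / np.sqrt(np.sum(w ** 2))
sptk_like_db = 20 * np.log10(np.abs(np.fft.rfft(wave * w_power)).max())

# Amplitude-corrected scaling, as in the hand-written analysis above.
amp = 2 * np.abs(np.fft.rfft(wave * w)) / np.sum(w)
own_db = 20 * np.log10(amp.max())

# Constant offset predicted by the difference between the two normalizations.
offset_db = 20 * np.log10(np.sum(w) / (2 * np.sqrt(np.sum(w ** 2))))

print(sptk_like_db, own_db, sptk_like_db - own_db, offset_db)
```

Under these assumptions, the first value should land close to the roughly 19 dB reported by SPTK and the second close to the roughly -0.2 dB from the hand-written version, with the difference equal to the constant offset. If so, neither result is wrong; they are simply expressed on different scales.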
