Because I had too few voice samples for machine learning, I mass-produced more voice data by mixing noise into the recordings. This post describes the procedure I used.

This post assumes throughout that the audio files are WAV files, so please bear with me on that.

SoX (Sound eXchange) is a command-line tool that can process audio in various ways. Here I use some of its features to mass-produce voice data with noise mixed in.

First, install it.

brew install sox --with-lame

Mix noise with audio data

Let's mix in some noise right away. You can take the noise data from free sound material. If you search for "free material voice", you will find data that works as noise, such as the sounds of daily life.

sox -m sound.wav -v 0.1 noise.wav noise_mix.wav trim 0 3

The command above mixes sound.wav and noise.wav to create noise_mix.wav.

-m mixes the two input files. -v adjusts the volume of the file that follows it, which here is the noise (the second input); a value of 1 keeps the original volume. trim cuts the result, here keeping the span from second 0 to second 3.
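The option layout described above can be sketched as a small Python helper that assembles the same command as an argument list. The function name build_mix_command and its parameters are hypothetical, chosen only for illustration; the helper builds the command but does not run it.

```python
def build_mix_command(sound, noise, output, noise_volume=0.1, start=0, length=3):
    """Return the sox argument list for mixing `noise` into `sound`."""
    return [
        "sox", "-m",                      # -m: mix the input files together
        sound,                            # first input, kept at original volume
        "-v", str(noise_volume), noise,   # -v scales the input that follows it
        output,
        "trim", str(start), str(length),  # keep `length` seconds beginning at `start`
    ]


cmd = build_mix_command("sound.wav", "noise.wav", "noise_mix.wav")
print(" ".join(cmd))
```

With the defaults, the printed command matches the one shown earlier. Keeping the arguments in a list also makes it easy to pass them to the subprocess module without going through the shell.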

Mass production by calling the shell from Python

For mass production, a for loop is the natural approach. Python can execute commands through the subprocess module, so I wrote code that runs sox inside nested for loops. The code will change depending on your directory structure and so on, so this is only an example, but it looks like the following.

`sox.py`


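import subprocess

# Mix each of 10 sounds with each of 10 noises at 10 noise volumes (0.1 to 1.0).
for sound_idx in range(1, 11):
    for volume in range(1, 11):
        for noise_idx in range(1, 11):
            noise_volume = str(volume / 10)
            out_name = f'{sound_idx}_{volume}_{noise_idx}.wav'
            # -v applies to the input file that follows it (the noise file).
            cmd = ['sox', '-m', f'sound_{sound_idx}.wav',
                   '-v', noise_volume, f'noise_{noise_idx}.wav',
                   out_name, 'trim', '0', '3']
            # check=True raises CalledProcessError if sox exits with an error.
            subprocess.run(cmd, check=True)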

This could probably be written more neatly, so please treat it as a reference only ...
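As one possible tidier variant, the three nested loops can be collapsed with itertools.product, and building the commands can be separated from running them. This is a minimal sketch that assumes the same sound_N.wav and noise_N.wav naming as above; the function names mix_commands and run_all and the dry_run flag are hypothetical.

```python
import itertools
import subprocess


def mix_commands(n_sounds=10, n_volumes=10, n_noises=10):
    """Yield one sox argument list per (sound, volume, noise) combination."""
    indices = itertools.product(range(1, n_sounds + 1),
                                range(1, n_volumes + 1),
                                range(1, n_noises + 1))
    for s, v, n in indices:
        yield ["sox", "-m", f"sound_{s}.wav",
               "-v", str(v / 10), f"noise_{n}.wav",
               f"{s}_{v}_{n}.wav", "trim", "0", "3"]


def run_all(dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in mix_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if sox fails
```

Calling run_all() only prints the 1000 commands, which is a cheap way to check the file names before calling run_all(dry_run=False) to generate the data.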

Caution

If the files being mixed have different sample rates, an error occurs. I won't cover it here, but you may need to adjust for this as well, for example by using SoX to match the rates.
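One way to handle this might look like the sketch below, which is not part of the original procedure. It reads each file's sample rate with Python's standard wave module (which handles uncompressed PCM WAV files) and, when the rates differ, builds a sox command that rewrites the noise file at the sound file's rate. The function names are hypothetical, and the approach assumes that giving -r before the output file is enough for sox to resample.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def resample_command(src, dst, rate):
    """Return a sox argument list that writes `src` to `dst` at `rate` Hz."""
    # An -r placed before the output file sets the output sample rate.
    return ["sox", src, "-r", str(rate), dst]


def command_to_match_rate(sound, noise, resampled_noise):
    """Return the resample command needed before mixing, or None if rates match."""
    sound_rate = wav_sample_rate(sound)
    if wav_sample_rate(noise) == sound_rate:
        return None
    return resample_command(noise, resampled_noise, sound_rate)
```

If command_to_match_rate returns a command, run it first (for example with subprocess.run) and then mix using the resampled noise file instead of the original.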

Reference: http://webdatareport.hatenablog.com/entry/2016/11/06/161304

[PYTHON] Create noise-filled audio data with SoX
