[PYTHON] Try audio signal processing with librosa - Beginner

Introduction

The author has no prior knowledge of audio signal processing or speech recognition, so this article is not recommended for professionals in the field (; ´ ・ ω ・) By the way, I plan to continue with beginner, intermediate, and advanced installments.

Motivation

At work, the topic of "Recommend music!" came up.

Does music recommendation count as speech recognition?

The answer is no. Speech recognition is the process by which a machine converts human speech into text, so music recommendation is not speech recognition. (This site was very easy to understand.) Music recommendation appears to belong to a research field called MIR, with audio signal processing at its core.

What is MIR

MIR stands for Music Information Retrieval. Ordinary music search takes text as input, such as an artist name or song title, whereas MIR takes the audio waveform itself as input.

Below are specific examples of MIR tasks:

--Recommending music that suits the listener
--Instrument separation and instrument recognition
--Automatic transcription (no more transcribing by ear?)
--Automatic classification (genre labeling, etc.)
--Music generation, etc.

Convenient tools and libraries for audio signal processing

Of the three tools I tried, librosa suited me, a beginner in audio signal processing, better than SPTK. (Setting up the environment for SPTK was a hassle ...) I also recommend it to people who want to study audio signal processing while studying machine learning with Python. (Of course, SPTK can also be called from Python.)

That made for a long introduction, but this time I will introduce librosa.

(By the way, this article on building a similar-music search system with SPTK is excellent: http://aidiary.hatenablog.com/entry/20121014/1350211413)

Installation of librosa

I got quite flustered because "jupyter notebook" would not run during environment setup, so I will summarize the procedure.

Procedure

  1. Reinstall Anaconda (probably unnecessary on Mac or Linux, and I think also unnecessary on Windows if Anaconda and Python are up to date)
  2. Download resampy
  3. Download librosa
  4. Install Microsoft Visual C++ Compiler for Python 2.7
  5. Open the Visual C++ 2008 64-bit Command Prompt and run the following commands in each of the resampy and librosa directories

    python setup.py build
    python setup.py install

In Python, run

    import librosa

If it runs without an error, you are done.

Old environment: Python 2.7.11, Anaconda2-4.0.7
New environment: Python 2.7.12, Anaconda2-4.2.0

Before touching librosa

Here is a summary of what I looked into when starting audio signal processing.

--Three elements of sound
  --Loudness: corresponds to the amplitude of the wave. The louder the sound, the larger the amplitude.
  --Pitch: corresponds to the frequency and period of the wave. The higher the sound, the higher the frequency and the shorter the period.
  --Timbre (tone color): corresponds to the shape of the wave.
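The three elements can be tried out on synthetic waves. The following is only a minimal sketch using the Python standard library; the function names (`sine`, `square`, `rms`, `count_upward_crossings`) and the chosen frequencies and amplitudes are illustrative assumptions, not something from the librosa tutorial.

```python
import math

SR = 44100  # samples per second (the CD rate mentioned in this article)


def sine(freq_hz, amplitude, n_samples, sr=SR):
    """Pure tone: amplitude sets the loudness, freq_hz sets the pitch."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sr)
            for i in range(n_samples)]


def square(freq_hz, amplitude, n_samples, sr=SR):
    """Same pitch and peak amplitude as sine(), but a different wave shape."""
    return [amplitude if math.sin(2 * math.pi * freq_hz * i / sr) >= 0 else -amplitude
            for i in range(n_samples)]


def count_upward_crossings(samples):
    """Count negative-to-non-negative sign changes: roughly one per cycle."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)


def rms(samples):
    """Root mean square, a simple number that changes with the wave shape."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# One second of each signal.
quiet_low = sine(220, 0.2, SR)   # small amplitude, low frequency
loud_high = sine(440, 0.8, SR)   # large amplitude, high frequency
buzzy = square(440, 0.8, SR)     # same pitch and peak as loud_high, other shape

print("peak amplitude:", max(quiet_low), max(loud_high))
print("cycles per second (approx.):",
      count_upward_crossings(quiet_low), count_upward_crossings(loud_high))
print("RMS of sine vs. square:", rms(loud_high), rms(buzzy))
```

The louder signal has the larger peak, the higher signal has more cycles per second, and the sine and square waves differ only in shape, which shows up in their RMS values.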


--Sampling frequency (unit: Hz)
  --The number of samples taken per unit time
  --Music CDs use a sampling frequency of 44.1 kHz
--Number of frames (≈ amount of data)
--Number of channels: the number of separate signals output at the same time. 1 for monaural, 2 for stereo.
--Quantization bit depth
  --How many bits are used to represent each sample when converting analog data to digital data
  --The larger the number, the larger the amount of data
  --16 bits or more seems to be common for audio, 8 bits for telephone voice, and 8 to 10 bits for video signals
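These parameters can be read directly from a WAV file with the Python standard library `wave` module. The sketch below assumes Python 3 and writes a half-second tone into an in-memory buffer instead of a real file, so the tone, its length, and the `info` dictionary are illustrative choices only.

```python
import io
import math
import struct
import wave

SR = 44100        # sampling frequency in Hz
CHANNELS = 1      # monaural
SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit quantization
DURATION_S = 0.5

# Build half a second of a 440 Hz tone as 16-bit signed integers.
n_frames = int(SR * DURATION_S)
samples = [int(0.5 * 32767 * math.sin(2 * math.pi * 440 * i / SR))
           for i in range(n_frames)]
pcm = struct.pack("<%dh" % n_frames, *samples)

# Write the WAV data into memory (a stand-in for a .wav file on disk).
buf = io.BytesIO()
with wave.open(buf, "wb") as writer:
    writer.setnchannels(CHANNELS)
    writer.setsampwidth(SAMPLE_WIDTH)
    writer.setframerate(SR)
    writer.writeframes(pcm)

# Read the header back and derive duration and data size from it.
buf.seek(0)
with wave.open(buf, "rb") as reader:
    info = {
        "sampling_frequency_hz": reader.getframerate(),
        "n_frames": reader.getnframes(),
        "n_channels": reader.getnchannels(),
        "bits_per_sample": reader.getsampwidth() * 8,
    }

info["duration_s"] = info["n_frames"] / info["sampling_frequency_hz"]
info["data_bytes"] = (info["n_frames"] * info["n_channels"]
                      * info["bits_per_sample"] // 8)
print(info)
```

The duration is the number of frames divided by the sampling frequency, and the data size grows with the number of frames, the number of channels, and the bit depth.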

Finally the main subject

librosa is a Python package for music analysis. Modules for MIR are provided.

What I did while referring to the librosa tutorial

--Visualizing the waveform
  --Note: I tried this with librosa, but in the end I used the Python standard library wave
--Beat tracking
--Audio playback
--Splitting the original audio into percussion / treble / chords
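As a rough sketch of the beat tracking and splitting steps, the code below calls `librosa.beat.beat_track` and `librosa.effects.hpss`. The helper names `analyze` and `click_track` are hypothetical, the synthetic click track is only a stand-in for a real song, and `hpss` separates harmonic and percussive components, which may not be exactly the split I made at the time.

```python
import numpy as np


def click_track(sr=22050, bpm=120, duration_s=8.0, seed=0):
    """Synthetic stand-in for a music file: a short noise burst on every beat."""
    rng = np.random.default_rng(seed)
    y = np.zeros(int(sr * duration_s), dtype=np.float32)
    burst_len = int(0.03 * sr)                       # 30 ms per click
    envelope = np.exp(-np.linspace(0.0, 6.0, burst_len))
    step = int(sr * 60.0 / bpm)                      # samples between beats
    for start in range(0, len(y) - burst_len, step):
        burst = rng.standard_normal(burst_len) * envelope
        y[start:start + burst_len] += burst.astype(np.float32)
    return y


def analyze(y, sr):
    """Estimate tempo and beat times, then split into harmonic/percussive parts."""
    import librosa  # imported here so the helper above works without librosa

    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    y_harmonic, y_percussive = librosa.effects.hpss(y)
    return tempo, beat_times, y_harmonic, y_percussive


if __name__ == "__main__":
    sr = 22050
    y = click_track(sr=sr)
    tempo, beat_times, y_harmonic, y_percussive = analyze(y, sr)
    print("estimated tempo (BPM):", tempo)
    print("first beat times (s):", beat_times[:4])
```

With a real recording, `y, sr = librosa.load("song.wav")` (a hypothetical path) would supply the inputs instead of `click_track`.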

From now on

--Collect training data (music) that is as unbiased as possible
  --Reference URL: https://kodack64.gitbooks.io/toho_mir_ml/content/1-0.html
--Study audio analysis a little more (Fourier transform, windowing, pre-emphasis filter, etc.)
--Plan for the intermediate installment: learn about music features and how to extract them
  --Chord progression, HVL, BPM, MBL, MSL, ASL, MFCC, local features (the so-called chorus or hook), etc.
--Plan for the advanced installment: find the best features for searching for similar songs
  --Also try learning on combinations of features
--Build and evaluate a similar-music search system (the evaluation method also needs some thought)
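For two of the study topics named above, a minimal sketch may help fix the ideas. It assumes the common textbook definitions: a first-order pre-emphasis filter y[n] = x[n] - alpha * x[n-1] and a Hann window. The coefficient 0.97, the function names, and the frame sizes are illustrative assumptions, not settings taken from librosa or SPTK.

```python
import math


def pre_emphasis(samples, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]: boosts high frequencies relative to low ones."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]


def hann_window(n_points):
    """w[n] = 0.5 - 0.5 * cos(2*pi*n / (N - 1)): tapers a frame to zero at both ends."""
    if n_points < 2:
        return [1.0] * n_points
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def windowed_frames(samples, frame_len, hop_len):
    """Cut the signal into overlapping frames and apply the window to each one."""
    window = hann_window(frame_len)
    return [[s * w for s, w in zip(samples[start:start + frame_len], window)]
            for start in range(0, len(samples) - frame_len + 1, hop_len)]


# A constant signal is almost removed by pre-emphasis (it has no high frequencies).
flat = pre_emphasis([1.0, 1.0, 1.0, 1.0])

# Frames like these are what a Fourier transform would be applied to next.
signal = [math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]
frames = windowed_frames(pre_emphasis(signal), frame_len=32, hop_len=16)
print(len(frames), "frames of", len(frames[0]), "samples")
```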

Impressions

--I waded into the world of audio signal processing intending to wield machine learning as a weapon, but I do not have enough background knowledge, so I will study more.
--Personally, I found that my motivation to study rose considerably once the input data for machine learning was audio. That was actually the biggest discovery this time.

Thank you very much. Please look forward to it next time!
