Creating an audio spectrum (a waveform that moves with the sound) in Python

Introduction

At the moment there seems to be no free software for Mac that can produce an audio spectrum (the familiar frequency-domain waveform that wriggles along with the sound). So let's make our own in Python and play with it.

(On Windows, it seems the free software AviUtl can do this.)

Situation and purpose

I have a **wav format** file for which I want to create an audio waveform. (In my case, it is the exported file of a song I entered in GarageBand.) I'd like to convert it to a video, but a video that is just audio playing over a still image is a bit dull.

So the goal this time is to create an **audio spectrum** that moves along with the song, so that the result looks more like a real video.

The result looks something like this ↓ https://www.youtube.com/watch?v=JPE54SlF6H0 ([Pokemon Sword Shield] Battle! Beat [8bit sound source arrangement]) (Animated GIF: AudioVissualizer.gif)

1. About the environment

OS: macOS High Sierra 10.13.6
Language: Python 3.7.4

In addition to the standard library, the following packages need to be installed:

- PyGame (a game engine, but used here simply as a GUI for display)
- PyAudio (used to play the wav file)
- PySoundFile (used to read the wav file data)
- **SciPy** (used for the fast Fourier transform ← is it faster than NumPy?)

Installing them with pip (pip3) should basically be fine. I will write this so that even readers who only know NumPy can follow along (because that describes me).
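
Before running the sample, it may help to confirm that the packages can actually be imported. The following is a minimal sketch, not part of the original program: the helper name find_missing and the list REQUIRED_MODULES are my own, and the list assumes the import names used in the sample code (for example, PySoundFile is imported as soundfile).

```python
import importlib.util

# Import names of the third-party packages used in this article.
# Assumption: these are the names used in the "import" statements,
# which can differ from the names given to pip.
REQUIRED_MODULES = ["pygame", "pyaudio", "soundfile", "scipy", "numpy"]


def find_missing(modules):
    """Return the names in `modules` that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

Anything reported as missing still has to be installed with pip (pip3).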

2. Sample code

This sample code just moves light-blue waves over a pink background. If you put a sound source named sample.wav in the same directory as the program, it should run as is. Both mono and stereo are supported, although the handling is rather crude. A [free sound source like this](https://on-jin.com/sound/listshow.php?pagename=ta&title=%E3%82%B3%E3%83%B3%E3%83%88%E3%81%AE%E3%82%AA%E3%83%8102%EF%BC%88%E3%83%81%E3%83%A3%E3%83%B3%E3%83%81%E3%83%A3%E3%83%B3%EF%BC%89&janl=%E3%81%9D%E3%81%AE%E4%BB%96%E9%9F%B3&bunr=%E3%83%90%E3%83%A9%E3%82%A8%E3%83%86%E3%82%A3&kate=%E3%81%9D%E3%81%AE%E4%BB%96) also works.

After showing the whole program, I will go through it part by part.

SampleAudioVisualizer.py


#!/usr/bin/env python3
import wave
import sys
import pygame
from pygame.locals import *
import scipy.fftpack as spfft
import soundfile as sf
import pyaudio
import numpy as np

# --------------------------------------------------------------------
# Parameters
# --------------------
fn = "sample.wav"
# For calculation
CHUNK = 1024  # Number of frames written to the PyAudio stream at a time (I'm not sure why 1024)
start = 0  # Sampling start position
N = 1024  # Number of FFT samples
SHIFT = 1024  # Number of samples by which the window function is shifted
hammingWindow = np.hamming(N)  # Window function

# --------------------
# For drawing
SCREEN_SIZE = (854, 480)  # Display size
rectangle_list = []

# --------------------
# Pygame screen initial settings
pygame.init()
screen = pygame.display.set_mode(SCREEN_SIZE)
pygame.display.set_caption("Pygame Audio Visualizer")

# --------------------------------------------------------------------
# Play the wav file while calling redraw() (defined below) to redraw the screen
def play_wav_file(filename):
    try:
        wf = wave.open(filename, "r")
    except FileNotFoundError:  # If the file does not exist
        print("[Error 404] No such file or directory: " + filename)
        return 0

    # Open stream
    p = pyaudio.PyAudio()
    stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                    channels=wf.getnchannels(),
                    rate=wf.getframerate(),
                    output=True)

    # Play audio
    data = wf.readframes(CHUNK)
    while data != b'':  # readframes() returns bytes; empty bytes means end of file
        stream.write(data)
        data = wf.readframes(CHUNK)
        redraw()
    stream.close()
    p.terminate()

# --------------------------------------------------------------------
# Repeat "FFT, then draw".
def redraw():
    global start
    global screen
    global rectangle_list

    # --------------------
    # Apply the FFT to the current block of samples and calculate the amplitude spectrum.
    windowedData = hammingWindow * x[start:start + N]  # Data block multiplied by the window function
    X = spfft.fft(windowedData)  # FFT
    amplitudeSpectrum = np.abs(X)  # Amplitude spectrum: sqrt(real^2 + imag^2) for each bin

    # --------------------
    # Drawing in Pygame

    screen.fill((240, 128, 128))  # Fill with your favorite background color
    rectangle_list.clear()  # Reset the list of rectangles
    # Draw the spectrum (adjust the numbers while running the program)
    for i in range(86):
        rectangle_list.append(pygame.draw.line(screen, (102, 205, 170), (1+i * 10, 350 + amplitudeSpectrum[i] * 1),
                                               (1+i * 10, 350 - amplitudeSpectrum[i] * 1), 4))

    pygame.display.update(rectangle_list)  # Update the display

    start += SHIFT  # Shift the range to which the window function is applied
    if start + N > nframes:
        sys.exit()

    for event in pygame.event.get():  # Termination handling
        if event.type == QUIT:
            sys.exit()
        if event.type == KEYDOWN:
            if event.key == K_ESCAPE:
                sys.exit()

# --------------------------------------------------------------------
if __name__ == "__main__":

    # --------------------
    # Get the wav data
    data, fs = sf.read(fn)  # The shape of data is (number of frames x number of channels)
    if data.ndim == 1:
        x = data  # If it is mono, use it as is
    elif data.ndim == 2:
        x = data[:, 0]  # If it is stereo, use only the L channel (for R, change 0 to 1)

    nframes = x.size  # Number of frames (used as the end condition when shifting the window in the FFT)

    # --------------------
    # Start playback and drawing
    play_wav_file(fn)
# --------------------------------------------------------------------

3. Implementation flow

The data part of a wav file is time series data that holds the sound value every **1 / fs** seconds (fs: sampling frequency [Hz]).

(Addition) Let's plot the data of this [free sound source](https://on-jin.com/sound/listshow.php?pagename=ta&title=%E3%82%B3%E3%83%B3%E3%83%88%E3%81%AE%E3%82%AA%E3%83%8102%EF%BC%88%E3%83%81%E3%83%A3%E3%83%B3%E3%83%81%E3%83%A3%E3%83%B3%EF%BC%89&janl=%E3%81%9D%E3%81%AE%E4%BB%96%E9%9F%B3&bunr=%E3%83%90%E3%83%A9%E3%82%A8%E3%83%86%E3%82%A3&kate=%E3%81%9D%E3%81%AE%E4%BB%96). (The sound source is stereo, so I take only the L channel.) (Figure: Figure_1.png) As the plot shows, this data (array) contains a wave that takes values from -1 to +1. The horizontal axis is the array index. Each element represents 1 / fs seconds of information (fs = 44.1 [kHz] in this example), so this is the "waveform seen on the time axis".

Put more simply, multiplying the horizontal axis of this graph by 1/44100 converts it to seconds.
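
As a small illustration of that conversion, here is a sketch that turns array indexes into seconds. The function name index_to_seconds is my own, and fs = 44100 is assumed as in the example above.

```python
import numpy as np

fs = 44100  # sampling frequency [Hz], as in the example above


def index_to_seconds(index, fs):
    """Convert a sample index (or an array of indexes) to time in seconds."""
    return np.asarray(index) / fs


# Time stamps of the first few samples, and of the sample at index 1024
first_times = index_to_seconds(np.arange(5), fs)
t_1024 = index_to_seconds(1024, fs)
print(first_times)
print(t_1024)
```

With this, index 44100 corresponds to exactly one second, and a block of 1024 samples covers roughly 23 milliseconds.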

The audio spectrum, on the other hand, is a constantly changing graph in the frequency domain. **Data in the time domain can be viewed in the frequency domain through the Fourier transform**, so the plan is to make good use of the Fourier transform.

Therefore, the processing should go roughly like this:

  1. Read a very short piece of data ... (indexes 0 to 1023)
  2. While playing that audio with **PyAudio** ...
  3. Apply the fast Fourier transform (**FFT**) and draw the resulting spectrum with **PyGame** ...
  4. Read the next short piece of data ... (indexes 1024 to 2047)

(Repeat this until the data runs out.)

The idea is to alternate audio playback and the Fourier transform rapidly, in real time.
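
The block-by-block reading in these steps can be sketched on its own, without audio or drawing. This is only an illustration: the generator iter_blocks is a name I made up, and it keeps only full blocks, which matches the end condition start + N > nframes used in the sample code.

```python
import numpy as np


def iter_blocks(x, n=1024, shift=1024):
    """Yield (start index, block) pairs of length n, moving by `shift` samples.

    Only complete blocks are produced; a shorter remainder at the end is dropped.
    """
    start = 0
    while start + n <= len(x):
        yield start, x[start:start + n]
        start += shift


# Dummy signal standing in for the wav data (an assumption for this demo)
signal = np.zeros(5000)
for start, block in iter_blocks(signal):
    print(start, start + len(block) - 1)  # first and last index of each block
```

In the real program, each block would be played back and then passed to the FFT and drawing step.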

By the way, I process the wav data in "short-time" blocks of 1024 samples, shifting by 1024 each time, but the size does not have to be 1024. However, if you make it too small, drawing takes longer than playback, and the behavior becomes strange.
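
To get a feel for how much time is available per redraw, the playback time of one block can be computed directly. This is a rough sketch under the assumption that fs = 44100; the function name block_duration_ms is my own.

```python
def block_duration_ms(block_size, fs):
    """Playback time of `block_size` frames at sampling frequency `fs`, in milliseconds."""
    return 1000.0 * block_size / fs


fs = 44100  # assumed sampling frequency [Hz]
for block_size in (256, 1024, 4096):
    budget = block_duration_ms(block_size, fs)
    print(f"{block_size:5d} frames -> {budget:6.2f} ms per redraw")
```

For 1024 frames this comes to about 23 ms, so the FFT and the drawing together have to finish within roughly that time for playback to stay smooth. A smaller block shrinks this budget proportionally.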

3-1. Get the time series data of wav file and its length (number of frames)

This is the following part of the main routine.

Excerpt


import soundfile as sf
fn = "sample.wav"
# (Omitted)
# --------------------------------------------------------------------
if __name__ == "__main__":

    # --------------------
    # Get the wav data
    data, fs = sf.read(fn)  # The shape of data is (number of frames x number of channels)
    if data.ndim == 1:
        x = data  # If it is mono, use it as is
    elif data.ndim == 2:
        x = data[:, 0]  # If it is stereo, use only the L channel (for R, change 0 to 1)

    nframes = x.size  # Number of frames (used as the end condition when shifting the window in the FFT described later)

    # --------------------
    # (Omitted)

PySoundFile makes wav files easy to handle. I got the data with the read() function, and its length from the resulting array. (Reference: Wav file operation in Python)
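
The mono/stereo branching can be tried without a wav file by using small dummy arrays of the same shapes. This is a sketch only: pick_channel is a hypothetical helper, and the arrays below merely imitate what sf.read() returns (1-D for mono, frames x channels for stereo).

```python
import numpy as np


def pick_channel(data, channel=0):
    """Return a 1-D signal: mono data as is, or one channel of multi-channel data."""
    data = np.asarray(data)
    if data.ndim == 1:
        return data  # mono: already 1-D
    if data.ndim == 2:
        return data[:, channel]  # stereo: 0 is L, 1 is R
    raise ValueError("expected a 1-D or 2-D array")


# Dummy data standing in for the output of sf.read() (an assumption for this demo)
mono = np.array([0.0, 0.5, -0.5])
stereo = np.array([[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])

x_mono = pick_channel(mono)
x_left = pick_channel(stereo, 0)
x_right = pick_channel(stereo, 1)
print(x_mono.size, x_left.size, x_right.size)  # number of frames in each case
```

Raising an error for unexpected shapes avoids the situation where x is never assigned.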

3-2. Wav file playback process

Define a function called play_wav_file() that writes to a stream and plays the audio CHUNK frames at a time. It uses the wave and PyAudio modules.

(Reference: [Python] Play wav files with Pyaudio)

It is basically the same as the article I referred to, but I put my own function redraw() into the loop that writes to the stream and reads the next data (so that the audio spectrum is displayed along with playback).
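
One detail of such a read loop is worth checking: in Python 3, readframes() of the wave module returns a bytes object, so the end of the file shows up as empty bytes (b''), not as an empty string. The sketch below demonstrates this with a silent wav built in memory; the helpers make_silent_wav and count_chunks are my own names, and no audio device is needed.

```python
import io
import wave


def make_silent_wav(n_frames, fs=44100):
    """Build a mono 16-bit wav of silence in memory and return it as a file object."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wf.setframerate(fs)
        wf.writeframes(b"\x00\x00" * n_frames)
    buf.seek(0)
    return buf


def count_chunks(wav_file, chunk=1024):
    """Count how many non-empty chunks a readframes() loop yields."""
    count = 0
    with wave.open(wav_file, "rb") as wf:
        data = wf.readframes(chunk)
        while data:  # empty bytes (b'') is falsy, so the loop stops at end of file
            count += 1
            data = wf.readframes(chunk)
    return count


print(count_chunks(make_silent_wav(2500)))
```

A comparison such as data != '' never becomes false for bytes, so a loop written that way only stops if something else ends the program.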

Excerpt


import wave
import pyaudio

# --------------------------------------------------------------------
# Parameters
# --------------------
# For calculation
CHUNK = 1024  # Number of frames written to the PyAudio stream at a time (I'm not sure why 1024)

# ~Omitted~

# --------------------------------------------------------------------
# Play the wav file while calling redraw() (defined later) to redraw the screen

def play_wav_file(filename):
    try:
        wf = wave.open(filename, "r")
    except FileNotFoundError:  # If the file does not exist
        print("[Error 404] No such file or directory: " + filename)
        return 0

    # Open stream
    p = pyaudio.PyAudio()
    stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                    channels=wf.getnchannels(),
                    rate=wf.getframerate(),
                    output=True)

    # Play audio
    data = wf.readframes(CHUNK)
    while data != b'':  # readframes() returns bytes; empty bytes means end of file
        stream.write(data)
        data = wf.readframes(CHUNK)
        redraw()  # Function for redrawing; it is written later
    stream.close()
    p.terminate()

# ~Omitted~

3-3. Apply FFT to the block of target sample points

This article (Short-Time Fourier Transform-A Breakthrough on Artificial Intelligence) was very easy to understand and helpful.

Since CHUNK = 1024 was set for audio playback earlier, the number of samples N to which the fast Fourier transform (FFT below) is applied is also set to 1024.

After extracting 1024 samples from the whole data, do not apply the FFT directly; apply a **window function** first and then perform the FFT. This gets a bit theoretical, but the page "Reason for using window functions-Logical Arts Institute", which was introduced in the article above, explains it in an easy-to-understand way, so please have a look if you are interested.

Here we use the popular **Hamming window** (np.hamming()). Multiplying by it tapers both ends of the block, so the ends join smoothly when the extracted block is treated as one period of a periodic function. (Figure: 1024px-Window_function_(hamming).svg.png)
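
The effect of the window is easy to see numerically. The following sketch applies np.hamming() to a constant block; the helper apply_window and the constant test block are my own choices for illustration.

```python
import numpy as np

N = 1024
window = np.hamming(N)  # 0.54 - 0.46 * cos(2 * pi * n / (N - 1))


def apply_window(block, window):
    """Multiply a block of samples by the window, element by element."""
    return np.asarray(block) * window


# A constant block makes the shape of the window directly visible
flat = np.ones(N)
tapered = apply_window(flat, window)

print(tapered[0], tapered[N // 2], tapered[-1])  # edge, middle, edge
```

The values at both edges are pulled down to about 0.08 while the middle stays close to 1, which is the tapering described above. Note that the Hamming window does not reach exactly zero at the edges.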

Excerpt


import sys
import scipy.fftpack as spfft
import numpy as np

# --------------------------------------------------------------------
# Parameters
# --------------------
# For calculation
CHUNK = 1024  # Number of frames written to the PyAudio stream at a time (I'm not sure why 1024)
start = 0  # Sampling start position
N = 1024  # Number of FFT samples
SHIFT = 1024  # Number of samples by which the window function is shifted
hammingWindow = np.hamming(N)  # Window function

# ~Omitted~

# --------------------------------------------------------------------
# Repeat "FFT, then draw". Here we only look at the part that applies the FFT.
def redraw():
    global start
    # ~Omitted~

    # --------------------
    # Apply the FFT to the current block of samples and calculate the amplitude spectrum.
    windowedData = hammingWindow * x[start:start + N]  # Data block multiplied by the window function
    # (↑ x is the wav data extracted in section 3-1 of this article.)
    X = spfft.fft(windowedData)  # FFT
    amplitudeSpectrum = np.abs(X)  # Amplitude spectrum: sqrt(real^2 + imag^2) for each bin

    # --------------------
    # The drawing process in Pygame goes here (omitted)

    start += SHIFT  # Shift the range to which the window function is applied
    if start + N > nframes:
        sys.exit()  # Exit when the end of the wav file is reached and no full block is left

    # The termination handling for PyGame goes here (omitted)

# --------------------------------------------------------------------
# ~Omitted~

What it does is simple: take N samples, apply the window function, perform the FFT, calculate the amplitude spectrum, and shift the sampling position by SHIFT to prepare for the next call. All that remains is to draw the calculated amplitude spectrum with **PyGame**.
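
To check what the amplitude spectrum means, the same steps can be run on a test tone whose frequency is known. This is a sketch under a few assumptions: fs = 44100, a synthetic sine placed exactly on FFT bin 10, and np.fft.rfft swapped in for scipy.fftpack.fft (for real input it returns only the non-redundant first half of the spectrum). The function names are my own.

```python
import numpy as np

fs = 44100  # assumed sampling frequency [Hz]
N = 1024    # number of FFT samples, as in the article


def amplitude_spectrum(block, window):
    """Window the block, apply the FFT and return the magnitude of each bin."""
    return np.abs(np.fft.rfft(np.asarray(block) * window))


def bin_frequency(k, n, fs):
    """Center frequency in Hz of FFT bin k for an n-point FFT."""
    return k * fs / n


# Test tone: exactly 10 cycles inside the block, so it falls on bin 10
n = np.arange(N)
tone = np.sin(2 * np.pi * 10 * n / N)

spectrum = amplitude_spectrum(tone, np.hamming(N))
peak = int(np.argmax(spectrum))
print(peak, bin_frequency(peak, N, fs))
```

Each bin is fs / N (about 43 Hz) wide, so the first 86 bins, which are the ones drawn by the sample code, cover frequencies from 0 up to about 3.66 kHz.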

3-4. Drawing using PyGame

I adapted the code from this article (Visualizer for beginners in Python).

Excerpt


import pygame
from pygame.locals import *
# --------------------------------------------------------------------
# Parameters
# --------------------
# ~Omitted~
# --------------------
# For drawing
SCREEN_SIZE = (854, 480)  # Display size
rectangle_list = []

# --------------------
# Pygame screen initial settings
pygame.init()
screen = pygame.display.set_mode(SCREEN_SIZE)
pygame.display.set_caption("Pygame Audio Visualizer")
# --------------------------------------------------------------------
# Repeat "FFT, then draw".
def redraw():
    # ~Omitted~
    global screen
    global rectangle_list

    # --------------------
    # Apply the FFT to the current block of samples and calculate the amplitude spectrum (amplitudeSpectrum) (omitted)
    # --------------------
    # Drawing in Pygame

    screen.fill((240, 128, 128))  # Fill with your favorite background color
    rectangle_list.clear()  # Reset the list of rectangles
    # Draw the spectrum (adjust the numbers while running the program)
    for i in range(86):
        rectangle_list.append(pygame.draw.line(screen, (102, 205, 170), (1+i * 10, 350 + amplitudeSpectrum[i] * 1),
                                               (1+i * 10, 350 - amplitudeSpectrum[i] * 1), 4))

    pygame.display.update(rectangle_list)  # Update the display

    # ~Omitted~

    for event in pygame.event.get():  # Termination handling
        if event.type == QUIT:
            sys.exit()
        if event.type == KEYDOWN:
            if event.key == K_ESCAPE:
                sys.exit()
# --------------------------------------------------------------------
# ~Omitted~

The question is how to display the waves. With pygame.draw.line, for example, you can represent them with many straight lines, much like a histogram. There is plenty of room for your own arrangements here. PyGame's methods are summarized here (http://westplain.sakuraweb.com/translate/pygame/Display.cgi). pygame.draw.line is used like this:

pygame.draw.line: Draw a straight line segment.

pygame.draw.line(Surface, color, start_pos, end_pos, width=1): return Rect
Draws a straight line segment on the Surface. The ends of the line have no special decoration; they are squared off to match the thickness of the line.

As an example of the drawing flow, decide the size of the PyGame window in advance, initialize it, and then:

  1. Decide the background color and fill the PyGame screen with it (colors can be looked up on sites such as a WEB color sample list)
  2. Draw a straight line for each value of the calculated amplitude spectrum and keep the returned pygame.Rect in the list
  3. Update the screen (= draw) with pygame.display.update()

Also prepare the termination process for when the PyGame window is closed with the × button or the Esc key is pressed.

(By the way, the display size is 854 * 480 to match the aspect ratio of YouTube. The lines representing the wave are drawn 10 pixels apart, so the for loop range is set to 86, because an 87th line would go off the screen. This part of the code is not very elegant, sorry. If you experiment with different numbers, you should get a feel for the behavior.)
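
The numbers in the drawing loop can be derived instead of tuned by hand. The sketch below reproduces the geometry of the sample code (x = 1 + i * 10, lines centered on y = 350) as plain arithmetic, without PyGame; the function names and the gain parameter are my own additions.

```python
def lines_that_fit(screen_width, spacing=10, x_offset=1):
    """Number of vertical lines whose x position stays inside the screen.

    Valid pixel columns are 0 .. screen_width - 1.
    """
    return (screen_width - 1 - x_offset) // spacing + 1


def line_endpoints(i, amplitude, spacing=10, x_offset=1, baseline=350, gain=1.0):
    """Start and end points of the i-th line, extending equally above and below the baseline."""
    x = x_offset + i * spacing
    half_length = amplitude * gain
    return (x, baseline + half_length), (x, baseline - half_length)


n_lines = lines_that_fit(854)
print(n_lines)
print(line_endpoints(0, 20))
print(line_endpoints(n_lines - 1, 20))
```

For a width of 854 this gives 86 lines, the last one at x = 851. The two points returned for each line are what would be passed to pygame.draw.line as start_pos and end_pos, and gain plays the role of the "* 1" factor in the sample code.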

4. Other

In the sample code only the waves move over the background color, but you can also add images such as characters and logos, as in the GIF at the beginning. (Reference: Introduction to Pygame with Python 3: Chapter 1) I think this is easy to implement by calling Surface.blit() in redraw().

Also, this time I made the video by screen-recording the finished program, but some people apparently write the PyGame screen directly to a video file. (Reference: [PyGame] AVI export & screenshot of screen)

5. Impressions

I had only touched Python in a university class, to the extent of playing with sample code, but it is interesting because there are so many useful libraries. There are probably many places where I did not do things the proper way, but I hope to keep learning little by little.

Thank you for reading to the end!
