Understand the digital representation of sound

This time, we will look at how sound, the vibration of physical objects and a thoroughly analog phenomenon, is represented in the world of digital signals.

What does 16bit / 44.1kHz mean?

The previous article mentioned 16 bit / 44.1 kHz as the digital standard for CDs. What does it actually mean?

In short, the CD format is defined by these two values:

Bit depth (number of quantization bits): 16 bit
Sampling rate: 44.1 kHz

These two values serve directly as indicators of sound quality.

If you're a developer, you are probably familiar with the term bit depth. It is the number of bits used per unit of data; a familiar example is the number of colors each pixel can take in a computer image.

In digital audio, bit depth indicates how finely the amplitude of the waveform can be resolved at a single point on the time axis. The sampling rate indicates how finely the time axis itself is divided into such points.
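To put rough numbers on these two definitions, here is a minimal sketch. The function names `levels` and `sample_interval` are my own, chosen only for illustration.

```python
def levels(bit_depth: int) -> int:
    """Number of distinct amplitude values a sample with this bit depth can take."""
    return 2 ** bit_depth


def sample_interval(sampling_rate_hz: float) -> float:
    """Time in seconds between two consecutive samples."""
    return 1.0 / sampling_rate_hz


# The CD values discussed in this article.
print("levels per sample:", levels(16))
print("seconds between samples:", sample_interval(44_100))
```

A larger bit depth means more steps on the vertical axis, and a higher sampling rate means a shorter gap between points on the time axis.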

That is the working definition, but it probably does not click from the description alone, so I will explain it visually using a sine wave as a simple example.

Sine wave waveform

What is a sine wave?

First, a brief explanation of the sine wave, which serves as the example throughout.

A sine wave, also called a sinusoid, is the simplest waveform that can represent a sound.

The figure above is a typical sine wave. The horizontal axis is time and the vertical axis is amplitude, which is described later. As you know, pitch is expressed by frequency: in this figure, for the same amplitude, the more steeply the graph rises and falls, the higher the frequency, and the more gently it does so, the lower the frequency.

This sine wave is the simplest waveform and can be said to be the purest representation of sound.

A small digression: pure sine wave sounds hardly exist in nature. Take the sound of a piano. As many of you may know, when you play "do" on a piano, the pitch of "do" itself is called the fundamental, but the sound also contains overtones: components whose frequencies are integer multiples of the fundamental frequency, such as the 3rd harmonic (so) and the 5th harmonic (mi). A piano and an organ sound completely different even when both play the same "do", and the difference lies in how these overtones are mixed in. In other words, it is the overtones that determine the character of a sound.
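The idea of a fundamental plus overtones can be sketched in a few lines. This is only an illustration: the function `tone`, the frequency used for "do", and the amplitude mix in `rich` are my own assumptions, not measurements of a real piano or organ.

```python
import math


def tone(t, fundamental_hz, harmonic_amplitudes):
    """Value at time t (seconds) of a tone built from harmonics.

    harmonic_amplitudes[k] is the amplitude of the component whose
    frequency is (k + 1) times the fundamental frequency.
    """
    return sum(
        amp * math.sin(2 * math.pi * (k + 1) * fundamental_hz * t)
        for k, amp in enumerate(harmonic_amplitudes)
    )


FUNDAMENTAL_HZ = 261.63           # roughly middle C, assumed here for "do"
pure = [1.0]                      # fundamental only: a plain sine wave
rich = [1.0, 0.5, 0.3, 0.2, 0.1]  # hypothetical overtone mix

t = 0.001
print("pure:", tone(t, FUNDAMENTAL_HZ, pure))
print("rich:", tone(t, FUNDAMENTAL_HZ, rich))
```

Both tones repeat at the same rate, so they have the same pitch, but the shape of the waveform within each cycle differs, and that difference is what we hear as the character of the sound.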

A sine wave is a strictly pure tone that contains none of these overtones. It sounds much like the "beep, beep" tone you hear on a phone after the other party hangs up. It does not occur in nature and may sound unfamiliar, but its simplicity makes it a good example, so we will use the sine wave here.

Expression for sine wave

As some of you may have learned under the name "sine curve" in mathematics, the sine wave above is given by the following formula.

y(t) = A sin(ωt − φ)

Here t is time, A is the amplitude (the maximum deviation from the center of the wave), ω is the angular frequency, and −φ is the initial phase (the phase at t = 0). The initial phase is not needed for anything that follows, so we fix it at 0, and from here on the sine wave is written as follows.

y = amplitude * sin(angular frequency * time)

The angular frequency is 2π times the frequency in Hz.
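As a minimal sketch, the simplified formula with the initial phase fixed at 0 can be written as a small function. The name `sine_wave` and its default arguments are my own choices; the frequency is given in Hz and multiplied by 2π to obtain the angular frequency.

```python
import math


def sine_wave(t, amplitude=1.0, frequency_hz=1.0):
    """Value of a sine wave at time t (seconds), with initial phase 0."""
    angular_frequency = 2 * math.pi * frequency_hz
    return amplitude * math.sin(angular_frequency * t)


# A 1 Hz wave completes one cycle per second, so it peaks at a quarter second.
print(sine_wave(0.25))
print(sine_wave(0.75, amplitude=2.0))
```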

This formula lets you draw the curve of a sine wave. However, reproducing the waveform of a real sine wave exactly as it vibrates would require an infinite number of points, which digital data cannot hold, so the waveform is represented by an approximation. The approximation gets closer to the true curve as the points that form it become more numerous and more finely resolved. How many points are taken and how finely each one is resolved are the sampling rate and bit depth, the indicators of sound fidelity in the digital world.

Sampling rate and bit depth

The previous section explained that the sampling rate and bit depth indicate how faithfully a waveform curve is represented digitally.

For the sake of clarity, we will continue to use sine waves as an example. Look at the curve below.

The label proudly says "sine wave", but this jagged shape is such a sorry sight that you want to shout "how is this a sine wave?" In fact, it is not even a curve.

Why did it turn out so badly?

In fact, this plot uses a resolution of only 10 points along the horizontal axis, which is the time axis. Assume that the horizontal axis covers 1 second in total. The plot can then be seen as a curve drawn by evaluating the sine formula once every 1/10 second.
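Taking a sample every 1/10 second can be sketched as follows. The function `sample_sine` is a hypothetical helper, and I am assuming a 1 Hz sine wave with amplitude 1 and samples starting at t = 0; the figure may have been drawn with different settings.

```python
import math


def sample_sine(sampling_rate_hz, duration_s=1.0, frequency_hz=1.0, amplitude=1.0):
    """Return (time, value) pairs taken every 1/sampling_rate_hz seconds."""
    count = int(round(sampling_rate_hz * duration_s))
    return [
        (
            i / sampling_rate_hz,
            amplitude * math.sin(2 * math.pi * frequency_hz * i / sampling_rate_hz),
        )
        for i in range(count)
    ]


points = sample_sine(10)
for t, y in points:
    print(f"t={t:.1f}s  y={y:+.3f}")
```

With only 10 samples, none of them falls on the true peak at t = 0.25 s, which is one reason the plotted shape looks so angular.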

So how do we make this a more decent curve? Simply raise the sampling resolution.

1/20 second

The sample points are marked for clarity. You can see that this is much closer to a curve than the 1/10-second version.

1/100 second

Taking one sample every 1/100 second draws something quite close to the intended curve. Sampling 100 times per second in this way is expressed as a **sampling rate of 100 Hz**.

**In other words, the CD sampling rate of 44.1 kHz means that 44,100 samples are taken per second.**
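The improvement from 1/10 to 1/20 to 1/100 second can also be checked numerically. The sketch below measures the largest gap between the true curve and the straight lines joining consecutive samples. It assumes a 1 Hz sine wave with amplitude 1 over one second; `max_interpolation_error` is a name I made up for this illustration.

```python
import math


def true_sine(t):
    """Reference curve: 1 Hz sine wave with amplitude 1 (assumed)."""
    return math.sin(2 * math.pi * t)


def max_interpolation_error(sampling_rate_hz, checks_per_segment=50):
    """Largest gap between the true curve and the line segments joining
    consecutive samples, checked over one second.

    sampling_rate_hz must be an integer number of samples per second.
    """
    step = 1.0 / sampling_rate_hz
    worst = 0.0
    for i in range(sampling_rate_hz):
        t0 = i * step
        y0, y1 = true_sine(t0), true_sine(t0 + step)
        for k in range(checks_per_segment + 1):
            frac = k / checks_per_segment
            on_line = y0 + frac * (y1 - y0)  # point on the straight segment
            worst = max(worst, abs(true_sine(t0 + frac * step) - on_line))
    return worst


errors = {rate: max_interpolation_error(rate) for rate in (10, 20, 100)}
for rate, err in errors.items():
    print(f"{rate:>3} Hz: max error {err:.5f}")
```

The error shrinks as the sampling rate rises, which matches what the plots show by eye.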

So far we have looked at the resolution of the horizontal axis (time), but the same applies to the vertical axis (amplitude). How well the curve is reproduced also depends on how finely the amplitude can be expressed at each sampling instant. Developers should find this part easy to grasp.

**The CD bit depth of 16 bits means that at each sampling instant the amplitude is expressed with a resolution of 2 to the 16th power, that is, 65,536 levels.**
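Here is a rough sketch of what resolution on the vertical axis means in practice. I am assuming amplitudes normalized to the range -1.0 to 1.0 and a signed integer encoding; `quantize` is a hypothetical helper, and real converters may scale and round differently.

```python
def quantize(value, bit_depth):
    """Map an amplitude in [-1.0, 1.0] to a signed integer code of the given bit depth."""
    max_level = 2 ** (bit_depth - 1) - 1   # 32767 for 16 bits
    min_level = -(2 ** (bit_depth - 1))    # -32768 for 16 bits
    # Scaling by max_level keeps the mapping symmetric around zero,
    # so the lowest code (min_level) is only reached by clipping.
    level = round(value * max_level)
    return max(min_level, min(max_level, level))


for bits in (3, 8, 16):
    code = quantize(0.3, bits)
    restored = code / (2 ** (bits - 1) - 1)  # amplitude the code stands for
    print(f"{bits:>2} bit: code {code}, restored amplitude {restored:.6f}")
```

The more bits there are, the closer the restored amplitude gets to the original 0.3.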

Toward next time

So far, we have seen how to represent waveforms in the world of digital signals.

To express a sine wave at CD quality, we found that the curve must be plotted at 44,100 points per second, each with 16-bit precision. Programmers have probably already had a flash of insight.

**That means one second of CD-quality sound can be represented by an array of 44,100 16-bit values!** (On an actual CD these values are signed 16-bit integers rather than floats, with one such array per stereo channel.)
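As a small preview, such an array can be sketched with nothing but the standard library. I am assuming signed 16-bit integers, which is how CD audio stores its samples, and a single channel; the 440 Hz tone is a hypothetical choice made only for illustration.

```python
import math
from array import array

SAMPLING_RATE_HZ = 44_100  # CD sampling rate
MAX_AMPLITUDE = 32_767     # largest value of a signed 16-bit integer
TONE_HZ = 440.0            # hypothetical test tone

# "h" is the type code for a signed short, which is 16 bits on common platforms.
samples = array(
    "h",
    (
        round(MAX_AMPLITUDE * math.sin(2 * math.pi * TONE_HZ * i / SAMPLING_RATE_HZ))
        for i in range(SAMPLING_RATE_HZ)
    ),
)

print(len(samples), "samples,", len(samples) * samples.itemsize, "bytes")
```

One second of a single channel holds 44,100 elements of 2 bytes each.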

Now we're getting somewhere! Next time, I will actually generate a sine wave in a program.

Bonus

All the graphs in this article were created with the following Python libraries. Easy and convenient.

numpy (numerical calculation library)
matplotlib (graph drawing library)

[PYTHON] Making sound by programming Part 2