[PYTHON] Fourier transform of raw data

I tried to analyze the discrete Fourier transform with raw data

I tried to find the features required for machine learning by Fourier transforming from time series data. There were some articles that automatically created and analyzed the periodic function, but I took time to figure out how to deal with the actual data, so I created it as a memorandum. I also summarized it, including understanding the concepts and terms. It may be ambiguous because it is a rough understanding. Please point out any mistakes.

Discrete Fourier Transform (DFT)

To briefly summarize the Fourier transform, __ A conversion of the periodic signal of time series data for the time domain into a spectrum in the frequency domain __ become. Also, by applying a window function (generally a humming window), it is forcibly regarded as periodic.

Fast Fourier Transform (FFT)

The Fast Fourier Transform refers to a discrete Fourier transform that can be calculated at high speed when the number of data is a power of 2. I don't understand the details of the formulas and algorithms, but if you have a huge amount of data, you can expect an improvement in calculation speed by padding to 0 and aligning the number of data to the power of 2.

Implementation

First, the raw data is Fourier transformed as it is. Here, Numpy fft is used. (Numpy version is '1.18.1') https://numpy.org/doc/stable/reference/generated/numpy.fft.fft.html FFT is a fast Fourier transform, but when you pass a value to numpy, it is Fourier transformed even if it is not a power of 2. In other words, it is considered that FFT is executed when it is a power of 2, and DFT is executed when it is not a power of 2. This data is 30fps 1 second data That is, the sample data 30 is Fourier transformed. This time, one column and 30 rows of a certain csv file is treated as data.

Data retrieval


#Import of required libraries
import numpy as np
#Data loading
data = np.loadtxt('____.csv', delimiter=',', dtype=float)
features = data[0:30,0:1]
features[0:30,0:1]

The 30 data you want to analyze have been imported into features. By the way, the data in features is the following array data.

features


array([[0.32640783],[0.32772677],[0.32963271],[0.32872528],[0.33125733],[0.3250282 ],[0.33900562],[0.33105499],[0.33294834],[0.34554142],[0.33217829],[0.33006385],[0.33765947],[0.33173415],[0.33826796][0.34325231],[0.35284764],[0.34785128],[0.349527  ],[0.34782048],[0.35720176],[0.36520328],[0.37276383],[0.37766436],[0.37410199],[0.37990772],[0.38644416],[0.38045958],[0.37864619],[0.39122537]])

I proceeded with the implementation while referring to the related articles. The implementation code and result are as follows.

main


#Sampling frequency
rate = 30 # [1/s]

#Sample time[s]
t = np.arange(0, 1, 1/rate)

#signal
signal = np.ravel(features[0:30,0:1])
plt.plot(t, signal)
plt.xlim(0, 1)
plt.xlabel('time [s]', fontsize=20)
plt.title('Time series of signal', fontsize=20)
plt.show()
#print(np.ravel(signal))
# power spectrum
p = np.abs(np.fft.rfft(signal))

hammingWindow = np.hamming(30)    #Humming window
plt.plot(t, hammingWindow*signal)
plt.show()
p2 = np.abs(np.fft.rfft(hammingWindow*signal))

#Given the sampling rate (the reciprocal of), returns the frequency of each component
f = np.fft.rfftfreq(signal.size, d=1./rate)
print(p)
print(np.fft.rfft(hammingWindow*signal),p2)

plt.xlabel("Frequency [Hz]", fontsize=20)
plt.ylabel("Amplitude spectrum", fontsize=20)
plt.yscale("log")
plt.plot(f, p)
plt.show()
plt.yscale("log")
plt.plot(f, p2)
plt.show()
#Fast Fourier transform
F = np.fft.fft(hammingWindow*signal)
#Calculate amplitude spectrum
Amp = np.abs(F)
freq = np.linspace(0, 1.0/0.03, 30) #Frequency axis
print(np.fft.rfft(signal))
plt.plot(freq, Amp)
plt.show()

result

First, the data (features) in the time domain were graphed.

image.png

If you look at it, you can see that there is no periodicity ... Since the purpose is to implement it, we will proceed as it is. In addition, a shape adapted to a humming window that gives periodicity is like this.

image.png

It looks like a cycle. (Because the edges are aligned)

The result of Fourier transforming each of them into an amplitude spectrum is as follows.

image.png

[10.49214916 0.35316679 0.16715576 0.10588365 0.04224876 0.08174254 0.0362962 0.03914191 0.04407553 0.07351728 0.02862219 0.05424663 0.06239112 0.03642959 0.04382539 0.01436891]

image.png

[5.44620123 2.3808305 0.0238307 0.02004583 0.03723908 0.02777476 0.00710696 0.01448964 0.00951674 0.02489049 0.01539931 0.01742193 0.01562835 0.01323487 0.01355436 0.01185119]

The left axis is a logarithmic graph. A similar shaped amplitude spectrum was obtained.

As described above, the Fourier transform was performed to obtain the amplitude spectrum. The power spectrum is the square of the amplitude spectrum, so if you want the power spectrum, you can square the amplitude spectrum.

Summary

Below, the terms of the Fourier transform and numpy were complicated when determining the amplitude spectrum, so I will summarize them.

np.fft.fft: Returns the Fourier transform result of the real part and the imaginary part np.fft.rfft: Returns only the real part (1/2 array of np.fft.fft)

Therefore, abs (np.fft.fft) and abs (np.fft.rfft) have the same size but different return lengths. np.fft.rfft is half of np.fft.fft. It is not necessary because it is a complex conjugate.

Fourier transform and python

F (w) = np.fft.fft (w): Fourier spectrum (Fourier transform)

|F(w)| = np.abs(F(w)): Amplitude spectrum(F(w)Absolute value of)

|F(w)|^2: Power spectrum[F(w)*(F(w)Conjugate complex number of)], Amplitude spectrum squared, power(power)To have the dimension of

Fourier analysis of Excel

In fact, Fourier analysis is installed in Excel's data analysis tool. When I verified whether it was the same as numpy, the same real part and imaginary part as np.fft.fft were returned. If you have trouble programming, that may be better. However, the values are a mixture of the real part and the imaginary part, and a new calculation is required to obtain the amplitude spectrum. Also, note that in Excel, the number of data cannot be executed unless it is a power of 2.

Inverse Fourier transform

It can be considered that the low frequency component forms a large outline in the graph shape, and the high frequency component is noisy. It was seen during a literature search. Therefore, if only the spectrum of the low frequency component 0Hz to 2Hz as a result of Fourier transform is inverse Fourier transformed, I thought that I could draw a graph close to the original data, so I implemented it. The first three data of what was rffted earlier are left as they are, and the others are set to 0.

irfft


plt.plot(t, np.fft.irfft([1.04921492e+01+0.j,8.98589353e-02+0.34154378j,-6.04938542e-02+0.15582535j,0,0,0,0,0,0,0,0,0,0,0,0,0]))
plt.show()

result image.png

It has a similar shape when compared with the graph of the original data. Therefore, it can be said that it was possible to reproduce (feature extraction was possible with frequency components).

References

https://ja.wikipedia.org/wiki/離散フーリエ変換 https://numpy.org/doc/stable/reference/generated/numpy.fft.fft.html https://kyotogeopython.zawawahoge.com/html/基礎編/Numpyの基礎(7)便利なライブラリ群.html https://aidiary.hatenablog.com/entry/20110716/1310824587 https://www.geidai.ac.jp/~marui/r_program/spectrum.html

Recommended Posts

Fourier transform of raw data
Properties of the discrete Fourier transform
Numerical summary of data
Qiskit: Quantum Fourier Transform
Preprocessing of prefecture data
Selection of measurement data
Tuning experiment of Tensorflow data
Visualization of data by prefecture
Average estimation of capped data
About data management of anvil-app-server
Probability prediction of imbalanced data
Let's compare the Fourier transform of the synthesized sound source and the composition of the Fourier transform.
Signal processing in Python (1): Fourier transform
Memory-saving matrix conversion of log data
Differentiation of time series data (discrete)
10 selections of data extraction by pandas.DataFrame.query
Animation of geographic data by geopandas
Recommendation of data analysis using MessagePack
Time series analysis 3 Preprocessing of time series data
Data handling 2 Analysis of various data formats
Regular export of Google Analytics raw data to BigQuery using cloud functions