[PYTHON] CNN Acceleration Series ~ FCNN: Introduction to the Fourier Convolutional Neural Network ~

Overview

With the advent of convolutional neural networks (CNNs), the field of image recognition has made great strides. Starting with AlexNet, various networks such as VGGNet and GoogLeNet have been proposed and have proven effective. For all its success, the CNN has one drawback: it is slow, because the convolution operation is heavy. It is so heavy that the cost of the fully connected layers is negligible by comparison, so if the convolution can be accelerated, training time can be reduced significantly. This is where the relationship **convolution in the spatial domain = element-wise product in the frequency domain** attracts attention. In this article, I explain the basic theory of convolution in the frequency domain. I could not find any articles on this in Japanese, so I am doing my best to work it out from the paper; please point out any mistakes or misunderstandings.

The paper covered in this article is:

Table of contents

- [What is convolution?](#what-is-convolution)
- [Linear convolution](#linear-convolution)
- [Circular convolution](#circular-convolution)
- [How to equalize linear and circular convolutions](#how-to-equalize-linear-and-circular-convolutions)
- [Convolution on CNN](#convolution-on-cnn)
- [To the frequency domain](#to-the-frequency-domain)
  - [2.2 Algorithm](#22-algorithm)
  - [2.3 Implementation and spatial complexity](#23-implementation-and-spatial-complexity)
  - [3 Experiment](#3-experiment)
  - [4 About the future](#4-about-the-future)
- [Do you want to implement it?](#do-you-want-to-implement-it)
- Appendix
  - [Strict convolution and CNN](#strict-convolution-and-cnn)
  - [Cooley-Tukey FFT algorithm](#cooley-tukey-fft-algorithm)

What is convolution?

To correctly understand convolution in the frequency domain, let us first revisit convolution in the spatial domain. There are two kinds of convolution, called **linear convolution** and **circular convolution**. I will explain these two first.

Let me first introduce the variables used throughout:

- $I$: input size
- $F$: filter size
- $O$: output size

Linear convolution

Linear convolution is the process shown in the figure below. linear_convolution.gif The filter starts where its right edge overlaps the left edge of the input and ends once it has passed all the way through. I will not write out the formula, since it is not needed here; just keep the picture in mind. Note that linear convolution **does not treat the input as a periodic function**.

The following relation holds between the sizes.

O = I + F - 1

In the GIF, $I = 9$ and $F = 3$, so the output size is $O = 9 + 3 - 1 = 11$.

Circular convolution

Circular convolution is the process shown in the figure below. circle_convolution.gif The filter starts where its left edge overlaps the left edge of the input and ends when its right edge overlaps the right edge of the input. This is exactly where it differs from linear convolution, and in this case **the input must be a periodic function**. You can picture it as the linear-convolution GIF with the input repeated periodically beyond both ends, so the parts sticking out are no longer needed (strictly speaking this is not quite accurate, but we do not need full rigor here).

Again, the following relation holds between the sizes.

O = I - F + 1

In the GIF, $I = 9$ and $F = 3$, so the output size is $O = 9 - 3 + 1 = 7$.

How to equalize linear and circular convolutions

For a linear convolution and a circular convolution to coincide, both ends of the input must be zero-padded so that $O = I + F - 1$. zero_pad_circle_convolution.gif You can see that the output then equals that of the linear convolution.
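
Since this padding trick is what the rest of the article leans on, here is a minimal 1-D sketch of it in NumPy (the variable names and the DFT-based helper are my own illustration, not from the paper):

```python
import numpy as np

I, F = 9, 3
x = np.arange(1.0, I + 1.0)        # input of size I = 9
w = np.array([1.0, 0.0, -1.0])     # filter of size F = 3

# Linear convolution: output size O = I + F - 1 = 11.
linear = np.convolve(x, w)         # 'full' mode by default

# Circular convolution of length n treats the input as periodic:
# it is the inverse DFT of the element-wise product of the DFTs.
def circular_conv(x, w, n):
    return np.real(np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(w, n)))

# With n = I the result wraps around and differs from the linear one;
# zero-padding both signals to n = I + F - 1 makes them equal.
print(np.allclose(circular_conv(x, w, I + F - 1), linear))  # True
```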

Convolution on CNN

Now, which of the two is the convolution operation in a CNN? Let us think about it from the input's point of view.

The data fed into a CNN is usually an image, most likely a photograph. We can therefore expect the image to have no periodicity, unless the task is recognizing repeating patterns. Since the input is aperiodic, **a CNN ought to perform linear convolution**.

But it is too early to conclude that. Please look at the figure below. filter_image.gif It shows how a CNN convolves. As you can see, the filter never slides past the edges of the input image, so **what this figure shows is a circular convolution**. That would mean the convolution operation **cannot extract the information correctly**. However, if we assume that **copies of the input image are tiled periodically in the vertical and horizontal directions (overlapping by $F - 1$)**, this convolution can be considered correct.

You may want to ask, "so which one is it, after all?", but there is one more thing to consider: **padding**, an essential technique for CNNs. pading_image.png The figure does not show quite enough of it, but **a circular convolution can be made equal to a linear convolution if you pad by the required amount**. So if the input image is padded correctly, a linear convolution is performed and **the information can be extracted correctly from the convolution operation**. In an article I wrote before, I described the role of padding as "maintaining the size" and "picking up all the information at the edges"; this is exactly that. (At the time of writing I had not thought about it in these terms at all, lol)

Now, padding makes the discussion even more complicated, and the stride peculiar to CNN convolution complicates it further... In a CNN, padding is usually applied **to prevent (or reduce) the shrinkage of the output image caused by convolution**. The padding width is therefore chosen based on the stride, not at all with the goal of turning the circular convolution into a linear one, so for that purpose it may be too large or too small. Excess padding is not a problem, but if it is insufficient, the circular $\rightarrow$ linear conversion is not carried out correctly.

In the end, the conclusion is that **"convolution in a CNN lies somewhere between linear and circular convolution"**. After dragging things out this far, an ambiguous conclusion may be disappointing, but that is all that can be said. And when it comes to a theoretically rigorous discussion, the premise itself turns out to be different.


So much for the premises. From here we finally move on to the theory in the frequency domain. If you are wondering what "mathematically rigorous" means here, see the [Appendix](#strict-convolution-and-cnn).

To the frequency domain

By the way, the story so far is a summary of what I researched here and there because I felt it was necessary while reading the paper. It is still rough, but I think it is enough to keep in mind. Now let us get into the paper itself. Section 2.1 on backpropagation only compares the spatial-domain CNN with the frequency-domain CNN, so I skip it and read from 2.2. Note that what follows is not a literal translation of the paper but a paraphrase with my own additions. The figures in this section are quoted from the paper.

2.2 Algorithm

It is well known that **a convolution in the spatial domain is equivalent to an element-wise product in the Fourier (frequency) domain**. Using the Fourier transform $\mathcal{F}$ and its inverse $\mathcal{F}^{-1}$, the spatial-domain convolution

f * g = \mathcal{F}^{-1}\big(\mathcal{F}_f(k) \odot \mathcal{F}_g(k)\big)

can therefore be written as above. Here the symbol $*$ denotes convolution in the spatial domain and $\odot$ denotes the element-wise product. Note that, by the nature of the element-wise product, **$f$ and $g$ must have the same size** for the above equation to be computable. In general, however, the input (size $I_h \times I_w$) and the filter (size $F_h \times F_w$) have completely different sizes, so an implementation must pad (or possibly trim) them to a common size. Fig1.png For simplicity, let $I_h = I_w = I$ and $F_h = F_w = F$, and write the output size $O_h \times O_w$ as $O_h = O_w = O$. Also let $C$ be the number of input channels, $M$ the number of filters, and $B$ the mini-batch size. The number of operations for a spatial-domain convolution with no padding and stride $1$ is

O^2 F^2 \quad (\because O = I - F + 1)

It will be. If the padding width is $ 2p $ and the stride is $ s $

O^2 F^2 \quad \left(\because O = \left\lceil \cfrac{I - F + 2p}{s} \right\rceil + 1 \right)

is not it.

On the other hand, in the Fourier domain, computing the Fourier transform with the **Fast Fourier Transform (FFT)** costs $2C_{order} O^2 \log{O}$, and the element-wise product, **being a complex operation**, costs $4O^2$ (one complex multiplication amounts to four real multiplications). $C_{order}$ is a constant. The total, $2C_{order} O^2 \log{O} + 4O^2$, is therefore the cost of the convolution in the Fourier domain. The paper appears to take $O = I$.
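
As a numerical sanity check of this equivalence (my own sketch, assuming SciPy; both signals are zero-padded to a common size as described above):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))   # input,  I = 16
w = rng.standard_normal((7, 7))     # filter, F = 7

# Zero-pad both to I + F - 1 so that the circular (element-wise
# product) convolution reproduces the linear one, then compare
# against a direct spatial-domain convolution.
n = x.shape[0] + w.shape[0] - 1
fourier = np.real(np.fft.ifft2(np.fft.fft2(x, (n, n)) * np.fft.fft2(w, (n, n))))
direct = signal.convolve2d(x, w, mode='full')
print(np.allclose(fourier, direct))  # True
```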

Each input channel and each filter is independent, so each of them needs to be Fourier-transformed only once. The efficiency of an individual convolution may drop, but the FFTs can be reused across all input-filter pairs, which more than outweighs that overhead.

Counting the computational cost of forward and backward propagation together makes the advantage of Fourier-domain convolution even more apparent.

| Operation | Spatial domain | Fourier domain |
|---|---|---|
| $\displaystyle\sum_{C}{x_C * F_{CM}}$ | $BCMO^2F^2$ | $2C_{order}I^2(BM + BC + CM)\log{I} + 4BCMI^2$ |
| $\cfrac{\partial L}{\partial y_M} * w^{\top}_{CM}$ | $BCMI^2F^2$ | $2C_{order}O^2(BM + BC + CM)\log{O} + 4BCMO^2$ |
| $\cfrac{\partial L}{\partial y_M} * x_C$ | $BCMO^2F^2$ | $2C_{order}I^2(BM + BC + CM)\log{I} + 4BCMI^2$ |

Note that in this table $O$ means the properly computed output size, $O = I - F + 1$ or $O = \left\lceil \cfrac{I - F + 2p}{s} \right\rceil + 1$, in both the spatial and the Fourier domain. In particular, **this differs from the earlier discussion for the Fourier domain**; presumably it was arranged this way to make the cost comparison fair.

As the table shows, **the spatial-domain cost is a product of five factors**, while **the Fourier-domain cost is a product of at most four factors**. Roughly speaking, the order is 5th (more precisely 7th) in the spatial domain and 4th (more precisely 5th) in the Fourier domain. Fig2.png Let us actually compute how much difference this makes for forward propagation. Take

B = 128 \quad C = 96 \\
M = 256 \quad F = 7

Assuming $I = 16, 64$, we get $O = I - F + 1 = 10, 58$, so for the spatial domain

BCMO^2F^2 = 128 \times 96 \times 256 \times 10^2 \times 7^2 = 15,414,067,200 \simeq 15.4 \text{ billion} \\
BCMO^2F^2 = 128 \times 96 \times 256 \times 58^2 \times 7^2 = 518,529,220,608 \simeq 518.5 \text{ billion}

and for the Fourier domain

2C_{order}I^2(BM + BC + CM)\log{I} + 4BCMI^2 = 2 \times 16 \times 16^2 \times (128 \times 256 + 128 \times 96 + 96 \times 256) \log{16} + 4 \times 128 \times 96 \times 256 \times 16^2 = 1,581,554,875.654148 + 3,221,225,472 = 4,802,780,347.654148 \simeq 4.8 \text{ billion} \\
2C_{order}I^2(BM + BC + CM)\log{I} + 4BCMI^2 = 2 \times 16 \times 64^2 \times (128 \times 256 + 128 \times 96 + 96 \times 256) \log{64} + 4 \times 128 \times 96 \times 256 \times 64^2 = 37,957,317,015.69955 + 51,539,607,552 = 89,496,924,567.69955 \simeq 89.5 \text{ billion}

As these numbers show, the gap widens as the input size grows.
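
The counts above can be reproduced with a short script (my own check; it assumes the natural logarithm and $C_{order} = 16$, which is what the figures in the text imply):

```python
import numpy as np

B, C, M, F = 128, 96, 256, 7
C_order = 16  # constant implied by the numbers above

for I in (16, 64):
    O = I - F + 1
    spatial = B * C * M * O**2 * F**2
    fourier = (2 * C_order * I**2 * (B*M + B*C + C*M) * np.log(I)
               + 4 * B * C * M * I**2)
    print(f"I={I}: spatial {spatial:,} vs Fourier {fourier:,.0f}")
# I=16: spatial 15,414,067,200 vs Fourier 4,802,780,348
# I=64: spatial 518,529,220,608 vs Fourier 89,496,924,568
```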

2.3 Implementation and spatial complexity

The algorithm itself is just a convolution carried out in the Fourier domain, which is very simple, but two major problems arise in implementation.

  1. The FFT implementations available on current GPUs (mainly CUDA for NVIDIA GPUs) are optimized for a small number of FFTs over large inputs, so they are not optimal for this algorithm.
  2. A large amount of memory (space complexity) is required to hold the Fourier-transformed filters.

The following optimizations and countermeasures have been proposed for each problem.

  1. Implement the [Cooley-Tukey FFT algorithm](#cooley-tukey-fft-algorithm)
     - It can be parallelized along the mini-batch and channel directions
     - A 2D FFT can additionally be parallelized along the row and column directions
     - For these reasons, this algorithm can Fourier-transform a large number of small inputs efficiently and quickly
  2. Save memory by reusing (overwriting) the buffers used for each convolution. Also, exploiting the fact that the Fourier transform of a real-valued input is conjugate-symmetric (Hermitian), retain only the lower (or upper) triangular part; a small demonstration follows this list.
     - Because memory is reused, a re-FFT is required when computing backpropagation
     - Holding a triangular part reduces the memory to be retained from $X^2$ to $\frac{X(X+1)}{2}$
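
Regarding point 2, the symmetry being exploited is the conjugate (Hermitian) symmetry of the DFT of a real-valued signal. A small NumPy demonstration of the resulting storage saving (my illustration, not the paper's code):

```python
import numpy as np

w = np.random.randn(16, 16)        # a real-valued (padded) filter

full = np.fft.fft2(w)              # 16 x 16 complex coefficients
half = np.fft.rfft2(w)             # 16 x 9: the redundant half is dropped

# For real input, F[-k, -l] = conj(F[k, l]), so the full spectrum
# is recoverable from roughly half of the coefficients.
restored = np.fft.irfft2(half, s=w.shape)
print(full.shape, half.shape, np.allclose(restored, w))  # (16, 16) (16, 9) True
```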

That is all for the theory; the rest is experimental results. I wish the paper had shown more concretely how the computation proceeds... several questions came up when I tried to implement it... Presumably those details are meant to behave the same way as in an ordinary CNN.

3 Experiment

The experiment was conducted on an NVIDIA GeForce GTX Titan GPU, comparing against the CudaConv GPU implementation within the Torch7 machine learning environment. The accuracies obtained are not listed, but they were reportedly within the margin of error.

Focusing on the second layer of a CNN with $C = 96$ input channels and $M = 256$ output channels (that is, filters), the filter size, input size, and mini-batch size of that layer are varied. The results are shown in the figure below. ![Fig3.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/640911/aa279394-e6f4-3d37-e73f-8cb828e813f5.png) In most cases the Fourier-domain convolution is shown to be faster. The advantage is especially pronounced for $\cfrac{\partial L}{\partial y_M} * x_C$, which has the largest time complexity; this is most likely because it effectively uses a large filter. Indeed, before applying the FFT, **the filter is padded to the same size as the input**, which means larger filters come essentially for free.

Next, instead of only the second layer, layers one through five are considered, and the mini-batch size $B$ is varied; the results are shown below. Table1.png No data is given for the input gradient at the first layer, since it does not need to be computed. You can read off that in most cases the FFT approach is the fastest or second fastest, and since it is the fastest in total, FCNN can be said to be effective in almost all cases. Forward propagation in particular is considerably faster, which suggests that **the greatest benefit is obtained when running inference with a trained network**.

Finally, here are the results when pooling layers and fully connected layers are added to the CNN above. This experiment checks whether implementation details such as padding and memory access change the performance. Table2.png FCNN demonstrated its superiority in this experiment as well.

4 About the future

In bullet form:

- Explore ways to learn the filters directly in the Fourier domain
- Explore ways to apply nonlinear functions within the Fourier domain
  - This would remove the need for the inverse transform and give a further speed-up
- The current implementation requires the input image to be padded to a power-of-two size; improve this
- Investigate the effect of increasing the filter size


That is the content of the paper: a simple but powerful algorithm. Various follow-up studies have been conducted since.

Do you want to implement it?

I would like to implement it... but I have some doubts:

- Is it necessary to perform the FFT/IFFT before and after every convolution operation?
- In the paper's schematic, the number of input channels equals the number of output channels, but the number of output channels of a spatial-domain convolution layer should depend on the number of filters.
  - And whichever channel count you match, the question remains how to align everything to that dimension.

None of these is spelled out in the paper... or rather, it should be there, but it can only be read between the lines. For example, [4 About the future](#4-about-the-future) says things like

- Explore ways to learn the filters directly in the Fourier domain
- Explore ways to apply nonlinear functions within the Fourier domain
  - This would remove the need for the inverse transform and give a further speed-up

which suggests that the FFT/IFFT is performed only immediately before and after the product operation, and the last experiment's table implies the number of channels changes just as in a normal CNN; it is not stated directly, but the descriptions point that way.

I have also searched out and read some later papers, and some of them describe performing the FFT only once and executing everything in the Fourier domain. I have not studied that area (or the Fourier transform itself) deeply enough yet, so I will not implement it this time.

Appendix

As a bonus, here are some of the mathematical details.

Strict convolution and CNN

Consider the following convolution of an input matrix $\boldsymbol{A}$ with a filter matrix $\boldsymbol{W}$, with no padding and stride $1$.

\boldsymbol{A} = \left( \begin{array}{cccc}
  a_{1, 1} & a_{1, 2} & a_{1, 3} & a_{1, 4} \\
  a_{2, 1} & a_{2, 2} & a_{2, 3} & a_{2, 4} \\
  a_{3, 1} & a_{3, 2} & a_{3, 3} & a_{3, 4} \\
  a_{4, 1} & a_{4, 2} & a_{4, 3} & a_{4, 4}
\end{array} \right) \qquad
\boldsymbol{W} = \left( \begin{array}{cc}
  w_{1, 1} & w_{1, 2} \\
  w_{2, 1} & w_{2, 2}
\end{array} \right)

The result of this convolution is as follows.

\begin{align}
  \boldsymbol{AW} &= \left( \begin{array}{ccc}
    a_{1, 1}w_{1, 1} + a_{1, 2}w_{1, 2} + a_{2, 1}w_{2, 1} + a_{2, 2}w_{2, 2}
  & a_{1, 2}w_{1, 1} + a_{1, 3}w_{1, 2} + a_{2, 2}w_{2, 1} + a_{2, 3}w_{2, 2}
  & a_{1, 3}w_{1, 1} + a_{1, 4}w_{1, 2} + a_{2, 3}w_{2, 1} + a_{2, 4}w_{2, 2} \\
    a_{2, 1}w_{1, 1} + a_{2, 2}w_{1, 2} + a_{3, 1}w_{2, 1} + a_{3, 2}w_{2, 2}
  & a_{2, 2}w_{1, 1} + a_{2, 3}w_{1, 2} + a_{3, 2}w_{2, 1} + a_{3, 3}w_{2, 2}
  & a_{2, 3}w_{1, 1} + a_{2, 4}w_{1, 2} + a_{3, 3}w_{2, 1} + a_{3, 4}w_{2, 2} \\
    a_{3, 1}w_{1, 1} + a_{3, 2}w_{1, 2} + a_{4, 1}w_{2, 1} + a_{4, 2}w_{2, 2}
  & a_{3, 2}w_{1, 1} + a_{3, 3}w_{1, 2} + a_{4, 2}w_{2, 1} + a_{4, 3}w_{2, 2}
  & a_{3, 3}w_{1, 1} + a_{3, 4}w_{1, 2} + a_{4, 3}w_{2, 1} + a_{4, 4}w_{2, 2}
  \end{array} \right) \\
  &= \left( \begin{array}{ccc}
    \displaystyle\sum_{k=1}^{2}{\displaystyle\sum_{l=1}^{2}{a_{k, l}w_{k, l}}}
  & \displaystyle\sum_{k=1}^{2}{\displaystyle\sum_{l=1}^{2}{a_{k, l+1}w_{k, l}}}
  & \displaystyle\sum_{k=1}^{2}{\displaystyle\sum_{l=1}^{2}{a_{k, l+2}w_{k, l}}} \\
    \displaystyle\sum_{k=1}^{2}{\displaystyle\sum_{l=1}^{2}{a_{k+1, l}w_{k, l}}}
  & \displaystyle\sum_{k=1}^{2}{\displaystyle\sum_{l=1}^{2}{a_{k+1, l+1}w_{k, l}}}
  & \displaystyle\sum_{k=1}^{2}{\displaystyle\sum_{l=1}^{2}{a_{k+1, l+2}w_{k, l}}} \\
    \displaystyle\sum_{k=1}^{2}{\displaystyle\sum_{l=1}^{2}{a_{k+2, l}w_{k, l}}}
  & \displaystyle\sum_{k=1}^{2}{\displaystyle\sum_{l=1}^{2}{a_{k+2, l+1}w_{k, l}}}
  & \displaystyle\sum_{k=1}^{2}{\displaystyle\sum_{l=1}^{2}{a_{k+2, l+2}w_{k, l}}}
  \end{array} \right)
\end{align}

You can see the regularity, so let us generalize. With a filter of size $F_h \times F_w$, the $(i, j)$ component $o_{i, j}$ of the output is

o_{i, j} = \displaystyle\sum_{k=1}^{F_h}{\displaystyle\sum_{l=1}^{F_w}{a_{i+k-1, j+l-1}w_{k, l}}}

and can be written as above. Note that the form of the index arithmetic has been arranged deliberately for later use.
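
A direct transcription of this formula, checked against SciPy's cross-correlation (a sketch; the function name and test sizes are mine):

```python
import numpy as np
from scipy import signal

def cnn_conv(a, w):
    """o[i, j] = sum_{k, l} a[i+k, j+l] * w[k, l] (no padding, stride 1)."""
    Fh, Fw = w.shape
    Oh, Ow = a.shape[0] - Fh + 1, a.shape[1] - Fw + 1
    o = np.zeros((Oh, Ow))
    for i in range(Oh):
        for j in range(Ow):
            o[i, j] = np.sum(a[i:i + Fh, j:j + Fw] * w)
    return o

a = np.random.randn(4, 4)
w = np.random.randn(2, 2)
# This is exactly what SciPy calls (valid) cross-correlation.
print(np.allclose(cnn_conv(a, w), signal.correlate2d(a, w, mode='valid')))  # True
```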

This is the convolution operation of a CNN, and it is the circular kind. Mathematically, by the way, the two-dimensional discrete convolution is defined by the following equation.

(g * f)_{i, j} = o_{i, j} = \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{i-k+1, j-l+1}g_{k, l}}}

Here $f$ is regarded as the input and $g$ as the filter, and their order is reversed relative to the general formula. $k$ and $l$ range over all values for which the indexes are valid. If something already feels off at this point, you are sharp. Writing this out as a matrix (with the same sizes as the CNN case) gives

\begin{align}
  \boldsymbol{g*f} &= \left( \begin{array}{ccccc}
    \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{2-k, 2-l}g_{k, l}}}
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{2-k, 3-l}g_{k, l}}}
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{2-k, 4-l}g_{k, l}}} 
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{2-k, 5-l}g_{k, l}}} 
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{2-k, 6-l}g_{k, l}}} \\
    \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{3-k, 2-l}g_{k, l}}}
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{3-k, 3-l}g_{k, l}}}
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{3-k, 4-l}g_{k, l}}} 
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{3-k, 5-l}g_{k, l}}} 
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{3-k, 6-l}g_{k, l}}} \\
  \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{4-k, 2-l}g_{k, l}}}
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{4-k, 3-l}g_{k, l}}}
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{4-k, 4-l}g_{k, l}}} 
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{4-k, 5-l}g_{k, l}}} 
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{4-k, 6-l}g_{k, l}}} \\
  \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{5-k, 2-l}g_{k, l}}}
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{5-k, 3-l}g_{k, l}}}
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{5-k, 4-l}g_{k, l}}} 
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{5-k, 5-l}g_{k, l}}} 
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{5-k, 6-l}g_{k, l}}} \\
\displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{6-k, 2-l}g_{k, l}}}
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{6-k, 3-l}g_{k, l}}}
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{6-k, 4-l}g_{k, l}}} 
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{6-k, 5-l}g_{k, l}}} 
  & \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{6-k, 6-l}g_{k, l}}} \\
  \end{array} \right) \\
  &= \left( \begin{array}{ccccc}
    f_{1, 1}g_{1, 1}
  & f_{1, 2}g_{1, 1} + f_{1, 1}g_{1, 2}
  & f_{1, 3}g_{1, 1} + f_{1, 2}g_{1, 2}
  & f_{1, 4}g_{1, 1} + f_{1, 3}g_{1, 2}
  & f_{1, 4}g_{1, 2} \\
    f_{2, 1}g_{1, 1} + f_{1, 1}g_{2, 1}
  & f_{2, 2}g_{1, 1} + f_{2, 1}g_{1, 2} + f_{1, 2}g_{2, 1} + f_{1, 1}g_{2, 2}
  & f_{2, 3}g_{1, 1} + f_{2, 2}g_{1, 2} + f_{1, 3}g_{2, 1} + f_{1, 2}g_{2, 2}
  & f_{2, 4}g_{1, 1} + f_{2, 3}g_{1, 2} + f_{1, 4}g_{2, 1} + f_{1, 3}g_{2, 2}
  & f_{2, 4}g_{1, 2} + f_{1, 4}g_{2, 2} \\
    f_{3, 1}g_{1, 1} + f_{2, 1}g_{2, 1}
  & f_{3, 2}g_{1, 1} + f_{3, 1}g_{1, 2} + f_{2, 2}g_{2, 1} + f_{2, 1}g_{2, 2}
  & f_{3, 3}g_{1, 1} + f_{3, 2}g_{1, 2} + f_{2, 3}g_{2, 1} + f_{2, 2}g_{2, 2}
  & f_{3, 4}g_{1, 1} + f_{3, 3}g_{1, 2} + f_{2, 4}g_{2, 1} + f_{2, 3}g_{2, 2}
  & f_{3, 4}g_{1, 2} + f_{2, 4}g_{2, 2} \\
    f_{4, 1}g_{1, 1} + f_{3, 1}g_{2, 1}
  & f_{4, 2}g_{1, 1} + f_{4, 1}g_{1, 2} + f_{3, 2}g_{2, 1} + f_{3, 1}g_{2, 2}
  & f_{4, 3}g_{1, 1} + f_{4, 2}g_{1, 2} + f_{3, 3}g_{2, 1} + f_{3, 2}g_{2, 2}
  & f_{4, 4}g_{1, 1} + f_{4, 3}g_{1, 2} + f_{3, 4}g_{2, 1} + f_{3, 3}g_{2, 2}
  & f_{4, 4}g_{1, 2} + f_{3, 4}g_{2, 2} \\
    f_{4, 1}g_{2, 1}
  & f_{4, 2}g_{2, 1} + f_{4, 1}g_{2, 2}
  & f_{4, 3}g_{2, 1} + f_{4, 2}g_{2, 2}
  & f_{4, 4}g_{2, 1} + f_{4, 3}g_{2, 2}
  & f_{4, 4}g_{2, 2}
  \end{array} \right) \\
\end{align}

It will be.

As you can see at a glance, the CNN convolution and the mathematically defined convolution are different. You might hope this difference is just the circular-versus-linear distinction... **but it actually is not**. Even applying padding as in [How to equalize linear and circular convolutions](#how-to-equalize-linear-and-circular-convolutions) **does not make them the same**. Let us try it. Padding the input as

\boldsymbol{A} = \left( \begin{array}{cccccc}
  0 & 0 & 0 & 0 & 0 & 0 \\
  0 & a_{1, 1} & a_{1, 2} & a_{1, 3} & a_{1, 4} & 0 \\
  0 & a_{2, 1} & a_{2, 2} & a_{2, 3} & a_{2, 4} & 0 \\
  0 & a_{3, 1} & a_{3, 2} & a_{3, 3} & a_{3, 4} & 0 \\
  0 & a_{4, 1} & a_{4, 2} & a_{4, 3} & a_{4, 4} & 0 \\
  0 & 0 & 0 & 0 & 0 & 0
\end{array} \right)

the CNN convolution gives

\boldsymbol{AW} = \left( \begin{array}{ccccc}
    a_{1, 1}w_{2, 2}
  & a_{1, 1}w_{2, 1} + a_{1, 2}w_{2, 2}
  & a_{1, 2}w_{2, 1} + a_{1, 3}w_{2, 2}
  & a_{1, 3}w_{2, 1} + a_{1, 4}w_{2, 2} 
  & a_{1, 4}w_{2, 1} \\
    a_{1, 1}w_{1, 2} + a_{2, 1}w_{2, 2}
  & a_{1, 1}w_{1, 1} + a_{1, 2}w_{1, 2} + a_{2, 1}w_{2, 1} + a_{2, 2}w_{2, 2}
  & a_{1, 2}w_{1, 1} + a_{1, 3}w_{1, 2} + a_{2, 2}w_{2, 1} + a_{2, 3}w_{2, 2}
  & a_{1, 3}w_{1, 1} + a_{1, 4}w_{1, 2} + a_{2, 3}w_{2, 1} + a_{2, 4}w_{2, 2}
  & a_{1, 4}w_{1, 1} + a_{2, 4}w_{2, 1} \\
    a_{2, 1}w_{1, 2} + a_{3, 1}w_{2, 2}
  & a_{2, 1}w_{1, 1} + a_{2, 2}w_{1, 2} + a_{3, 1}w_{2, 1} + a_{3, 2}w_{2, 2}
  & a_{2, 2}w_{1, 1} + a_{2, 3}w_{1, 2} + a_{3, 2}w_{2, 1} + a_{3, 3}w_{2, 2}
  & a_{2, 3}w_{1, 1} + a_{2, 4}w_{1, 2} + a_{3, 3}w_{2, 1} + a_{3, 4}w_{2, 2}
  & a_{2, 4}w_{1, 1} + a_{3, 4}w_{2, 1} \\
    a_{3, 1}w_{1, 2} + a_{4, 1}w_{2, 2}
  & a_{3, 1}w_{1, 1} + a_{3, 2}w_{1, 2} + a_{4, 1}w_{2, 1} + a_{4, 2}w_{2, 2}
  & a_{3, 2}w_{1, 1} + a_{3, 3}w_{1, 2} + a_{4, 2}w_{2, 1} + a_{4, 3}w_{2, 2}
  & a_{3, 3}w_{1, 1} + a_{3, 4}w_{1, 2} + a_{4, 3}w_{2, 1} + a_{4, 4}w_{2, 2}
  & a_{3, 4}w_{1, 1} + a_{4, 4}w_{2, 1} \\
    a_{4, 1}w_{1, 2}
  & a_{4, 1}w_{1, 1} + a_{4, 2}w_{1, 2}
  & a_{4, 2}w_{1, 1} + a_{4, 3}w_{1, 2}
  & a_{4, 3}w_{1, 1} + a_{4, 4}w_{1, 2}
  & a_{4, 4}w_{1, 1}
\end{array} \right)

This is the result: the shapes match and each element has the same number of terms, but the indexes differ. In other words, **the CNN convolution is a different operation from the mathematically defined convolution**.

This is what was hinted at earlier in this article:

> And when it comes to a theoretically rigorous discussion, the premise itself turns out to be different.

That is it: **the premise that "CNN convolution is the mathematical convolution" was wrong in the first place**.

You may now ask, "then what is the CNN convolution mathematically; is there a corresponding definition?" There is a proper one: the **cross-correlation**.

(g \star f)_{i, j} = o_{i, j} = \displaystyle\sum_{k}{\displaystyle\sum_{l}{f_{i+k-1, j+l-1}g_{k, l}}}

As with the mathematical convolution, $k$ and $l$ range over all valid indexes. This definition has the same shape as the CNN convolution, so you can expect the same result.

Cross-correlation measures how similar two signals ($f$ and $g$) are. Since the details are not essential here, I leave them to the references; in short, a CNN filter learns a distribution that responds to a particular pattern in the input.

The important question here is **whether cross-correlation can somehow be brought back to the mathematical convolution**. If the operation we have all been calling convolution were really not a convolution at all, the foundation of many discussions would be shaken and everything would have to be reconsidered. Researchers who have worked hard on interpreting filters would weep...

Let us compare the mathematical convolution with the padded CNN convolution.

\left( \begin{array}{ccccc}
  f_{1, 1}g_{1, 1}
& f_{1, 2}g_{1, 1} + f_{1, 1}g_{1, 2}
& f_{1, 3}g_{1, 1} + f_{1, 2}g_{1, 2}
& f_{1, 4}g_{1, 1} + f_{1, 3}g_{1, 2}
& f_{1, 4}g_{1, 2} \\
  f_{2, 1}g_{1, 1} + f_{1, 1}g_{2, 1}
& f_{2, 2}g_{1, 1} + f_{2, 1}g_{1, 2} + f_{1, 2}g_{2, 1} + f_{1, 1}g_{2, 2}
& f_{2, 3}g_{1, 1} + f_{2, 2}g_{1, 2} + f_{1, 3}g_{2, 1} + f_{1, 2}g_{2, 2}
& f_{2, 4}g_{1, 1} + f_{2, 3}g_{1, 2} + f_{1, 4}g_{2, 1} + f_{1, 3}g_{2, 2}
& f_{2, 4}g_{1, 2} + f_{1, 4}g_{2, 2} \\
  f_{3, 1}g_{1, 1} + f_{2, 1}g_{2, 1}
& f_{3, 2}g_{1, 1} + f_{3, 1}g_{1, 2} + f_{2, 2}g_{2, 1} + f_{2, 1}g_{2, 2}
& f_{3, 3}g_{1, 1} + f_{3, 2}g_{1, 2} + f_{2, 3}g_{2, 1} + f_{2, 2}g_{2, 2}
& f_{3, 4}g_{1, 1} + f_{3, 3}g_{1, 2} + f_{2, 4}g_{2, 1} + f_{2, 3}g_{2, 2}
& f_{3, 4}g_{1, 2} + f_{2, 4}g_{2, 2} \\
  f_{4, 1}g_{1, 1} + f_{3, 1}g_{2, 1}
& f_{4, 2}g_{1, 1} + f_{4, 1}g_{1, 2} + f_{3, 2}g_{2, 1} + f_{3, 1}g_{2, 2}
& f_{4, 3}g_{1, 1} + f_{4, 2}g_{1, 2} + f_{3, 3}g_{2, 1} + f_{3, 2}g_{2, 2}
& f_{4, 4}g_{1, 1} + f_{4, 3}g_{1, 2} + f_{3, 4}g_{2, 1} + f_{3, 3}g_{2, 2}
& f_{4, 4}g_{1, 2} + f_{3, 4}g_{2, 2} \\
  f_{4, 1}g_{2, 1}
& f_{4, 2}g_{2, 1} + f_{4, 1}g_{2, 2}
& f_{4, 3}g_{2, 1} + f_{4, 2}g_{2, 2}
& f_{4, 4}g_{2, 1} + f_{4, 3}g_{2, 2}
& f_{4, 4}g_{2, 2}
\end{array} \right) \\
\left( \begin{array}{ccccc}
    a_{1, 1}w_{2, 2}
  & a_{1, 1}w_{2, 1} + a_{1, 2}w_{2, 2}
  & a_{1, 2}w_{2, 1} + a_{1, 3}w_{2, 2}
  & a_{1, 3}w_{2, 1} + a_{1, 4}w_{2, 2} 
  & a_{1, 4}w_{2, 1} \\
    a_{1, 1}w_{1, 2} + a_{2, 1}w_{2, 2}
  & a_{1, 1}w_{1, 1} + a_{1, 2}w_{1, 2} + a_{2, 1}w_{2, 1} + a_{2, 2}w_{2, 2}
  & a_{1, 2}w_{1, 1} + a_{1, 3}w_{1, 2} + a_{2, 2}w_{2, 1} + a_{2, 3}w_{2, 2}
  & a_{1, 3}w_{1, 1} + a_{1, 4}w_{1, 2} + a_{2, 3}w_{2, 1} + a_{2, 4}w_{2, 2}
  & a_{1, 4}w_{1, 1} + a_{2, 4}w_{2, 1} \\
    a_{2, 1}w_{1, 2} + a_{3, 1}w_{2, 2}
  & a_{2, 1}w_{1, 1} + a_{2, 2}w_{1, 2} + a_{3, 1}w_{2, 1} + a_{3, 2}w_{2, 2}
  & a_{2, 2}w_{1, 1} + a_{2, 3}w_{1, 2} + a_{3, 2}w_{2, 1} + a_{3, 3}w_{2, 2}
  & a_{2, 3}w_{1, 1} + a_{2, 4}w_{1, 2} + a_{3, 3}w_{2, 1} + a_{3, 4}w_{2, 2}
  & a_{2, 4}w_{1, 1} + a_{3, 4}w_{2, 1} \\
    a_{3, 1}w_{1, 2} + a_{4, 1}w_{2, 2}
  & a_{3, 1}w_{1, 1} + a_{3, 2}w_{1, 2} + a_{4, 1}w_{2, 1} + a_{4, 2}w_{2, 2}
  & a_{3, 2}w_{1, 1} + a_{3, 3}w_{1, 2} + a_{4, 2}w_{2, 1} + a_{4, 3}w_{2, 2}
  & a_{3, 3}w_{1, 1} + a_{3, 4}w_{1, 2} + a_{4, 3}w_{2, 1} + a_{4, 4}w_{2, 2}
  & a_{3, 4}w_{1, 1} + a_{4, 4}w_{2, 1} \\
    a_{4, 1}w_{1, 2}
  & a_{4, 1}w_{1, 1} + a_{4, 2}w_{1, 2}
  & a_{4, 2}w_{1, 1} + a_{4, 3}w_{1, 2}
  & a_{4, 3}w_{1, 1} + a_{4, 4}w_{1, 2}
  & a_{4, 4}w_{1, 1}
\end{array} \right)

Do you notice something? For example, if we put

\left( \begin{array}{cc}
  g_{1, 1} & g_{1, 2} \\
  g_{2, 1} & g_{2, 2}
\end{array} \right)
= \left( \begin{array}{cc}
  w_{2, 2} & w_{2, 1} \\
  w_{1, 2} & w_{1, 1}
\end{array} \right)

then we get

\left( \begin{array}{ccccc}
  f_{1, 1}g_{1, 1}
& f_{1, 2}g_{1, 1} + f_{1, 1}g_{1, 2}
& f_{1, 3}g_{1, 1} + f_{1, 2}g_{1, 2}
& f_{1, 4}g_{1, 1} + f_{1, 3}g_{1, 2}
& f_{1, 4}g_{1, 2} \\
  f_{2, 1}g_{1, 1} + f_{1, 1}g_{2, 1}
& f_{2, 2}g_{1, 1} + f_{2, 1}g_{1, 2} + f_{1, 2}g_{2, 1} + f_{1, 1}g_{2, 2}
& f_{2, 3}g_{1, 1} + f_{2, 2}g_{1, 2} + f_{1, 3}g_{2, 1} + f_{1, 2}g_{2, 2}
& f_{2, 4}g_{1, 1} + f_{2, 3}g_{1, 2} + f_{1, 4}g_{2, 1} + f_{1, 3}g_{2, 2}
& f_{2, 4}g_{1, 2} + f_{1, 4}g_{2, 2} \\
  f_{3, 1}g_{1, 1} + f_{2, 1}g_{2, 1}
& f_{3, 2}g_{1, 1} + f_{3, 1}g_{1, 2} + f_{2, 2}g_{2, 1} + f_{2, 1}g_{2, 2}
& f_{3, 3}g_{1, 1} + f_{3, 2}g_{1, 2} + f_{2, 3}g_{2, 1} + f_{2, 2}g_{2, 2}
& f_{3, 4}g_{1, 1} + f_{3, 3}g_{1, 2} + f_{2, 4}g_{2, 1} + f_{2, 3}g_{2, 2}
& f_{3, 4}g_{1, 2} + f_{2, 4}g_{2, 2} \\
  f_{4, 1}g_{1, 1} + f_{3, 1}g_{2, 1}
& f_{4, 2}g_{1, 1} + f_{4, 1}g_{1, 2} + f_{3, 2}g_{2, 1} + f_{3, 1}g_{2, 2}
& f_{4, 3}g_{1, 1} + f_{4, 2}g_{1, 2} + f_{3, 3}g_{2, 1} + f_{3, 2}g_{2, 2}
& f_{4, 4}g_{1, 1} + f_{4, 3}g_{1, 2} + f_{3, 4}g_{2, 1} + f_{3, 3}g_{2, 2}
& f_{4, 4}g_{1, 2} + f_{3, 4}g_{2, 2} \\
  f_{4, 1}g_{2, 1}
& f_{4, 2}g_{2, 1} + f_{4, 1}g_{2, 2}
& f_{4, 3}g_{2, 1} + f_{4, 2}g_{2, 2}
& f_{4, 4}g_{2, 1} + f_{4, 3}g_{2, 2}
& f_{4, 4}g_{2, 2}
\end{array} \right) \\
= \left( \begin{array}{ccccc}
  f_{1, 1}w_{2, 2}
& f_{1, 2}w_{2, 2} + f_{1, 1}w_{2, 1}
& f_{1, 3}w_{2, 2} + f_{1, 2}w_{2, 1}
& f_{1, 4}w_{2, 2} + f_{1, 3}w_{2, 1}
& f_{1, 4}w_{2, 1} \\
  f_{2, 1}w_{2, 2} + f_{1, 1}w_{1, 2}
& f_{2, 2}w_{2, 2} + f_{2, 1}w_{2, 1} + f_{1, 2}w_{1, 2} + f_{1, 1}w_{1, 1}
& f_{2, 3}w_{2, 2} + f_{2, 2}w_{2, 1} + f_{1, 3}w_{1, 2} + f_{1, 2}w_{1, 1}
& f_{2, 4}w_{2, 2} + f_{2, 3}w_{2, 1} + f_{1, 4}w_{1, 2} + f_{1, 3}w_{1, 1}
& f_{2, 4}w_{2, 1} + f_{1, 4}w_{1, 1} \\
  f_{3, 1}w_{2, 2} + f_{2, 1}w_{1, 2}
& f_{3, 2}w_{2, 2} + f_{3, 1}w_{2, 1} + f_{2, 2}w_{1, 2} + f_{2, 1}w_{1, 1}
& f_{3, 3}w_{2, 2} + f_{3, 2}w_{2, 1} + f_{2, 3}w_{1, 2} + f_{2, 2}w_{1, 1}
& f_{3, 4}w_{2, 2} + f_{3, 3}w_{2, 1} + f_{2, 4}w_{1, 2} + f_{2, 3}w_{1, 1}
& f_{3, 4}w_{2, 1} + f_{2, 4}w_{1, 1} \\
  f_{4, 1}w_{2, 2} + f_{3, 1}w_{1, 2}
& f_{4, 2}w_{2, 2} + f_{4, 1}w_{2, 1} + f_{3, 2}w_{1, 2} + f_{3, 1}w_{1, 1}
& f_{4, 3}w_{2, 2} + f_{4, 2}w_{2, 1} + f_{3, 3}w_{1, 2} + f_{3, 2}w_{1, 1}
& f_{4, 4}w_{2, 2} + f_{4, 3}w_{2, 1} + f_{3, 4}w_{1, 2} + f_{3, 3}w_{1, 1}
& f_{4, 4}w_{2, 1} + f_{3, 4}w_{1, 1} \\
  f_{4, 1}w_{1, 2}
& f_{4, 2}w_{1, 2} + f_{4, 1}w_{1, 1}
& f_{4, 3}w_{1, 2} + f_{4, 2}w_{1, 1}
& f_{4, 4}w_{1, 2} + f_{4, 3}w_{1, 1}
& f_{4, 4}w_{1, 1}
\end{array} \right) \\
\underbrace{=}_{\text{change the order of addition}}
\left( \begin{array}{ccccc}
    f_{1, 1}w_{2, 2}
  & f_{1, 1}w_{2, 1} + f_{1, 2}w_{2, 2}
  & f_{1, 2}w_{2, 1} + f_{1, 3}w_{2, 2}
  & f_{1, 3}w_{2, 1} + f_{1, 4}w_{2, 2} 
  & f_{1, 4}w_{2, 1} \\
    f_{1, 1}w_{1, 2} + f_{2, 1}w_{2, 2}
  & f_{1, 1}w_{1, 1} + f_{1, 2}w_{1, 2} + f_{2, 1}w_{2, 1} + f_{2, 2}w_{2, 2}
  & f_{1, 2}w_{1, 1} + f_{1, 3}w_{1, 2} + f_{2, 2}w_{2, 1} + f_{2, 3}w_{2, 2}
  & f_{1, 3}w_{1, 1} + f_{1, 4}w_{1, 2} + f_{2, 3}w_{2, 1} + f_{2, 4}w_{2, 2}
  & f_{1, 4}w_{1, 1} + f_{2, 4}w_{2, 1} \\
    f_{2, 1}w_{1, 2} + f_{3, 1}w_{2, 2}
  & f_{2, 1}w_{1, 1} + f_{2, 2}w_{1, 2} + f_{3, 1}w_{2, 1} + f_{3, 2}w_{2, 2}
  & f_{2, 2}w_{1, 1} + f_{2, 3}w_{1, 2} + f_{3, 2}w_{2, 1} + f_{3, 3}w_{2, 2}
  & f_{2, 3}w_{1, 1} + f_{2, 4}w_{1, 2} + f_{3, 3}w_{2, 1} + f_{3, 4}w_{2, 2}
  & f_{2, 4}w_{1, 1} + f_{3, 4}w_{2, 1} \\
    f_{3, 1}w_{1, 2} + f_{4, 1}w_{2, 2}
  & f_{3, 1}w_{1, 1} + f_{3, 2}w_{1, 2} + f_{4, 1}w_{2, 1} + f_{4, 2}w_{2, 2}
  & f_{3, 2}w_{1, 1} + f_{3, 3}w_{1, 2} + f_{4, 2}w_{2, 1} + f_{4, 3}w_{2, 2}
  & f_{3, 3}w_{1, 1} + f_{3, 4}w_{1, 2} + f_{4, 3}w_{2, 1} + f_{4, 4}w_{2, 2}
  & f_{3, 4}w_{1, 1} + f_{4, 4}w_{2, 1} \\
    f_{4, 1}w_{1, 2}
  & f_{4, 1}w_{1, 1} + f_{4, 2}w_{1, 2}
  & f_{4, 2}w_{1, 1} + f_{4, 3}w_{1, 2}
  & f_{4, 3}w_{1, 1} + f_{4, 4}w_{1, 2}
  & f_{4, 4}w_{1, 1}
\end{array} \right) \\
\Leftrightarrow
\left( \begin{array}{ccccc}
    a_{1, 1}w_{2, 2}
  & a_{1, 1}w_{2, 1} + a_{1, 2}w_{2, 2}
  & a_{1, 2}w_{2, 1} + a_{1, 3}w_{2, 2}
  & a_{1, 3}w_{2, 1} + a_{1, 4}w_{2, 2} 
  & a_{1, 4}w_{2, 1} \\
    a_{1, 1}w_{1, 2} + a_{2, 1}w_{2, 2}
  & a_{1, 1}w_{1, 1} + a_{1, 2}w_{1, 2} + a_{2, 1}w_{2, 1} + a_{2, 2}w_{2, 2}
  & a_{1, 2}w_{1, 1} + a_{1, 3}w_{1, 2} + a_{2, 2}w_{2, 1} + a_{2, 3}w_{2, 2}
  & a_{1, 3}w_{1, 1} + a_{1, 4}w_{1, 2} + a_{2, 3}w_{2, 1} + a_{2, 4}w_{2, 2}
  & a_{1, 4}w_{1, 1} + a_{2, 4}w_{2, 1} \\
    a_{2, 1}w_{1, 2} + a_{3, 1}w_{2, 2}
  & a_{2, 1}w_{1, 1} + a_{2, 2}w_{1, 2} + a_{3, 1}w_{2, 1} + a_{3, 2}w_{2, 2}
  & a_{2, 2}w_{1, 1} + a_{2, 3}w_{1, 2} + a_{3, 2}w_{2, 1} + a_{3, 3}w_{2, 2}
  & a_{2, 3}w_{1, 1} + a_{2, 4}w_{1, 2} + a_{3, 3}w_{2, 1} + a_{3, 4}w_{2, 2}
  & a_{2, 4}w_{1, 1} + a_{3, 4}w_{2, 1} \\
    a_{3, 1}w_{1, 2} + a_{4, 1}w_{2, 2}
  & a_{3, 1}w_{1, 1} + a_{3, 2}w_{1, 2} + a_{4, 1}w_{2, 1} + a_{4, 2}w_{2, 2}
  & a_{3, 2}w_{1, 1} + a_{3, 3}w_{1, 2} + a_{4, 2}w_{2, 1} + a_{4, 3}w_{2, 2}
  & a_{3, 3}w_{1, 1} + a_{3, 4}w_{1, 2} + a_{4, 3}w_{2, 1} + a_{4, 4}w_{2, 2}
  & a_{3, 4}w_{1, 1} + a_{4, 4}w_{2, 1} \\
    a_{4, 1}w_{1, 2}
  & a_{4, 1}w_{1, 1} + a_{4, 2}w_{1, 2}
  & a_{4, 2}w_{1, 1} + a_{4, 3}w_{1, 2}
  & a_{4, 3}w_{1, 1} + a_{4, 4}w_{1, 2}
  & a_{4, 4}w_{1, 1}
\end{array} \right)

and the two become identical! What this means is that **cross-correlation and the mathematical convolution can be made the same operation if handled appropriately**. Here "appropriately" means inverting the indexes in the discrete case, or negating the argument in the continuous case.

So you can regard a CNN filter as learning an index-inverted convolution filter, and researchers need not weep after all.
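
The index-inversion claim is easy to confirm numerically (my sketch with SciPy):

```python
import numpy as np
from scipy import signal

f = np.random.randn(4, 4)   # input
w = np.random.randn(2, 2)   # filter

# Cross-correlation with w equals mathematical convolution with w
# flipped along both axes: a CNN "convolution" is a convolution
# with an index-inverted filter.
corr = signal.correlate2d(f, w, mode='full')
conv = signal.convolve2d(f, w[::-1, ::-1], mode='full')
print(np.allclose(corr, conv))  # True
```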

Cooley-Tukey FFT algorithm

First, let us recall the definition of the discrete Fourier transform itself. For simplicity, consider the DFT of a one-dimensional vector $\boldsymbol{x}$ of length $N$:

X_k = \mathcal{F}_{\boldsymbol{x}}(k) = \sum_{p=0}^{N-1}{x_p e^{-j\frac{2 \pi}{N}kp}} = \sum_{p=0}^{N-1}{x_p W_N^{kp}} \quad k = 0, 1, \ldots, N-1 \\
W_N = e^{-j\frac{2 \pi}{N}}

Here $j$ denotes the imaginary unit, and $W_N = e^{-j\frac{2 \pi}{N}}$ is called the **twiddle factor** (rotation factor); the reason for the name is given later. The direct computation costs $\mathcal{O}(N^2)$, but it can be sped up when $N$ is even. The resulting algorithm is the **FFT: Fast Fourier Transform**, and it can be expressed as follows.

\begin{align}
  X_{2k} = \mathcal{F}_{\boldsymbol{x}}(2k) &= \sum_{p=0}^{N-1}{x_p W_N^{2kp}} & \\
  &= \sum_{p=0}^{N-1}{x_p W_{N/2}^{kp}} & (\because W_N^{2kp} = e^{-j\frac{2 \pi}{N}2kp} = e^{-j\frac{2 \pi}{N/2}kp} = W_{N/2}^{kp}) \\
  &= \sum_{p=0}^{N/2-1}{x_p W_{N/2}^{kp}} + \sum_{p=N/2}^{N-1}{x_p W_{N/2}^{kp}} & (\because \text{split the sum into its first and second halves}) \\
  &= \sum_{p=0}^{N/2-1}{x_p W_{N/2}^{kp}} + \sum_{p=0}^{N/2-1}{x_{p+N/2} W_{N/2}^{k(p+N/2)}} & (\because \text{re-index the second half to start at } p = 0) \\
  &= \sum_{p=0}^{N/2-1}{x_p W_{N/2}^{kp}} + \sum_{p=0}^{N/2-1}{x_{p+N/2} W_{N/2}^{kp}} & (\because W_{N/2}^{k(N/2)} = e^{-j\frac{2 \pi}{N/2}k(N/2)} = e^{-j2k \pi} = 1) \\
  &= \sum_{p=0}^{N/2-1}{(x_p + x_{p+N/2}) W_{N/2}^{kp}} \quad k = 0, 1, \ldots, \cfrac{N}{2}-1 & \\
  X_{2k+1} = \mathcal{F}_{\boldsymbol{x}}(2k+1) &= \sum_{p=0}^{N-1}{x_p W_N^{(2k+1)p}} & \\
  &= \sum_{p=0}^{N-1}{x_p W_{N}^{p}W_{N/2}^{kp}} & \\
  &= \sum_{p=0}^{N/2-1}{x_p W_{N}^{p}W_{N/2}^{kp}} + \sum_{p=N/2}^{N-1}{x_p W_{N}^{p}W_{N/2}^{kp}} & \\
  &= \sum_{p=0}^{N/2-1}{x_p W_{N}^{p}W_{N/2}^{kp}} + \sum_{p=0}^{N/2-1}{x_{p+N/2} W_{N}^{p+N/2}W_{N/2}^{k(p+N/2)}} & \\
  &= \sum_{p=0}^{N/2-1}{x_p W_{N}^{p}W_{N/2}^{kp}} - \sum_{p=0}^{N/2-1}{x_{p+N/2} W_{N}^{p}W_{N/2}^{kp}} & (\because W_N^{N/2} = -1) \\
  &= \sum_{p=0}^{N/2-1}{(x_p - x_{p+N/2}) W_{N}^{p} W_{N/2}^{kp}} \quad k = 0, 1, \ldots, \cfrac{N}{2}-1 &
\end{align}

By definition, $W_N$ represents a point on the unit circle rotated **clockwise** by the angle $\frac{2 \pi}{N}$, which is why it is called a rotation factor. rotation_factor.png Consequently, the following relation holds.

W_{2n}^{kn} = e^{-j\frac{2 \pi}{2n}kn} = e^{-j k \pi} = (-1)^k

In short, this relation is just the value at rotation angle $k\pi$. It is used all over the place when manipulating the FFT formulas.

If you are not familiar with complex numbers and are wondering what is going on here, let me introduce Euler's formula, often called the most beautiful formula in the history of mathematics. Strictly its derivation needs the Taylor expansion, which I omit, since digging into that would expand the story without end.
e^{jx} = \cos x + j \sin x
This is Euler's formula. Substituting $x = \pi$ yields
e^{j \pi} = \cos \pi + j \sin \pi \Leftrightarrow e^{j \pi} = -1 + j0 \\
\therefore e^{j \pi} + 1 = 0
which is the form often described as the most beautiful in the history of mathematics. Unfortunately I do not quite share the aesthetic, but I do understand that it is remarkable. Anyway, substituting $x = -k\pi$ into Euler's formula gives
\begin{align}
  e^{j(-k \pi)} &= \cos (-k \pi) + j \sin (-k \pi) \\
  &= (-1)^k \\
\end{align}
And the above relational expression is obtained.
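
A tiny numerical check of the relation $W_{2n}^{kn} = (-1)^k$ (illustration only):

```python
import numpy as np

n = 4
for k in range(6):
    W = np.exp(-2j * np.pi * k * n / (2 * n))  # W_{2n}^{kn} = e^{-j k pi}
    print(k, np.round(W.real), (-1) ** k)      # the real parts match (-1)^k
```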

This FFT step halves the amount of computation, and the extension that applies it recursively is the **Cooley-Tukey FFT algorithm**. It assumes the vector length $N$ is a power of 2 and applies the FFT split repeatedly for even greater speed. Once the repeated halving brings the vector length down to $1$, all of the Fourier coefficients have been obtained.
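
Below is a minimal recursive implementation of exactly this radix-2 decimation-in-frequency recursion, checked against NumPy (a sketch assuming the length is a power of two; all names are mine):

```python
import numpy as np

def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT; len(x) must be a power of 2."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    half = N // 2
    W = np.exp(-2j * np.pi * np.arange(half) / N)  # twiddle factors W_N^p
    even = fft_dif(x[:half] + x[half:])            # X_{2k}
    odd = fft_dif((x[:half] - x[half:]) * W)       # X_{2k+1}
    out = np.empty(N, dtype=complex)
    out[0::2] = even
    out[1::2] = odd
    return out

x = np.random.randn(8)
print(np.allclose(fft_dif(x), np.fft.fft(x)))  # True
```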

Now let us compute a concrete example and confirm all of this (**please point out any mistakes!**). First, the plain FFT. Suppose the input is the vector $\boldsymbol{x} = (x_0, x_1, \ldots, x_7)$ of length $N = 8$ and its Fourier transform is $\boldsymbol{X} = (X_0, X_1, \ldots, X_7)$. By definition,
\begin{align}
  \boldsymbol{X}^\top &= \left( \begin{array}{c}
    \displaystyle\sum_{p=0}^{7}{x_p W_8^{0}} \\
    \displaystyle\sum_{p=0}^{7}{x_p W_8^{p}} \\
    \displaystyle\sum_{p=0}^{7}{x_p W_8^{2p}} \\
    \displaystyle\sum_{p=0}^{7}{x_p W_8^{3p}} \\
    \displaystyle\sum_{p=0}^{7}{x_p W_8^{4p}} \\
    \displaystyle\sum_{p=0}^{7}{x_p W_8^{5p}} \\
    \displaystyle\sum_{p=0}^{7}{x_p W_8^{6p}} \\
    \displaystyle\sum_{p=0}^{7}{x_p W_8^{7p}} \\
  \end{array}\right)
  = \left( \begin{array}{c}
    x_0W_8^0 + x_1W_8^0 + x_2W_8^0 + x_3W_8^0 + x_4W_8^0 + x_5W_8^0 + x_6W_8^0 + x_7W_8^0 \\
    x_0W_8^0 + x_1W_8^1 + x_2W_8^2 + x_3W_8^3 + x_4W_8^4 + x_5W_8^5 + x_6W_8^6 + x_7W_8^7 \\
    x_0W_8^0 + x_1W_8^2 + x_2W_8^4 + x_3W_8^6 + x_4W_8^8 + x_5W_8^{10} + x_6W_8^{12} + x_7W_8^{14} \\
    x_0W_8^0 + x_1W_8^3 + x_2W_8^6 + x_3W_8^9 + x_4W_8^{12} + x_5W_8^{15} + x_6W_8^{18} + x_7W_8^{21} \\
    x_0W_8^0 + x_1W_8^4 + x_2W_8^8 + x_3W_8^{12} + x_4W_8^{16} + x_5W_8^{20} + x_6W_8^{24} + x_7W_8^{28} \\
    x_0W_8^0 + x_1W_8^5 + x_2W_8^{10} + x_3W_8^{15} + x_4W_8^{20} + x_5W_8^{25} + x_6W_8^{30} + x_7W_8^{35} \\
    x_0W_8^0 + x_1W_8^6 + x_2W_8^{12} + x_3W_8^{18} + x_4W_8^{24} + x_5W_8^{30} + x_6W_8^{36} + x_7W_8^{42} \\
    x_0W_8^0 + x_1W_8^7 + x_2W_8^{14} + x_3W_8^{21} + x_4W_8^{28} + x_5W_8^{35} + x_6W_8^{42} + x_7W_8^{49} \\
  \end{array}\right) \\
  &= \left( \begin{array}{cccccccc}
    W_8^0 & W_8^0 & W_8^0 & W_8^0 & W_8^0 & W_8^0 & W_8^0 & W_8^0 \\
    W_8^0 & W_8^1 & W_8^2 & W_8^3 & W_8^4 & W_8^5 & W_8^6 & W_8^7 \\
    W_8^0 & W_8^2 & W_8^4 & W_8^6 & W_8^8 & W_8^{10} & W_8^{12} & W_8^{14} \\
    W_8^0 & W_8^3 & W_8^6 & W_8^9 & W_8^{12} & W_8^{15} & W_8^{18} & W_8^{21} \\
    W_8^0 & W_8^4 & W_8^8 & W_8^{12} & W_8^{16} & W_8^{20} & W_8^{24} & W_8^{28} \\
    W_8^0 & W_8^5 & W_8^{10} & W_8^{15} & W_8^{20} & W_8^{25} & W_8^{30} & W_8^{35} \\
    W_8^0 & W_8^6 & W_8^{12} & W_8^{18} & W_8^{24} & W_8^{30} & W_8^{36} & W_8^{42} \\
    W_8^0 & W_8^7 & W_8^{14} & W_8^{21} & W_8^{28} & W_8^{35} & W_8^{42} & W_8^{49} \\
  \end{array}\right)
  \left( \begin{array}{c}
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7 \\
  \end{array}\right) \\
  &= \left( \begin{array}{cccccccc}
    1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
    1 & e^{-j\frac{\pi}{4}} & -j & e^{-j\frac{3\pi}{4}} & -1 & e^{-j\frac{5\pi}{4}} & j & e^{-j\frac{7\pi}{4}} \\
    1 & -j & -1 & j & 1 & -j & -1 & j \\
    1 & e^{-j\frac{3\pi}{4}} & j & e^{-j\frac{\pi}{4}} & -1 & e^{-j\frac{7\pi}{4}} & -j & e^{-j\frac{5\pi}{4}} \\
    1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \\
    1 & e^{-j\frac{5\pi}{4}} & -j & e^{-j\frac{7\pi}{4}} & -1 & e^{-j\frac{\pi}{4}} & j & e^{-j\frac{3\pi}{4}} \\
    1 & j & -1 & -j & 1 & j & -1 & -j \\
    1 & e^{-j\frac{7\pi}{4}} & j & e^{-j\frac{5\pi}{4}} & -1 & e^{-j\frac{3\pi}{4}} & -j & e^{-j\frac{\pi}{4}} \\
  \end{array}\right)
  \left( \begin{array}{c}
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7 \\
  \end{array}\right)
\end{align}
This is the reference we will check the FFT results against. The FFT recursion gives
\begin{align}
  X_{2k} &= \sum_{p=0}^{3}{(x_p + x_{p+4}) W_{4}^{kp}} \\
  X_{2k+1} &= \sum_{p=0}^{3}{(x_p - x_{p+4})W_{8}^{p} W_{4}^{kp}}
\end{align}
so, collecting the even- and odd-indexed outputs into
\begin{align}
  X^{(0)} &= (X_0, X_2, X_4, X_6) \\
  X^{(1)} &= (X_1, X_3, X_5, X_7)
\end{align}

we can compute each of them:

\begin{align}
  X^{(0)\top} &= \left( \begin{array}{c}
    X_0 \\
    X_2 \\
    X_4 \\
    X_6
  \end{array} \right) = \left( \begin{array}{c}
    \displaystyle\sum_{p=0}^{3}{(x_p + x_{p+4}) W_4^{0}} \\
    \displaystyle\sum_{p=0}^{3}{(x_p + x_{p+4}) W_4^{p}} \\
    \displaystyle\sum_{p=0}^{3}{(x_p + x_{p+4}) W_4^{2p}} \\
    \displaystyle\sum_{p=0}^{3}{(x_p + x_{p+4}) W_4^{3p}}
  \end{array} \right) & \\
  &= \left( \begin{array}{c}
    (x_0 + x_4) W_4^0 + (x_1 + x_5) W_4^0 + (x_2 + x_6) W_4^0 + (x_3 + x_7) W_4^0 \\
    (x_0 + x_4) W_4^0 + (x_1 + x_5) W_4^1 + (x_2 + x_6) W_4^2 + (x_3 + x_7) W_4^3 \\
    (x_0 + x_4) W_4^0 + (x_1 + x_5) W_4^2 + (x_2 + x_6) W_4^4 + (x_3 + x_7) W_4^6 \\
    (x_0 + x_4) W_4^0 + (x_1 + x_5) W_4^3 + (x_2 + x_6) W_4^6 + (x_3 + x_7) W_4^9 \\
  \end{array} \right) & \\
 &= \left( \begin{array}{c}
    x_0 W_4^0 + x_1 W_4^0 + x_2 W_4^0 + x_3 W_4^0 + x_4 W_4^0 + x_5 W_4^0 + x_6 W_4^0 + x_7 W_4^0 \\
    x_0 W_4^0 + x_1 W_4^1 + x_2 W_4^2 + x_3 W_4^3 + x_4 W_4^0 + x_5 W_4^1 + x_6 W_4^2 + x_7 W_4^3 \\
    x_0 W_4^0 + x_1 W_4^2 + x_2 W_4^4 + x_3 W_4^6 + x_4 W_4^0 + x_5 W_4^2 + x_6 W_4^4 + x_7 W_4^6 \\
    x_0 W_4^0 + x_1 W_4^3 + x_2 W_4^6 + x_3 W_4^9 + x_4 W_4^0 + x_5 W_4^3 + x_6 W_4^6 + x_7 W_4^9 \\
  \end{array} \right) & \\
  &= \left( \begin{array}{cccccccc}
    W_4^0 & W_4^0 & W_4^0 & W_4^0 & W_4^0 & W_4^0 & W_4^0 & W_4^0 \\
    W_4^0 & W_4^1 & W_4^2 & W_4^3 & W_4^0 & W_4^1 & W_4^2 & W_4^3 \\
    W_4^0 & W_4^2 & W_4^4 & W_4^6 & W_4^0 & W_4^2 & W_4^4 & W_4^6 \\
    W_4^0 & W_4^3 & W_4^6 & W_4^9 & W_4^0 & W_4^3 & W_4^6 & W_4^9 \\
  \end{array} \right)
  \left( \begin{array}{c}
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) & \\
  &= \left( \begin{array}{cccccccc}
    1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
    1 & -j & -1 & j & 1 & -j & -1 & j \\
    1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \\
    1 & j & -1 & -j & 1 & j & -1 & -j \\
  \end{array} \right)
  \left( \begin{array}{c}
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) & \\

  X^{(1)\top} &= \left( \begin{array}{c}
    X_1 \\
    X_3 \\
    X_5 \\
    X_7
  \end{array} \right) = \left( \begin{array}{c}
    \displaystyle\sum_{p=0}^{3}{(x_p - x_{p+4})W_8^{p} W_4^{0}} \\
    \displaystyle\sum_{p=0}^{3}{(x_p - x_{p+4})W_8^{p} W_4^{p}} \\
    \displaystyle\sum_{p=0}^{3}{(x_p - x_{p+4})W_8^{p} W_4^{2p}} \\
    \displaystyle\sum_{p=0}^{3}{(x_p - x_{p+4})W_8^{p} W_4^{3p}}
  \end{array} \right) & \\
  &= \left( \begin{array}{c}
    (x_0 - x_4)W_8^{0} W_4^0 + (x_1 - x_5)W_8^1 W_4^0 + (x_2 - x_6)W_8^2 W_4^0 + (x_3 - x_7)W_8^3 W_4^0 \\
    (x_0 - x_4)W_8^0 W_4^0 + (x_1 - x_5)W_8^1 W_4^1 + (x_2 - x_6)W_8^2 W_4^2 + (x_3 - x_7)W_8^3 W_4^3 \\
    (x_0 - x_4)W_8^0 W_4^0 + (x_1 - x_5)W_8^1 W_4^2 + (x_2 - x_6)W_8^2 W_4^4 + (x_3 - x_7)W_8^3 W_4^6 \\
    (x_0 - x_4)W_8^0 W_4^0 + (x_1 - x_5)W_8^1 W_4^3 + (x_2 - x_6)W_8^2 W_4^6 + (x_3 - x_7)W_8^3 W_4^9 \\
  \end{array} \right) & \\
 &= \left( \begin{array}{c}
    x_0 W_8^0 W_4^0 + x_1 W_8^1 W_4^0 + x_2 W_8^2 W_4^0 + x_3 W_8^3 W_4^0 - x_4 W_8^0 W_4^0 - x_5 W_8^1 W_4^0 - x_6 W_8^2 W_4^0 - x_7 W_8^3 W_4^0 \\
    x_0 W_8^0 W_4^0 + x_1 W_8^1 W_4^1 + x_2 W_8^2 W_4^2 + x_3 W_8^3 W_4^3 - x_4 W_8^0 W_4^0 - x_5 W_8^1 W_4^1 - x_6 W_8^2 W_4^2 - x_7 W_8^3 W_4^3 \\
    x_0 W_8^0 W_4^0 + x_1 W_8^1 W_4^2 + x_2 W_8^2 W_4^4 + x_3 W_8^3 W_4^6 - x_4 W_8^0 W_4^0 - x_5 W_8^1 W_4^2 - x_6 W_8^2 W_4^4 - x_7 W_8^3 W_4^6 \\
    x_0 W_8^0 W_4^0 + x_1 W_8^1 W_4^3 + x_2 W_8^2 W_4^6 + x_3 W_8^3 W_4^9 - x_4 W_8^0 W_4^0 - x_5 W_8^1 W_4^3 - x_6 W_8^2 W_4^6 - x_7 W_8^3 W_4^9
  \end{array} \right) & \\
  &= \left( \begin{array}{cccccccc}
    W_8^0 & W_8^1 & W_8^2 & W_8^3 & -W_8^0 & -W_8^1 & -W_8^2 & -W_8^3 \\
    W_8^0 & W_8^3 & W_8^6 & W_8^9 & -W_8^0 & -W_8^3 & -W_8^6 & -W_8^9 \\
    W_8^0 & W_8^5 & W_8^{10} & W_8^{15} & -W_8^0 & -W_8^5 & -W_8^{10} & -W_8^{15} \\
    W_8^0 & W_8^7 & W_8^{14} & W_8^{21} & -W_8^0 & -W_8^7 & -W_8^{14} & -W_8^{21} \\
  \end{array} \right)
  \left( \begin{array}{c}
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) & (\because W_8^c W_4^d = W_8^{c+2d}) \\
  &= \left( \begin{array}{cccccccc}
    1 & e^{-j\frac{\pi}{4}} & -j & e^{-j\frac{3\pi}{4}} & -1 & e^{-j\frac{5\pi}{4}} & j & e^{-j\frac{7\pi}{4}} \\
    1 & e^{-j\frac{3\pi}{4}} & j & e^{-j\frac{\pi}{4}} & -1 & e^{-j\frac{7\pi}{4}} & -j & e^{-j\frac{5\pi}{4}} \\
    1 & e^{-j\frac{5\pi}{4}} & -j & e^{-j\frac{7\pi}{4}} & -1 & e^{-j\frac{\pi}{4}} & j & e^{-j\frac{3\pi}{4}} \\
    1 & e^{-j\frac{7\pi}{4}} & j & e^{-j\frac{5\pi}{4}} & -1 & e^{-j\frac{3\pi}{4}} & -j & e^{-j\frac{\pi}{4}} \\
  \end{array} \right)
  \left( \begin{array}{c}
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) & (\because -1 = W_8^4)
\end{align}
You can confirm that the even- and odd-indexed components come out exactly right. Let us push the Cooley-Tukey idea one level deeper. As a preliminary step, define
\begin{align}
  \boldsymbol{x}^{(0)} &= (x_0 + x_4, x_1 + x_5, x_2 + x_6, x_3 + x_7) \\ 
  \boldsymbol{x}^{(1)} &= \left((x_0 - x_4)W_8^0, (x_1 - x_5)W_8^1, (x_2 - x_6)W_8^2, (x_3 - x_7)W_8^3 \right)
\end{align}

With these definitions,

\left\{ \begin{array}{c}
  \boldsymbol{X}^{(0)} = \mathcal{F}_{\boldsymbol{x}^{(0)}}(k) \\
  \boldsymbol{X}^{(1)} = \mathcal{F}_{\boldsymbol{x}^{(1)}}(k)
\end{array} \right.
\quad k = 0, 1, \ldots, 3
holds. Applying the FFT to each of these gives
\begin{align}
  \boldsymbol{X}_{2k}^{(0)} &= \mathcal{F}_{\boldsymbol{x}^{(0)}}(2k) \\
  &= \sum_{p=0}^{1}{\left(x_p^{(0)} + x_{p+2}^{(0)} \right) W_{2}^{kp}} \\
  \boldsymbol{X}_{2k+1}^{(0)} &= \mathcal{F}_{\boldsymbol{x}^{(0)}}(2k+1) \\
  &= \sum_{p=0}^{1}{\left(x_p^{(0)} - x_{p+2}^{(0)} \right)W_4^p W_{2}^{kp}} \\
  \boldsymbol{X}_{2k}^{(1)} &= \mathcal{F}_{\boldsymbol{x}^{(1)}}(2k) \\
  &= \sum_{p=0}^{1}{\left(x_p^{(1)} + x_{p+2}^{(1)} \right) W_{2}^{kp}} \\
  \boldsymbol{X}_{2k+1}^{(1)} &= \mathcal{F}_{\boldsymbol{x}^{(1)}}(2k+1) \\
  &= \sum_{p=0}^{1}{\left(x_p^{(1)} - x_{p+2}^{(1)}\right)W_4^p W_{2}^{kp}}
\end{align}
We write these out and verify that they are correct.
\begin{align}
  \boldsymbol{X}_{2k}^{(0)} &= \left( \begin{array}{c}
    X_{0}^{(0)} \\
    X_{2}^{(0)}
  \end{array} \right) = \left( \begin{array}{c}
    X_0 \\
    X_4
  \end{array} \right) & \\
  &= \left( \begin{array}{c}
    \displaystyle\sum_{p=0}^{1}{\left(x_p^{(0)} + x_{p+2}^{(0)} \right) W_2^0} \\
    \displaystyle\sum_{p=0}^{1}{\left(x_p^{(0)} + x_{p+2}^{(0)} \right) W_2^p}
  \end{array} \right) = \left( \begin{array}{c}
    \left(x_0^{(0)} + x_2^{(0)} \right) W_2^0 + \left(x_1^{(0)} + x_3^{(0)} \right) W_2^0 \\
    \left(x_0^{(0)} + x_2^{(0)} \right) W_2^0 + \left(x_1^{(0)} + x_3^{(0)} \right) W_2^1
  \end{array} \right) & \\
  &= \left( \begin{array}{c}
    \{(x_0 + x_4) + (x_2 + x_6)\}W_2^0 + \{(x_1 + x_5) + (x_3 + x_7) \} W_2^0 \\
    \{(x_0 + x_4) + (x_2 + x_6)\}W_2^0 + \{(x_1 + x_5) + (x_3 + x_7) \} W_2^1
  \end{array} \right) & \\
  &= \left( \begin{array}{cccccccc}
    W_2^0 & W_2^0 & W_2^0 & W_2^0 & W_2^0 & W_2^0 & W_2^0 & W_2^0 \\
    W_2^0 & W_2^1 & W_2^0 & W_2^1 & W_2^0 & W_2^1 & W_2^0 & W_2^1
  \end{array} \right) \left( \begin{array}{c}
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) && \\
  &= \left( \begin{array}{cccccccc}
    1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
    1 & -1 & 1 & -1 & 1 & -1 & 1 & -1
  \end{array} \right) \left( \begin{array}{c}
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) & \\

  \boldsymbol{X}_{2k+1}^{(0)} &= \left( \begin{array}{c}
    X_{1}^{(0)} \\
    X_{3}^{(0)}
  \end{array} \right) = \left( \begin{array}{c}
    X_2 \\
    X_6
  \end{array} \right) & \\
  &= \left( \begin{array}{c}
    \displaystyle\sum_{p=0}^{1}{\left(x_p^{(0)} - x_{p+2}^{(0)} \right)W_4^p W_2^0} \\
    \displaystyle\sum_{p=0}^{1}{\left(x_p^{(0)} - x_{p+2}^{(0)} \right)W_4^p W_2^p}
  \end{array} \right) = \left( \begin{array}{c}
    \left(x_0^{(0)} - x_2^{(0)} \right)W_4^0 W_2^0 + \left(x_1^{(0)} - x_3^{(0)} \right)W_4^1 W_2^0 \\
    \left(x_0^{(0)} - x_2^{(0)} \right)W_4^0 W_2^0 + \left(x_1^{(0)} - x_3^{(0)} \right)W_4^1 W_2^1
  \end{array} \right) & \\
  &= \left( \begin{array}{c}
    \{(x_0 + x_4) - (x_2 + x_6)\}W_4^0 W_2^0 + \{(x_1 + x_5) - (x_3 + x_7)\}W_4^1 W_2^0 \\
    \{(x_0 + x_4) - (x_2 + x_6)\}W_4^0 W_2^0 + \{(x_1 + x_5) - (x_3 + x_7)\}W_4^1 W_2^1
  \end{array} \right) & \\
  &= \left( \begin{array}{cccccccc}
    W_4^0 & W_4^1 & -W_4^0 & -W_4^1 & W_4^0 & W_4^1 & -W_4^0 & -W_4^1 \\
    W_4^0 & W_4^3 & -W_4^0 & -W_4^3 & W_4^0 & W_4^3 & -W_4^0 & -W_4^3
  \end{array} \right) \left( \begin{array}{c}
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) & \\
  &= \left( \begin{array}{cccccccc}
    1 & -j & -1 & j & 1 & -j & -1 & j \\
    1 & j & -1 & -j & 1 & j & -1 & -j
  \end{array} \right) \left( \begin{array}{c}
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) & (\because W_4^c W_2^d = W_4^{c+2d}, -1 = W_4^2) \\

  \boldsymbol{X}_{2k}^{(1)} &= \left( \begin{array}{c}
    X_{0}^{(1)} \\
    X_{2}^{(1)}
  \end{array} \right) = \left( \begin{array}{c}
    X_1 \\
    X_5
  \end{array} \right) & \\
  &= \left( \begin{array}{c}
    \displaystyle\sum_{p=0}^{1}{\left(x_p^{(1)} + x_{p+2}^{(1)} \right) W_2^0} \\
    \displaystyle\sum_{p=0}^{1}{\left(x_p^{(1)} + x_{p+2}^{(1)} \right) W_2^p}
  \end{array} \right) = \left( \begin{array}{c}
    \left(x_0^{(1)} + x_2^{(1)} \right) W_2^0 + \left(x_1^{(1)} + x_3^{(1)} \right) W_2^0 \\
    \left(x_0^{(1)} + x_2^{(1)} \right) W_2^0 + \left(x_1^{(1)} + x_3^{(1)} \right) W_2^1
  \end{array} \right) & \\
  &= \left( \begin{array}{c}
    \left\{(x_0 - x_4)W_8^0 + (x_2 - x_6)W_8^2 \right\}W_2^0 + \left\{(x_1 - x_5)W_8^1 + (x_3 - x_7)W_8^3 \right\}W_2^0 \\
    \left\{(x_0 - x_4)W_8^0 + (x_2 - x_6)W_8^2 \right\}W_2^0 + \left\{(x_1 - x_5)W_8^1 + (x_3 - x_7)W_8^3 \right\}W_2^1
  \end{array} \right) & \\
  &= \left( \begin{array}{cccccccc}
    W_8^0 & W_8^1 & W_8^2 & W_8^3 & -W_8^0 & -W_8^1 & -W_8^2 & -W_8^3 \\
    W_8^0 & W_8^5 & W_8^2 & W_8^7 & -W_8^0 & -W_8^5 & -W_8^2 & -W_8^7
  \end{array} \right) \left( \begin{array}{c}
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) & \\
  &= \left( \begin{array}{cccccccc}
    1 & e^{-j\frac{\pi}{4}} & -j & e^{-j\frac{3\pi}{4}} & -1 & e^{-j\frac{5\pi}{4}} & j & e^{-j\frac{7\pi}{4}} \\
    1 & e^{-j\frac{5\pi}{4}} & -j & e^{-j\frac{7\pi}{4}} & -1 & e^{-j\frac{\pi}{4}} & j & e^{-j\frac{3\pi}{4}}
  \end{array} \right) \left( \begin{array}{c}
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) & \\

  \boldsymbol{X}_{2k+1}^{(1)} &= \left( \begin{array}{c}
    X_{1}^{(1)} \\
    X_{3}^{(1)}
  \end{array} \right) = \left( \begin{array}{c}
    X_3 \\
    X_7
  \end{array} \right) & \\
  &= \left( \begin{array}{c}
    \displaystyle\sum_{p=0}^{1}{\left(x_p^{(1)} - x_{p+2}^{(1)} \right)W_4^p W_2^0} \\
    \displaystyle\sum_{p=0}^{1}{\left(x_p^{(1)} - x_{p+2}^{(1)} \right)W_4^p W_2^p}
  \end{array} \right) = \left( \begin{array}{c}
    \left(x_0^{(1)} - x_2^{(1)} \right)W_4^0 W_2^0 + \left(x_1^{(1)} - x_3^{(1)} \right)W_4^1 W_2^0 \\
    \left(x_0^{(1)} - x_2^{(1)} \right)W_4^0 W_2^0 + \left(x_1^{(1)} - x_3^{(1)} \right)W_4^1 W_2^1
  \end{array} \right) & \\
  &= \left( \begin{array}{c}
    \left\{(x_0 - x_4)W_8^0 - (x_2 - x_6)W_8^2 \right\}W_4^0 W_2^0 + \left\{(x_1 - x_5)W_8^1 - (x_3 - x_7)W_8^3 \right\}W_4^1 W_2^0 \\
    \left\{(x_0 - x_4)W_8^0 - (x_2 - x_6)W_8^2 \right\}W_4^0 W_2^0 + \left\{(x_1 - x_5)W_8^1 - (x_3 - x_7)W_8^3 \right\}W_4^1 W_2^1
  \end{array} \right) \\
  &= \left( \begin{array}{cccccccc}
    W_8^0 & W_8^3 & -W_8^2 & -W_8^5 & -W_8^0 & -W_8^3 & W_8^2 & W_8^5 \\
    W_8^0 & W_8^7 & -W_8^2 & -W_8^9 & -W_8^0 & -W_8^7 & W_8^2 & W_8^9
  \end{array} \right) \left( \begin{array}[c]
    xx_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) & \\
  &= \left( \begin{array}[cccccccc]
    11 & e^{-j\frac{3\pi}{4}} & j & e^{-j\frac{\pi}{4}} & -1 & e^{-j\frac{7\pi}{4}} & -j & e^{-j\frac{5\pi}{4}} \\
    1 & e^{-j\frac{7\pi}{4}} & j & e^{-j\frac{5\pi}{4}} & -1 & e^{-j\frac{3\pi}{4}} & -j & e^{-j\frac{\pi}{4}}
  \end{array} \right) \left( \begin{array}[c]
    xx_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) & \left(\because W_4^c W_2^d = W_4^{c+2d}, -1 = W_4^2 \right)
\end{align}
You can see that the calculation is correct (distant eyes). If you would rather let the machine do the checking, see the sketch right below.
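As a sanity check (my own addition, not from the paper), the following NumPy snippet confirms that every coefficient row derived above, e.g. $(1, -j, -1, j, \dots)$ for $X_2$, is just the corresponding row of the 8-point DFT matrix $W_8^{kn}$:

```python
import numpy as np

# W_8 = e^{-j 2 pi / 8}: the twiddle factor used throughout this appendix
W8 = np.exp(-2j * np.pi / 8)
x = np.random.default_rng(0).standard_normal(8)
X = np.fft.fft(x)

for k in range(8):
    row = W8 ** (k * np.arange(8))  # k-th row of the 8-point DFT matrix
    # each hand-derived coefficient row must reproduce np.fft.fft's output
    assert np.isclose(row @ x, X[k])
print("all 8 coefficient rows agree with np.fft.fft")
```

Now, let's apply the FFT once more.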
\begin{align}
  \boldsymbol{x}^{(00)} &= \left(x_0^{(0)} + x_2^{(0)}, x_1^{(0)} + x_3^{(0)} \right) = \big((x_0 + x_4) + (x_2 + x_6), (x_1 + x_5) + (x_3 + x_7) \big) \\
  \boldsymbol{x}^{(01)} &= \left( \left(x_0^{(0)} - x_2^{(0)} \right)W_4^0, \left(x_1^{(0)} - x_3^{(0)} \right)W_4^1 \right) = \left(\{(x_0 + x_4) - (x_2 + x_6)\}W_4^0, \{(x_1 + x_5) - (x_3 + x_7)\}W_4^1 \right) \\
  \boldsymbol{x}^{(10)} &= \left(x_0^{(1)} + x_2^{(1)}, x_1^{(1)} + x_3^{(1)} \right) = \left((x_0 - x_4)W_8^0 + (x_2 - x_6)W_8^2, (x_1 - x_5)W_8^1 + (x_3 - x_7)W_8^3 \right) \\
  \boldsymbol{x}^{(11)} &= \left( \left(x_0^{(1)} - x_2^{(1)} \right)W_4^0, \left(x_1^{(1)} - x_3^{(1)} \right)W_4^1 \right) = \left( \left\{(x_0 - x_4)W_8^0 - (x_2 - x_6)W_8^2 \right\}W_4^0, \left\{(x_1 - x_5)W_8^1 - (x_3 - x_7)W_8^3 \right\}W_4^1 \right)
\end{align}
Then
\left\{ \begin{array}[c]
  \boldsymbol{X}^{(00)} = \mathcal{F}_{\boldsymbol{x}^{(00)}}(k) \\
  \boldsymbol{X}^{(01)} = \mathcal{F}_{\boldsymbol{x}^{(01)}}(k) \\
  \boldsymbol{X}^{(10)} = \mathcal{F}_{\boldsymbol{x}^{(10)}}(k) \\
  \boldsymbol{X}^{(11)} = \mathcal{F}_{\boldsymbol{x}^{(11)}}(k)
\end{array} \right.
\quad k = 0, 1
So
\begin{align}
  \boldsymbol{X}_{2k}^{(00)} &= \mathcal{F}_{\boldsymbol{x}^{(00)}}(2k) \\
  &= \sum_{p=0}^{0}{\left(x_p^{(00)} + x_{p+1}^{(00)}\right) W_{1}^{kp}} \\
  \boldsymbol{X}_{2k+1}^{(00)} &= \mathcal{F}_{\boldsymbol{x}^{(00)}}(2k+1) \\
  &= \sum_{p=0}^{0}{\left(x_p^{(00)} - x_{p+1}^{(00)}\right)W_2^{p} W_{1}^{kp}} \\
  \boldsymbol{X}_{2k}^{(01)} &= \mathcal{F}_{\boldsymbol{x}^{(01)}}(2k) \\
  &= \sum_{p=0}^{0}{\left(x_p^{(01)} + x_{p+1}^{(01)}\right) W_{1}^{kp}} \\
  \boldsymbol{X}_{2k+1}^{(01)} &= \mathcal{F}_{\boldsymbol{x}^{(01)}}(2k+1) \\
  &= \sum_{p=0}^{0}{\left(x_p^{(01)} - x_{p+1}^{(01)}\right)W_2^{p} W_{1}^{kp}} \\
  \boldsymbol{X}_{2k}^{(10)} &= \mathcal{F}_{\boldsymbol{x}^{(10)}}(2k) \\
  &= \sum_{p=0}^{0}{\left(x_p^{(10)} + x_{p+1}^{(10)}\right) W_{1}^{kp}} \\
  \boldsymbol{X}_{2k+1}^{(10)} &= \mathcal{F}_{\boldsymbol{x}^{(10)}}(2k+1) \\
  &= \sum_{p=0}^{0}{\left(x_p^{(10)} - x_{p+1}^{(10)}\right)W_2^{p} W_{1}^{kp}} \\
  \boldsymbol{X}_{2k}^{(11)} &= \mathcal{F}_{\boldsymbol{x}^{(11)}}(2k) \\
  &= \sum_{p=0}^{0}{\left(x_p^{(11)} + x_{p+1}^{(11)}\right) W_{1}^{kp}} \\
  \boldsymbol{X}_{2k+1}^{(11)} &= \mathcal{F}_{\boldsymbol{x}^{(11)}}(2k+1) \\
  &= \sum_{p=0}^{0}{\left(x_p^{(11)} - x_{p+1}^{(11)}\right)W_2^{p} W_{1}^{kp}}
\end{align}
Now that we've come this far, there's nothing left but to push through to the end...!
\begin{align}
  \boldsymbol{X}_{2k}^{(00)} &= X_0 \\
  &= \sum_{p=0}^{0}{\left(x_p^{(00)} + x_{p+1}^{(00)}\right) W_{1}^{0}} = \left(x_0^{(00)} + x_{1}^{(00)}\right) W_{1}^{0} \\
  &= \left(((x_0 + x_4) + (x_2 + x_6)) + ((x_1 + x_5) + (x_3 + x_7))\right) W_{1}^{0} \\
  &= \left( \begin{array}[cccccccc]
    W_{1}^{0} & W_{1}^{0} & W_{1}^{0} & W_{1}^{0} & W_{1}^{0} & W_{1}^{0} & W_{1}^{0} & W_{1}^{0}
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) \\
  &= \left( \begin{array}[cccccccc]
    1 & 1 & 1 & 1 & 1 & 1 & 1 & 1
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) \\

  \boldsymbol{X}_{2k+1}^{(00)} &= X_4 \\
  &= \sum_{p=0}^{0}{\left(x_p^{(00)} - x_{p+1}^{(00)}\right)W_2^{p} W_{1}^{kp}} = \left(x_0^{(00)} - x_{1}^{(00)}\right)W_2^0 W_{1}^{0} \\
  &= \left(((x_0 + x_4) + (x_2 + x_6)) - ((x_1 + x_5) + (x_3 + x_7))\right) W_2^0 W_{1}^{0} \\
  &= \left( \begin{array}[cccccccc]
    W_{2}^{0} & -W_{2}^{0} & W_{2}^{0} & -W_{2}^{0} & W_{2}^{0} & -W_{2}^{0} & W_{2}^{0} & -W_{2}^{0}
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) \\
  &= \left( \begin{array}[cccccccc]
    1 & -1 & 1 & -1 & 1 & -1 & 1 & -1
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) \\

  \boldsymbol{X}_{2k}^{(01)} &= X_2 \\
  &= \sum_{p=0}^{0}{\left(x_p^{(01)} + x_{p+1}^{(01)}\right) W_{1}^{kp}} = \left(x_0^{(01)} + x_{1}^{(01)}\right) W_{1}^{0} \\
  &= \left(\{(x_0 + x_4) - (x_2 + x_6)\}W_4^0 + \{(x_1 + x_5) - (x_3 + x_7)\}W_4^1 \right) W_{1}^{0} \\
  &= \left( \begin{array}[cccccccc]
    W_{4}^{0} & W_{4}^{1} & -W_{4}^{0} & -W_{4}^{1} & W_{4}^{0} & W_{4}^{1} & -W_{4}^{0} & -W_{4}^{1}
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) \\
  &= \left( \begin{array}[cccccccc]
    1 & -j & -1 & j & 1 & -j & -1 & j
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) \\

  \boldsymbol{X}_{2k+1}^{(01)} &= X_6 \\
  &= \sum_{p=0}^{0}{\left(x_p^{(01)} - x_{p+1}^{(01)}\right)W_2^{p} W_{1}^{kp}} = \left(x_0^{(01)} - x_{1}^{(01)}\right)W_2^0 W_{1}^{0} \\
  &= \left(\{(x_0 + x_4) - (x_2 + x_6)\}W_4^0 - \{(x_1 + x_5) - (x_3 + x_7)\}W_4^1 \right) W_2^0 W_{1}^{0} \\
  &= \left( \begin{array}[cccccccc]
    W_{4}^{0} & -W_{4}^{1} & -W_{4}^{0} & W_{4}^{1} & W_{4}^{0} &  -W_{4}^{1} & -W_{4}^{0} & W_{4}^{1}
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) \\
  &= \left( \begin{array}[cccccccc]
    1 & j & -1 & -j & 1 & j & -1 & -j
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) \\

  \boldsymbol{X}_{2k}^{(10)} &= X_1 \\
  &= \sum_{p=0}^{0}{\left(x_p^{(10)} + x_{p+1}^{(10)}\right) W_{1}^{0}} = \left(x_0^{(10)} + x_{1}^{(10)}\right) W_{1}^{0} \\
  &= \left(((x_0 - x_4)W_8^0 + (x_2 - x_6)W_8^2) + ((x_1 - x_5)W_8^1 + (x_3 - x_7)W_8^3) \right) W_{1}^{0} \\
  &= \left( \begin{array}[cccccccc]
    W_{8}^{0} & W_{8}^{1} & W_{8}^{2} & W_{8}^{3} & -W_{8}^{0} & -W_{8}^{1} & -W_{8}^{2} & -W_{8}^{3}
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) \\
  &= \left( \begin{array}[cccccccc]
    1 & e^{-j\frac{\pi}{4}} & -j & e^{-j\frac{3\pi}{4}} & -1 & e^{-j\frac{5\pi}{4}} & j & e^{-j\frac{7\pi}{4}}
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) \\

  \boldsymbol{X}_{2k+1}^{(10)} &= X_5 \\
  &= \sum_{p=0}^{0}{\left(x_p^{(10)} - x_{p+1}^{(10)}\right)W_2^{p} W_{1}^{kp}} = \left(x_0^{(10)} - x_{1}^{(10)}\right)W_2^0 W_{1}^{0} \\
  &= \left(\{(x_0 - x_4)W_8^0 + (x_2 - x_6)W_8^2\} - \{(x_1 - x_5)W_8^1 + (x_3 - x_7)W_8^3\}\right) W_2^0 W_{1}^{0} \\
  &= \left( \begin{array}[cccccccc]
    W_{8}^{0} & -W_{8}^{1} & W_{8}^{2} & -W_{8}^{3} & -W_{8}^{0} & W_{8}^{1} & -W_{8}^{2} & W_{8}^{3}
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) \\
  &= \left( \begin{array}[cccccccc]
    1 & e^{-j\frac{5\pi}{4}} & -j & e^{-j\frac{7\pi}{4}} & -1 & e^{-j\frac{\pi}{4}} & j & e^{-j\frac{3\pi}{4}}
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) \\

  \boldsymbol{X}_{2k}^{(11)} &= X_3 \\
  &= \sum_{p=0}^{0}{\left(x_p^{(11)} + x_{p+1}^{(11)}\right) W_{1}^{0}} = \left(x_0^{(11)} + x_{1}^{(11)}\right) W_{1}^{0} \\
  &= \left(\{(x_0 - x_4)W_8^0 - (x_2 - x_6)W_8^2\}W_4^0 + \{(x_1 - x_5)W_8^1 - (x_3 - x_7)W_8^3\}W_4^1 \right) W_{1}^{0} \\
  &= \left( \begin{array}[cccccccc]
    W_{8}^{0} & W_{8}^{3} & -W_{8}^{2} & -W_{8}^{5} & -W_{8}^{0} & -W_{8}^{3} & W_{8}^{2} & W_{8}^{5}
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) \\
  &= \left( \begin{array}[cccccccc]
    1 & e^{-j\frac{3\pi}{4}} & j & e^{-j\frac{\pi}{4}} & -1 & e^{-j\frac{7\pi}{4}} & -j & e^{-j\frac{5\pi}{4}}
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) \\

  \boldsymbol{X}_{2k+1}^{(11)} &= X_7 \\
  &= \sum_{p=0}^{0}{\left(x_p^{(11)} - x_{p+1}^{(11)}\right)W_2^{p} W_{1}^{kp}} = \left(x_0^{(11)} - x_{1}^{(11)}\right)W_2^0 W_{1}^{0} \\
  &= \left(\{(x_0 - x_4)W_8^0 - (x_2 - x_6)W_8^2\}W_4^0 - \{(x_1 - x_5)W_8^1 - (x_3 - x_7)W_8^3\}W_4^1\right) W_2^0 W_{1}^{0} \\
  &= \left( \begin{array}[cccccccc]
    W_{8}^{0} & -W_{8}^{3} & -W_{8}^{2} & W_{8}^{5} & -W_{8}^{0} & W_{8}^{3} & W_{8}^{2} & -W_{8}^{5}
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right) \\
  &= \left( \begin{array}[cccccccc]
    1 & e^{-j\frac{7\pi}{4}} & j & e^{-j\frac{5\pi}{4}} & -1 & e^{-j\frac{3\pi}{4}} & -j & e^{-j\frac{\pi}{4}}
  \end{array} \right) \left( \begin{array}[c]
    x_0 \\
    x_1 \\
    x_2 \\
    x_3 \\
    x_4 \\
    x_5 \\
    x_6 \\
    x_7
  \end{array} \right)
\end{align}
And there it is! Everything matches. Note that while we expanded $\boldsymbol{x}^{(0)}$ and friends to verify the calculation by hand, an actual implementation does no such expansion. Also, as some of you may have noticed, the outputs have come out in bit-reversed order, as shown below.
X^{(000)} = X_{2k}^{(00)} = X_0 = X_{(000)_{(2)}} \quad
X^{(001)} = X_{2k+1}^{(00)} = X_4 = X_{(100)_{(2)}} \\
X^{(010)} = X_{2k}^{(01)} = X_2 = X_{(010)_{(2)}} \quad
X^{(011)} = X_{2k+1}^{(01)} = X_6 = X_{(110)_{(2)}} \\
X^{(100)} = X_{2k}^{(10)} = X_1 = X_{(001)_{(2)}} \quad
X^{(101)} = X_{2k+1}^{(10)} = X_5 = X_{(101)_{(2)}} \\
X^{(110)} = X_{2k}^{(11)} = X_3 = X_{(011)_{(2)}} \quad
X^{(111)} = X_{2k+1}^{(11)} = X_7 = X_{(111)_{(2)}}
Given how the sequence was recursively split, it is only natural that the output order got shuffled. Leaving it like this is inconvenient, so at the implementation stage the reordering is handled cleverly so that the results come out in the correct order; for details, see the [reference site](http://wwwa.pikara.ne.jp/okojisan/stockham/cooley-tukey.html). A compact implementation sketch follows.
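Here is a minimal NumPy sketch (my own, not the paper's code) of the iterative radix-2 decimation-in-frequency FFT we just traced by hand. Each stage forms a sum half and a twiddled difference half, exactly like $\boldsymbol{x}^{(0)}$ and $\boldsymbol{x}^{(1)}$ above, and a final explicit permutation undoes the bit reversal:

```python
import numpy as np

def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT mirroring the derivation above."""
    x = np.asarray(x, dtype=complex).copy()
    N = len(x)
    assert N > 0 and (N & (N - 1)) == 0, "length must be a power of 2"
    m = N
    while m > 1:  # stages: block length m = N, N/2, ..., 2
        half = m // 2
        W = np.exp(-2j * np.pi * np.arange(half) / m)  # twiddles W_m^p
        for start in range(0, N, m):
            a = x[start:start + half].copy()
            b = x[start + half:start + m]
            x[start:start + half] = a + b            # sum half  -> X_{2k}
            x[start + half:start + m] = (a - b) * W  # diff half -> X_{2k+1}
        m = half
    # the butterflies leave the spectrum in bit-reversed order (see the
    # table above); undo it with an explicit permutation
    bits = N.bit_length() - 1
    rev = [int(format(i, f"0{bits}b")[::-1], 2) for i in range(N)]
    return x[rev]

x = np.random.default_rng(0).standard_normal(8)
print(np.allclose(fft_dif(x), np.fft.fft(x)))  # True
```

Production implementations, such as the Stockham autosort variant described at the reference site, fold this reordering into the passes themselves and avoid the separate permutation.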

This brings the amount of computation down to $\mathcal{O}(N \log_2 N)$. A logarithmic factor is about as good as it gets in the world of computational complexity, and it guarantees that the transform finishes in realistic time unless $N$ is truly exorbitant.
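To get a feel for the gap, here is a rough comparison (my own sketch; absolute timings depend entirely on your machine) between a definition-based $\mathcal{O}(N^2)$ DFT and NumPy's FFT:

```python
import time
import numpy as np

def naive_dft(x):
    """O(N^2) DFT computed straight from the definition."""
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.exp(-2j * np.pi * k * n / N))
                     for k in range(N)])

x = np.random.default_rng(0).standard_normal(2 ** 12)  # N = 4096
t0 = time.perf_counter()
X_naive = naive_dft(x)
t1 = time.perf_counter()
X_fast = np.fft.fft(x)
t2 = time.perf_counter()
print(np.allclose(X_naive, X_fast))             # True
print(f"naive: {t1 - t0:.3f}s  fft: {t2 - t1:.6f}s")
```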

So that is how the fast Fourier transform works. However, effective as this $\mathcal{O}(N \log_2 N)$ method is, it is not without drawbacks: by its very construction it can become inefficient in terms of space complexity, because the input length must be aligned to a power of 2. Normally this is handled with zero padding, but the larger the transform target, the farther it can land from a power of 2, and the more wasteful zero padding becomes necessary. When using the FFT, therefore, keep an eye on your memory headroom and weigh the time/space complexity trade-off.
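For reference, the usual padding arithmetic looks like this (a sketch; note that `np.fft.fft` itself accepts arbitrary lengths, so the power-of-2 restriction bites when you implement radix-2 yourself or want the fastest path):

```python
import numpy as np

x = np.random.default_rng(0).standard_normal(1025)  # just past 2^10 = 1024
n_fft = 1 << (len(x) - 1).bit_length()              # next power of two: 2048
x_padded = np.pad(x, (0, n_fft - len(x)))           # nearly doubles the buffer
X = np.fft.fft(x_padded)
print(len(x), "->", n_fft)                          # 1025 -> 2048
```

In the worst case, as here, the padded buffer is nearly twice the original, which is exactly the space overhead discussed above.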


in conclusion

The appendix ended up taking ten times longer than the main story ... who is this even going to help ... Going forward, I would like to keep collecting and organizing FCNN-related papers like this one, and eventually move on to an implementation. If you have any information, please let me know!
