To build a CNN in Theano, I looked into Theano's two-dimensional convolution function `theano.tensor.nnet.conv.conv2d()`. I compared it with the N-dimensional convolution function `scipy.signal.fftconvolve()`, which is probably commonly used in signal processing.

#Convolution between 2D arrays

First, let's convolve two simple two-dimensional arrays.

```python
import theano
import theano.tensor as T
from theano.tensor import nnet  # provides nnet.conv.conv2d
import scipy.signal as s
from numpy import arange, ones, float32

m = T.matrix()
w = T.matrix()

# conv2d needs rank-4 inputs, so add two leading axes.
o_full = nnet.conv.conv2d(m[None,None,:,:], w[None, None,:,:],
                          border_mode='full')
o_valid = nnet.conv.conv2d(m[None,None,:,:], w[None, None,:,:],
                          border_mode='valid')

m_arr = arange(25.).reshape((5,5)).astype(float32)
w_arr = ones((3,3)).astype(float32)
print("m_arr =")
print(m_arr)
print("w_arr =")
print(w_arr)

print("Output for Theano.")
print("full:")
print(o_full.eval({m:m_arr, w:w_arr}).round().astype(int))
print("valid:")
print(o_valid.eval({m:m_arr, w:w_arr}).round().astype(int))

print("Output for scipy.")
print("full:")
print(s.fftconvolve(m_arr, w_arr, "full").round().astype(int))
print("valid:")
print(s.fftconvolve(m_arr, w_arr, "valid").round().astype(int))
```

The array to be convolved, `m_arr`, and the window function (also called the kernel or filter) it is convolved with, `w_arr`, are passed to both `theano.tensor.nnet.conv.conv2d()` and `scipy.signal.fftconvolve()`. Note the following lines:

```python
# Must be rank 4.
o_full = nnet.conv.conv2d(m[None,None,:,:], w[None, None,:,:],
                          border_mode='full')
o_valid = nnet.conv.conv2d(m[None,None,:,:], w[None, None,:,:],
                          border_mode='valid')
```

The inputs are written as `m[None,None,:,:]` and `w[None, None,:,:]` because the input and kernel arrays must have the format `[number of images, number of channels, height, width]`. Since `m` and `w` were defined as rank-2 `T.matrix()` variables, indexing them with `[None, None,:,:]` adds two leading axes. This indexing works the same way as in NumPy, so I personally find it very easy to use.
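
Since the indexing follows NumPy's rules, its effect can be checked on a plain NumPy array without Theano. This is only a minimal sketch; the names `img` and `img4d` are made up for this illustration and do not appear in the program above.

```python
import numpy as np

# A 5x5 matrix standing in for m_arr (rank 2).
img = np.arange(25.).reshape((5, 5))

# Each None in the index inserts a new axis of length 1 at that position,
# so two leading None entries turn shape (5, 5) into (1, 1, 5, 5).
img4d = img[None, None, :, :]

print(img.shape)    # (5, 5)
print(img4d.shape)  # (1, 1, 5, 5)
```

The two new axes play the role of "number of images" and "number of channels", each of size 1.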

The full Theano and SciPy program prints the following:

```
m_arr =
[[  0.   1.   2.   3.   4.]
 [  5.   6.   7.   8.   9.]
 [ 10.  11.  12.  13.  14.]
 [ 15.  16.  17.  18.  19.]
 [ 20.  21.  22.  23.  24.]]
w_arr =
[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
Output for Theano.
full:
[[[[  0   1   3   6   9   7   4]
   [  5  12  21  27  33  24  13]
   [ 15  33  54  63  72  51  27]
   [ 30  63  99 108 117  81  42]
   [ 45  93 144 153 162 111  57]
   [ 35  72 111 117 123  84  43]
   [ 20  41  63  66  69  47  24]]]]
valid:
[[[[ 54  63  72]
   [ 99 108 117]
   [144 153 162]]]]
Output for scipy.
full:
[[  0   1   3   6   9   7   4]
 [  5  12  21  27  33  24  13]
 [ 15  33  54  63  72  51  27]
 [ 30  63  99 108 117  81  42]
 [ 45  93 144 153 162 111  57]
 [ 35  72 111 117 123  84  43]
 [ 20  41  63  66  69  47  24]]
valid:
[[ 54  63  72]
 [ 99 108 117]
 [144 153 162]]
```

The output of the convolution is rounded off for easy viewing and then converted to an int.

`theano.tensor.nnet.conv.conv2d()` takes an argument called `border_mode`, which can be set to `full` or `valid`. A convolution slides the filter across the image, multiplies the overlapping elements, and sums the products. `full` mode includes every position where at least one filter element overlaps the image, even if the filter protrudes beyond the image. `valid` mode outputs only the positions where the filter lies entirely inside the image. If the image and the filter have sizes $M$ and $m$ along some axis, the output size along that axis is $M+(m-1)$ for `full` and $M-(m-1)$ for `valid`. In the example above the height (and the width) has $M=5,m=3$, so the output size is 7 for `full` and 3 for `valid`.
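
The size rule is easy to check in one dimension. The sketch below uses NumPy's `np.convolve()` rather than Theano or `fftconvolve()`, on the assumption that its `full` and `valid` modes follow the same definition; `row` and `kernel` are names chosen for this illustration.

```python
import numpy as np

row = np.arange(5.)   # M = 5, the same values as the first row of m_arr
kernel = np.ones(3)   # m = 3

full = np.convolve(row, kernel, mode="full")
valid = np.convolve(row, kernel, mode="valid")

print(len(full), full)    # 7 values: 0 1 3 6 9 7 4
print(len(valid), valid)  # 3 values: 3 6 9
```

The `valid` result is the middle part of the `full` result, with the $m-1$ partially overlapping positions removed from each end.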



Checking the output confirms that `theano.tensor.nnet.conv.conv2d()` and `scipy.signal.fftconvolve()` give the same values (apart from the rank of the array).

However, the two outputs have different meanings. `scipy.signal.fftconvolve()` returns the result of a pure N-dimensional convolution, whereas `theano.tensor.nnet.conv.conv2d()` returns the result of convolving each image with each filter. Its output array has the format `[number of images, number of filters, height, width]`. Also, as described later, `theano.tensor.nnet.conv.conv2d()` requires the image and the filter to have the same number of channels.
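
To make this meaning concrete, here is a naive NumPy sketch of what I understand the `valid` mode of Theano's `conv2d()` to compute. `conv2d_valid` is a hypothetical helper written for this article, not a Theano function, and it assumes that the kernel is flipped (a true convolution) and that the channels are summed.

```python
import numpy as np

def conv2d_valid(images, filters):
    """Naive 'valid' 2D convolution.

    images:  (n_images, n_channels, H, W)
    filters: (n_filters, n_channels, h, w)
    returns: (n_images, n_filters, H - h + 1, W - w + 1)
    """
    n, c, H, W = images.shape
    f, c2, h, w = filters.shape
    if c != c2:
        raise ValueError("image and filter must have the same number of channels")
    flipped = filters[:, :, ::-1, ::-1]  # convolution flips the kernel spatially
    out = np.zeros((n, f, H - h + 1, W - w + 1), dtype=images.dtype)
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            patch = images[:, :, i:i + h, j:j + w]  # (n, c, h, w)
            # Sum over channels and the window, for every image/filter pair.
            out[:, :, i, j] = np.einsum("nchw,fchw->nf", patch, flipped)
    return out

images = np.arange(2 * 3 * 5 * 5, dtype=float).reshape((2, 3, 5, 5))
filters = np.ones((1, 3, 3, 3))
print(conv2d_valid(images, filters).shape)  # (2, 1, 3, 3)
```

Only the height and width axes are slid over. The image axis and the filter axis are just paired up, which is why they both appear unchanged in the output shape.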

#Convolution when the dimensions of the number of images and the number of channels are added

Next, we add the image-count and channel dimensions. We convolve one 3x3 filter with 3 channels over two 5x5 images with 3 channels each. The program looks like this:

```python
m = T.tensor4()
w = T.tensor4()

# Must be rank 4.
o_full = nnet.conv.conv2d(m, w,
                          border_mode='full')
o_valid = nnet.conv.conv2d(m, w,
                          border_mode='valid')

m_arr = arange(2*3*5*5).reshape((2, 3, 5, 5)).astype(float32)
w_arr = ones((1,3,3,3)).astype(float32)
print("m_arr =")
print(m_arr)
print("w_arr =")
print(w_arr)

print("Output for Theano.")
print("full:")
print(o_full.eval({m:m_arr, w:w_arr}).round().astype(int))
print("valid:")
print(o_valid.eval({m:m_arr, w:w_arr}).round().astype(int))

print("Output for scipy.")
print("full:")
print(s.fftconvolve(m_arr, w_arr, "full").round().astype(int))
print("valid:")
print(s.fftconvolve(m_arr, w_arr, "valid").round().astype(int))
```

To work with rank-4 tensors, `m` and `w` are now defined as `T.tensor4()`.

```
m_arr =
[[[[   0.    1.    2.    3.    4.]
   [   5.    6.    7.    8.    9.]
   [  10.   11.   12.   13.   14.]
   [  15.   16.   17.   18.   19.]
   [  20.   21.   22.   23.   24.]]

  [[  25.   26.   27.   28.   29.]
   [  30.   31.   32.   33.   34.]
   [  35.   36.   37.   38.   39.]
   [  40.   41.   42.   43.   44.]
   [  45.   46.   47.   48.   49.]]

  [[  50.   51.   52.   53.   54.]
   [  55.   56.   57.   58.   59.]
   [  60.   61.   62.   63.   64.]
   [  65.   66.   67.   68.   69.]
   [  70.   71.   72.   73.   74.]]]


 [[[  75.   76.   77.   78.   79.]
   [  80.   81.   82.   83.   84.]
   [  85.   86.   87.   88.   89.]
   [  90.   91.   92.   93.   94.]
   [  95.   96.   97.   98.   99.]]

  [[ 100.  101.  102.  103.  104.]
   [ 105.  106.  107.  108.  109.]
   [ 110.  111.  112.  113.  114.]
   [ 115.  116.  117.  118.  119.]
   [ 120.  121.  122.  123.  124.]]

  [[ 125.  126.  127.  128.  129.]
   [ 130.  131.  132.  133.  134.]
   [ 135.  136.  137.  138.  139.]
   [ 140.  141.  142.  143.  144.]
   [ 145.  146.  147.  148.  149.]]]]
w_arr =
[[[[ 1.  1.  1.]
   [ 1.  1.  1.]
   [ 1.  1.  1.]]

  [[ 1.  1.  1.]
   [ 1.  1.  1.]
   [ 1.  1.  1.]]

  [[ 1.  1.  1.]
   [ 1.  1.  1.]
   [ 1.  1.  1.]]]]
Output for Theano.
full:
[[[[  75  153  234  243  252  171   87]
   [ 165  336  513  531  549  372  189]
   [ 270  549  837  864  891  603  306]
   [ 315  639  972  999 1026  693  351]
   [ 360  729 1107 1134 1161  783  396]
   [ 255  516  783  801  819  552  279]
   [ 135  273  414  423  432  291  147]]]


 [[[ 300  603  909  918  927  621  312]
   [ 615 1236 1863 1881 1899 1272  639]
   [ 945 1899 2862 2889 2916 1953  981]
   [ 990 1989 2997 3024 3051 2043 1026]
   [1035 2079 3132 3159 3186 2133 1071]
   [ 705 1416 2133 2151 2169 1452  729]
   [ 360  723 1089 1098 1107  741  372]]]]
valid:
[[[[ 837  864  891]
   [ 972  999 1026]
   [1107 1134 1161]]]


 [[[2862 2889 2916]
   [2997 3024 3051]
   [3132 3159 3186]]]]
Output for scipy.
full:
[[[[   0    1    3    6    9    7    4]
   [   5   12   21   27   33   24   13]
   [  15   33   54   63   72   51   27]
   [  30   63   99  108  117   81   42]
   [  45   93  144  153  162  111   57]
   [  35   72  111  117  123   84   43]
   [  20   41   63   66   69   47   24]]

  [[  25   52   81   87   93   64   33]
   [  60  124  192  204  216  148   76]
   [ 105  216  333  351  369  252  129]
   [ 135  276  423  441  459  312  159]
   [ 165  336  513  531  549  372  189]
   [ 120  244  372  384  396  268  136]
   [  65  132  201  207  213  144   73]]

  [[  75  153  234  243  252  171   87]
   [ 165  336  513  531  549  372  189]
   [ 270  549  837  864  891  603  306]
   [ 315  639  972  999 1026  693  351]
   [ 360  729 1107 1134 1161  783  396]
   [ 255  516  783  801  819  552  279]
   [ 135  273  414  423  432  291  147]]

  [[  75  152  231  237  243  164   83]
   [ 160  324  492  504  516  348  176]
   [ 255  516  783  801  819  552  279]
   [ 285  576  873  891  909  612  309]
   [ 315  636  963  981  999  672  339]
   [ 220  444  672  684  696  468  236]
   [ 115  232  351  357  363  244  123]]

  [[  50  101  153  156  159  107   54]
   [ 105  212  321  327  333  224  113]
   [ 165  333  504  513  522  351  177]
   [ 180  363  549  558  567  381  192]
   [ 195  393  594  603  612  411  207]
   [ 135  272  411  417  423  284  143]
   [  70  141  213  216  219  147   74]]]


 [[[  75  151  228  231  234  157   79]
   [ 155  312  471  477  483  324  163]
   [ 240  483  729  738  747  501  252]
   [ 255  513  774  783  792  531  267]
   [ 270  543  819  828  837  561  282]
   [ 185  372  561  567  573  384  193]
   [  95  191  288  291  294  197   99]]

  [[ 175  352  531  537  543  364  183]
   [ 360  724 1092 1104 1116  748  376]
   [ 555 1116 1683 1701 1719 1152  579]
   [ 585 1176 1773 1791 1809 1212  609]
   [ 615 1236 1863 1881 1899 1272  639]
   [ 420  844 1272 1284 1296  868  436]
   [ 215  432  651  657  663  444  223]]

  [[ 300  603  909  918  927  621  312]
   [ 615 1236 1863 1881 1899 1272  639]
   [ 945 1899 2862 2889 2916 1953  981]
   [ 990 1989 2997 3024 3051 2043 1026]
   [1035 2079 3132 3159 3186 2133 1071]
   [ 705 1416 2133 2151 2169 1452  729]
   [ 360  723 1089 1098 1107  741  372]]

  [[ 225  452  681  687  693  464  233]
   [ 460  924 1392 1404 1416  948  476]
   [ 705 1416 2133 2151 2169 1452  729]
   [ 735 1476 2223 2241 2259 1512  759]
   [ 765 1536 2313 2331 2349 1572  789]
   [ 520 1044 1572 1584 1596 1068  536]
   [ 265  532  801  807  813  544  273]]

  [[ 125  251  378  381  384  257  129]
   [ 255  512  771  777  783  524  263]
   [ 390  783 1179 1188 1197  801  402]
   [ 405  813 1224 1233 1242  831  417]
   [ 420  843 1269 1278 1287  861  432]
   [ 285  572  861  867  873  584  293]
   [ 145  291  438  441  444  297  149]]]]
valid:
[[[[ 837  864  891]
   [ 972  999 1026]
   [1107 1134 1161]]]


 [[[2862 2889 2916]
   [2997 3024 3051]
   [3132 3159 3186]]]]
```

The output is long and hard to compare, but the `valid` results are the same while the `full` results differ between the two. So let's look at the shapes of the output arrays.

```python
print("Output for Theano.")
print("full:")
print(o_full.eval({m:m_arr, w:w_arr}).round().astype(int).shape)
print("valid:")
print(o_valid.eval({m:m_arr, w:w_arr}).round().astype(int).shape)

print("Output for scipy.")
print("full:")
print(s.fftconvolve(m_arr, w_arr, "full").round().astype(int).shape)
print("valid:")
print(s.fftconvolve(m_arr, w_arr, "valid").round().astype(int).shape)
```

```
Output for Theano.
full:
(2, 1, 7, 7)
valid:
(2, 1, 3, 3)
Output for scipy.
full:
(2, 5, 7, 7)
valid:
(2, 1, 3, 3)
```

The difference arises because `scipy.signal.fftconvolve()` also convolves along the image-count and channel axes, whereas `theano.tensor.nnet.conv.conv2d()` convolves only along the height and width axes, handling each image and each filter separately and summing over the channels. The output of `theano.tensor.nnet.conv.conv2d()` has the format `[number of images, number of filters, height, width]`, so the second element of its shape is 1. Also, as mentioned above, `theano.tensor.nnet.conv.conv2d()` requires the image and the filter to have the same number of channels. For example, with

```python
m_arr = arange(2*3*5*5).reshape((2, 3, 5, 5)).astype(float32)
w_arr = ones((1,1,3,3)).astype(float32)
```

where the image has 3 channels and the filter has 1, `theano.tensor.nnet.conv.conv2d()` raises the following error.

```
ValueError: GpuDnnConv images and kernel must have the same stack size
```

`scipy.signal.fftconvolve()`, on the other hand, returns arrays with the following shapes.

```
Output for scipy.
full:
(2, 3, 7, 7)
valid:
(2, 3, 3, 3)
```

As before, each axis has size $M+(m-1)$ for `full` and $M-(m-1)$ for `valid`.
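
The SciPy shapes seen so far can be reproduced by applying the rule to every axis. The helper below, `nd_conv_shape`, is a hypothetical function written for this check (it is not part of SciPy), and it assumes the image is at least as large as the filter on every axis.

```python
def nd_conv_shape(image_shape, filter_shape, mode):
    """Per-axis output shape of an N-dimensional convolution.

    Assumes image_shape >= filter_shape on every axis.
    """
    if mode == "full":
        return tuple(M + (m - 1) for M, m in zip(image_shape, filter_shape))
    if mode == "valid":
        return tuple(M - (m - 1) for M, m in zip(image_shape, filter_shape))
    raise ValueError("mode must be 'full' or 'valid'")

# Filter with 3 channels: the channel axis is convolved too (3 + 2 = 5).
print(nd_conv_shape((2, 3, 5, 5), (1, 3, 3, 3), "full"))   # (2, 5, 7, 7)
print(nd_conv_shape((2, 3, 5, 5), (1, 3, 3, 3), "valid"))  # (2, 1, 3, 3)

# Filter with 1 channel: the channel axis keeps its size.
print(nd_conv_shape((2, 3, 5, 5), (1, 1, 3, 3), "full"))   # (2, 3, 7, 7)
print(nd_conv_shape((2, 3, 5, 5), (1, 1, 3, 3), "valid"))  # (2, 3, 3, 3)
```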

Finally, let's try 2 images, 3 filters, and 1 channel. The arrays are also made smaller.

```python
m = T.tensor4()
w = T.tensor4()

# Must be rank 4.
o_full = nnet.conv.conv2d(m, w,
                          border_mode='full')
o_valid = nnet.conv.conv2d(m, w,
                          border_mode='valid')

m_arr = arange(2*1*3*3).reshape((2, 1, 3, 3)).astype(float32)
w_arr = ones((3,1,1,1)).astype(float32)
print("m_arr =")
print(m_arr)
print("w_arr =")
print(w_arr)

print("Output for Theano.")
print("full:")
print(o_full.eval({m:m_arr, w:w_arr}).round().astype(int))
print("valid:")
print(o_valid.eval({m:m_arr, w:w_arr}).round().astype(int))

print("Output for scipy.")
print("full:")
print(s.fftconvolve(m_arr, w_arr, "full").round().astype(int))
print("valid:")
print(s.fftconvolve(m_arr, w_arr, "valid").round().astype(int))
```

```
m_arr =
[[[[  0.   1.   2.]
   [  3.   4.   5.]
   [  6.   7.   8.]]]


 [[[  9.  10.  11.]
   [ 12.  13.  14.]
   [ 15.  16.  17.]]]]
w_arr =
[[[[ 1.]]]


 [[[ 1.]]]


 [[[ 1.]]]]
Output for Theano.
full:
[[[[ 0  1  2]
   [ 3  4  5]
   [ 6  7  8]]

  [[ 0  1  2]
   [ 3  4  5]
   [ 6  7  8]]

  [[ 0  1  2]
   [ 3  4  5]
   [ 6  7  8]]]


 [[[ 9 10 11]
   [12 13 14]
   [15 16 17]]

  [[ 9 10 11]
   [12 13 14]
   [15 16 17]]

  [[ 9 10 11]
   [12 13 14]
   [15 16 17]]]]
valid:
[[[[ 0  1  2]
   [ 3  4  5]
   [ 6  7  8]]

  [[ 0  1  2]
   [ 3  4  5]
   [ 6  7  8]]

  [[ 0  1  2]
   [ 3  4  5]
   [ 6  7  8]]]


 [[[ 9 10 11]
   [12 13 14]
   [15 16 17]]

  [[ 9 10 11]
   [12 13 14]
   [15 16 17]]

  [[ 9 10 11]
   [12 13 14]
   [15 16 17]]]]
Output for scipy.
full:
[[[[ 0  1  2]
   [ 3  4  5]
   [ 6  7  8]]]


 [[[ 9 11 13]
   [15 17 19]
   [21 23 25]]]


 [[[ 9 11 13]
   [15 17 19]
   [21 23 25]]]


 [[[ 9 10 11]
   [12 13 14]
   [15 16 17]]]]
valid:
ValueError: For 'valid' mode, one must be at least as large as the other in every dimension
```

The `valid` mode of `scipy.signal.fftconvolve()` raised an error. In `valid` mode, it seems that either the image or the filter must be at least as large as the other in every dimension.
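
The condition in the error message can be written as a small shape check. This is only a sketch of the rule as the message states it. `valid_mode_ok` is a name I made up, and I have not confirmed that SciPy implements the check this way internally.

```python
def valid_mode_ok(shape_a, shape_b):
    """True if one shape is at least as large as the other on every axis."""
    a_covers_b = all(a >= b for a, b in zip(shape_a, shape_b))
    b_covers_a = all(b >= a for a, b in zip(shape_a, shape_b))
    return a_covers_b or b_covers_a

# Earlier example: the image is at least as large as the filter on every axis.
print(valid_mode_ok((2, 3, 5, 5), (1, 3, 3, 3)))  # True

# This example: the filter is larger on axis 0 but smaller on axes 2 and 3.
print(valid_mode_ok((2, 1, 3, 3), (3, 1, 1, 1)))  # False
```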

The shapes of the output arrays from Theano and SciPy are as follows.

```
Output for Theano.
full:
(2, 3, 3, 3)
valid:
(2, 3, 3, 3)
Output for scipy.
full:
(4, 1, 3, 3)
valid:
```

For `theano.tensor.nnet.conv.conv2d()`, the first and second elements of the shape are the number of images and the number of filters, respectively, and the remaining axes have size $M+(m-1)$ for `full` and $M-(m-1)$ for `valid`. For `scipy.signal.fftconvolve()`, every axis has size $M+(m-1)$ in `full` mode.
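
The Theano shape rule observed in these experiments can be summarized in a few lines. `conv2d_output_shape` is a hypothetical helper for this article, not a Theano API, and it only encodes the behaviour seen above (matching channel counts, no stride).

```python
def conv2d_output_shape(image_shape, filter_shape, border_mode):
    """Expected output shape for [images, channels, H, W] x [filters, channels, h, w]."""
    n_images, n_channels, H, W = image_shape
    n_filters, f_channels, h, w = filter_shape
    if n_channels != f_channels:
        raise ValueError("image and filter must have the same number of channels")
    if border_mode == "full":
        return (n_images, n_filters, H + (h - 1), W + (w - 1))
    if border_mode == "valid":
        return (n_images, n_filters, H - (h - 1), W - (w - 1))
    raise ValueError("border_mode must be 'full' or 'valid'")

print(conv2d_output_shape((2, 3, 5, 5), (1, 3, 3, 3), "full"))   # (2, 1, 7, 7)
print(conv2d_output_shape((2, 3, 5, 5), (1, 3, 3, 3), "valid"))  # (2, 1, 3, 3)
print(conv2d_output_shape((2, 1, 3, 3), (3, 1, 1, 1), "full"))   # (2, 3, 3, 3)
print(conv2d_output_shape((2, 1, 3, 3), (3, 1, 1, 1), "valid"))  # (2, 3, 3, 3)
```

With a 1x1 filter, `full` and `valid` give the same height and width because $m-1=0$.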

#stride
