In a certain task, I want to increase the number of features along one spatial axis while decreasing it along the other. For example, I want to map an image of size (100, 100) to (50, 200) using conv / deconv. There are roughly two ways to do this.
Of the two, I'd like to avoid the first (conv->deconv / deconv->conv) because it has a two-layer structure, so I looked into stretching the input and then applying a conv. However, I couldn't come up with a good implementation and ended up using functions.deconvolution_2d with a fixed kernel. I'd like to implement this more elegantly if possible.
With a convolution, you can map to a smaller number of features while preserving positional information.
import chainer
import numpy

x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
shape = chainer.links.Convolution2D(1, 1, ksize=(4, 1), stride=(2, 1), pad=(1, 0))(x).shape
# shape: (1, 1, 50, 100)
With a deconvolution, you can map to a larger number of features while preserving positional information.
x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
shape = chainer.links.Deconvolution2D(1, 1, ksize=(1, 4), stride=(1, 2), pad=(0, 1))(x).shape
# shape: (1, 1, 100, 200)
However, there seems to be no single layer that maps to fewer features in one dimension and more features in the other.
conv->deconv / deconv->conv
This is the simplest implementation, but I'd like to avoid it because it has a two-layer structure, which makes the gradient more likely to vanish.
conv->deconv
x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
x = chainer.links.Convolution2D(1, 1, ksize=(4, 1), stride=(2, 1), pad=(1, 0))(x)
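# x.shape here: (1, 1, 50, 100)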
x = chainer.links.Deconvolution2D(1, 1, ksize=(1, 4), stride=(1, 2), pad=(0, 1))(x)
# x.shape: (1, 1, 50, 200)
deconv->conv
x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
x = chainer.links.Deconvolution2D(1, 1, ksize=(1, 4), stride=(1, 2), pad=(0, 1))(x)
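# x.shape here: (1, 1, 100, 200)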
x = chainer.links.Convolution2D(1, 1, ksize=(4, 1), stride=(2, 1), pad=(1, 0))(x)
# x.shape: (1, 1, 50, 200)
I came up with two ideas. The first: stretch the input with functions.unpooling_2d, then shrink it with a conv.
unpooling->conv
x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
x = chainer.functions.unpooling_2d(x, ksize=(1, 2))
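# x.shape here: (1, 1, 100, 199) because unpooling_2d defaults to cover_all=True;
# the pad=2 in the conv below brings the width back to 200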
x = chainer.links.Convolution2D(1, 1, ksize=(4, 4), stride=(2, 1), pad=(1, 2))(x)
# x.shape: (1, 1, 50, 200)
The second: stretch the input with functions.deconvolution_2d, then shrink it with a conv. In effect, the deconv stretches the input through a mask like 1010101010..., inserting a zero between every pair of original pixels (a toy check follows the code below).
upsample->conv
x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
x = chainer.functions.deconvolution_2d(x, W=numpy.array([0, 1, 0], numpy.float32).reshape(1, 1, 1, 3), stride=(1, 2))
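# x.shape here: (1, 1, 100, 201); each original pixel sits at an odd column, zeros elsewhere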
x = chainer.links.Convolution2D(1, 1, ksize=(4, 4), stride=(2, 1), pad=(1, 1))(x)
# x.shape: (1, 1, 50, 200)
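To see the mask concretely, here is a toy check on a hypothetical 1x3 input; each value lands at an odd column with zeros in between:
t = numpy.arange(1, 4, dtype=numpy.float32).reshape(1, 1, 1, 3)
W = numpy.array([0, 1, 0], numpy.float32).reshape(1, 1, 1, 3)
print(chainer.functions.deconvolution_2d(t, W, stride=(1, 2)).data)
# [[[[0. 1. 0. 2. 0. 3. 0.]]]]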
Which one is better?
In the first place, I intend to apply this to 3D convolutions with links.ConvolutionND rather than 2D, but I noticed that there is no functions.unpooling_nd. What should I do?
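For what it's worth, the second approach might carry over to ND, since functions.deconvolution_nd and links.ConvolutionND both exist. A minimal sketch, assuming the same fixed zero-inserting kernel; the input size (20, 20, 20) and the ksize/stride/pad choices here are hypothetical, picked only to show the shapes:
x = numpy.random.rand(1, 1, 20, 20, 20).astype(numpy.float32)
# fixed kernel that inserts zeros along the last axis, as in the 2D case
W = numpy.array([0, 1, 0], numpy.float32).reshape(1, 1, 1, 1, 3)
x = chainer.functions.deconvolution_nd(x, W, stride=(1, 1, 2))
# x.shape: (1, 1, 20, 20, 41)
x = chainer.links.ConvolutionND(3, 1, 1, ksize=(4, 1, 4), stride=(2, 1, 1), pad=(1, 0, 1))(x)
# x.shape: (1, 1, 10, 20, 40)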