Conv in the x direction and deconv in the y direction with Chainer

Overview

For a certain task, I wanted to increase the number of features in the Y direction while decreasing it in the X direction. For example, I want to turn an image of size (100, 100) into one of size (50, 200) using conv / deconv. There are roughly two ways to do this.

  1. conv → deconv or deconv → conv
  2. Stretch → conv

I would like to avoid the first method because it stacks two layers. I therefore looked into and implemented the second method: stretch, then conv.

However, I couldn't think of a clean way to implement the stretching and ended up using functions.deconvolution_2d. I would like to implement it more elegantly if possible.

Background

With a strided convolution, you can map the input to a smaller number of features while preserving positional information.

import numpy
import chainer

x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
shape = chainer.links.Convolution2D(1, 1, ksize=(4, 1), stride=(2, 1), pad=(1, 0))(x).shape
# shape: (1, 1, 50, 100)
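
The shape above follows from the usual convolution size arithmetic. Here is a minimal sketch in plain Python, assuming the standard floor formula (size + 2 * pad - ksize) // stride + 1. The helper name conv_outsize is hypothetical and written only for illustration; it is not a Chainer API.

```python
def conv_outsize(size, ksize, stride, pad):
    """Output length of a convolution along one axis (floor formula)."""
    return (size + 2 * pad - ksize) // stride + 1


# First spatial axis: ksize=4, stride=2, pad=1 halves 100 to 50.
h = conv_outsize(100, ksize=4, stride=2, pad=1)
# Second spatial axis: ksize=1, stride=1, pad=0 leaves 100 unchanged.
w = conv_outsize(100, ksize=1, stride=1, pad=0)
print((h, w))  # (50, 100)
```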

Conversely, with deconvolution you can map the input to a larger number of features while preserving positional information.

x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
shape = chainer.links.Deconvolution2D(1, 1, ksize=(1, 4), stride=(1, 2), pad=(0, 1))(x).shape
# shape: (1, 1, 100, 200)
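
The deconvolution shape can be checked the same way. This is a sketch assuming the common transposed-convolution size formula stride * (size - 1) + ksize - 2 * pad; deconv_outsize is a hypothetical helper name, not something taken from Chainer.

```python
def deconv_outsize(size, ksize, stride, pad):
    """Output length of a transposed convolution along one axis."""
    return stride * (size - 1) + ksize - 2 * pad


# First spatial axis: ksize=1, stride=1, pad=0 leaves 100 unchanged.
h = deconv_outsize(100, ksize=1, stride=1, pad=0)
# Second spatial axis: ksize=4, stride=2, pad=1 doubles 100 to 200.
w = deconv_outsize(100, ksize=4, stride=2, pad=1)
print((h, w))  # (100, 200)
```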

However, as far as I know, there is no single layer that maps to a smaller number of features along one dimension and a larger number of features along another.

Method study

conv → deconv / deconv → conv

This is the simplest implementation, but I would like to avoid it because it stacks two layers, which makes vanishing gradients more likely.

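conv → deconv

x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
# Halve the first spatial axis: (1, 1, 100, 100) -> (1, 1, 50, 100)
x = chainer.links.Convolution2D(1, 1, ksize=(4, 1), stride=(2, 1), pad=(1, 0))(x)
# Double the second spatial axis: (1, 1, 50, 100) -> (1, 1, 50, 200)
x = chainer.links.Deconvolution2D(1, 1, ksize=(1, 4), stride=(1, 2), pad=(0, 1))(x)
# x.shape: (1, 1, 50, 200)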

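deconv → conv

x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
# Double the second spatial axis: (1, 1, 100, 100) -> (1, 1, 100, 200)
x = chainer.links.Deconvolution2D(1, 1, ksize=(1, 4), stride=(1, 2), pad=(0, 1))(x)
# Halve the first spatial axis: (1, 1, 100, 200) -> (1, 1, 50, 200)
x = chainer.links.Convolution2D(1, 1, ksize=(4, 1), stride=(2, 1), pad=(1, 0))(x)
# x.shape: (1, 1, 50, 200)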

Stretch → conv

I considered two approaches. The first is to stretch the input with functions.unpooling_2d and then shrink it with conv.

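unpooling → conv

x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
# Stretch the second spatial axis; with the default cover_all=True the width becomes 199
x = chainer.functions.unpooling_2d(x, ksize=(1, 2))
# pad=2 on the second axis brings the width up to 200
x = chainer.links.Convolution2D(1, 1, ksize=(4, 4), stride=(2, 1), pad=(1, 2))(x)
# x.shape: (1, 1, 50, 200)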
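
The pad of 2 along the second axis may look surprising. Here is a sketch of the size arithmetic, assuming that functions.unpooling_2d is left at its default cover_all=True and that its stride defaults to ksize. Both helper names are hypothetical and exist only for this illustration.

```python
def unpool_outsize(size, ksize, stride, pad=0, cover_all=True):
    """Output length of unpooling along one axis (assumed formula)."""
    out = stride * (size - 1) + ksize - 2 * pad
    # With cover_all=True the result is (stride - 1) elements shorter.
    return out - stride + 1 if cover_all else out


def conv_outsize(size, ksize, stride, pad):
    """Output length of a convolution along one axis (floor formula)."""
    return (size + 2 * pad - ksize) // stride + 1


# Second axis: unpooling with ksize=2 gives 199, and conv with pad=2 gives 200.
w_unpooled = unpool_outsize(100, ksize=2, stride=2)
w_final = conv_outsize(w_unpooled, ksize=4, stride=1, pad=2)
# First axis: unpooling with ksize=1 keeps 100, and conv with stride=2 gives 50.
h_final = conv_outsize(unpool_outsize(100, 1, 1), ksize=4, stride=2, pad=1)
print(w_unpooled, (h_final, w_final))  # 199 (50, 200)
```

Under the same assumptions, passing cover_all=False would make the unpooled width 200, and the conv with pad 2 would then produce 201, so the padding or kernel size would need adjusting.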

The second is to stretch the input with functions.deconvolution_2d and then shrink it with conv. The idea is to interleave the input with zeros, like a 1010101010 ... mask, by applying deconv with a fixed kernel.

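upsample → conv

x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
# Fixed kernel [0, 1, 0] with stride 2 interleaves the input with zeros: width 100 -> 201
x = chainer.functions.deconvolution_2d(x, W=numpy.array([0, 1, 0], numpy.float32).reshape(1, 1, 1, 3), stride=(1, 2))
# The trainable conv then halves the first axis and brings the second to 200
x = chainer.links.Convolution2D(1, 1, ksize=(4, 4), stride=(2, 1), pad=(1, 1))(x)
# x.shape: (1, 1, 50, 200)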
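
To see what the fixed [0, 1, 0] kernel does, here is a naive 1-D transposed convolution in plain Python. It is a sketch for illustration only (deconv1d is a hypothetical helper, not how Chainer implements it), but it shows the zero-interleaving pattern on a tiny input.

```python
def deconv1d(x, kernel, stride):
    """Naive 1-D transposed convolution with no padding."""
    out = [0.0] * (stride * (len(x) - 1) + len(kernel))
    for i, value in enumerate(x):
        for j, weight in enumerate(kernel):
            # Each input element scatters a scaled copy of the kernel.
            out[i * stride + j] += value * weight
    return out


row = [1.0, 2.0, 3.0]
stretched = deconv1d(row, kernel=[0.0, 1.0, 0.0], stride=2)
print(stretched)  # [0.0, 1.0, 0.0, 2.0, 0.0, 3.0, 0.0]
```

For an input of width 100 the same arithmetic gives 2 * 99 + 3 = 201 elements, which the following conv with ksize 4 and pad 1 reduces to 200.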

Discussion

Which one is better?

Future work

My actual goal is to apply this to 3D conv using links.ConvolutionND rather than 2D, but I noticed that there is no functions.unpooling_nd. I am not sure what to do about that.
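
One possible direction, offered only as a sketch: a nearest-neighbour stretch along a single axis is just element repetition, so it does not strictly need an unpooling function. The example below uses numpy.repeat on a 5-D array to show the intended shapes. Whether an equivalent differentiable operation fits the actual model is an assumption that would need checking, and the fixed-kernel deconv trick from the 2D case might also carry over to functions.deconvolution_nd.

```python
import numpy

# (batch, channel, depth, height, width); sizes chosen only for illustration
x = numpy.arange(2 * 3 * 4, dtype=numpy.float32).reshape(1, 1, 2, 3, 4)

# Repeat every element twice along the last axis (nearest-neighbour stretch).
stretched = numpy.repeat(x, 2, axis=4)

print(stretched.shape)        # (1, 1, 2, 3, 8)
print(stretched[0, 0, 0, 0])  # [0. 0. 1. 1. 2. 2. 3. 3.]
```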
