In a certain task, I want to increase the number of features along one spatial axis while decreasing it along the other. For example, I want to map an image of size (100, 100) to (50, 200) using conv / deconv. There are roughly two ways to do this.
Of the two, I'd like to avoid the first (conv->deconv / deconv->conv) because it has a two-layer structure, so I looked into stretching the input and then applying a conv. However, I couldn't come up with a good implementation and ended up using functions.deconvolution_2d with a fixed kernel. I'd like to implement this more elegantly if possible.
With a convolution, you can map to a smaller number of features while preserving positional information.
import chainer
import numpy

x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
shape = chainer.links.Convolution2D(1, 1, ksize=(4, 1), stride=(2, 1), pad=(1, 0))(x).shape
# shape: (1, 1, 50, 100)
With a deconvolution, you can map to a larger number of features while preserving positional information.
x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
shape = chainer.links.Deconvolution2D(1, 1, ksize=(1, 4), stride=(1, 2), pad=(0, 1))(x).shape
# shape: (1, 1, 100, 200)
However, there seems to be no single layer that maps to fewer features in one dimension and more features in the other.
conv->deconv / deconv->conv
This is the simplest implementation, but I'd like to avoid it because it has a two-layer structure, which makes the gradient more likely to vanish.
conv->deconv
x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
x = chainer.links.Convolution2D(1, 1, ksize=(4, 1), stride=(2, 1), pad=(1, 0))(x)
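# x.shape here: (1, 1, 50, 100)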
x = chainer.links.Deconvolution2D(1, 1, ksize=(1, 4), stride=(1, 2), pad=(0, 1))(x)
# x.shape: (1, 1, 50, 200)
deconv->conv
x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
x = chainer.links.Deconvolution2D(1, 1, ksize=(1, 4), stride=(1, 2), pad=(0, 1))(x)
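# x.shape here: (1, 1, 100, 200)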
x = chainer.links.Convolution2D(1, 1, ksize=(4, 1), stride=(2, 1), pad=(1, 0))(x)
# x.shape: (1, 1, 50, 200)
I came up with two ideas. The first: stretch the input with functions.unpooling_2d, then shrink it with a conv.
unpooling->conv
x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
x = chainer.functions.unpooling_2d(x, ksize=(1, 2))
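# x.shape here: (1, 1, 100, 199) because unpooling_2d defaults to cover_all=True;
# the pad=2 in the conv below brings the width back to 200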
x = chainer.links.Convolution2D(1, 1, ksize=(4, 4), stride=(2, 1), pad=(1, 2))(x)
# x.shape: (1, 1, 50, 200)
The second: stretch the input with functions.deconvolution_2d, then shrink it with a conv. In effect, the deconv stretches the input through a mask like 1010101010..., inserting a zero between every pair of original pixels (a toy check follows the code below).
upsample->conv
x = numpy.random.rand(1, 1, 100, 100).astype(numpy.float32)
x = chainer.functions.deconvolution_2d(x, W=numpy.array([0, 1, 0], numpy.float32).reshape(1, 1, 1, 3), stride=(1, 2))
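# x.shape here: (1, 1, 100, 201); each original pixel sits at an odd column, zeros elsewhere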
x = chainer.links.Convolution2D(1, 1, ksize=(4, 4), stride=(2, 1), pad=(1, 1))(x)
# x.shape: (1, 1, 50, 200)
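To see the mask concretely, here is a toy check on a hypothetical 1x3 input; each value lands at an odd column with zeros in between:
t = numpy.arange(1, 4, dtype=numpy.float32).reshape(1, 1, 1, 3)
W = numpy.array([0, 1, 0], numpy.float32).reshape(1, 1, 1, 3)
print(chainer.functions.deconvolution_2d(t, W, stride=(1, 2)).data)
# [[[[0. 1. 0. 2. 0. 3. 0.]]]]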
Which one is better?
In the first place, I intend to apply this to 3D convolutions with links.ConvolutionND rather than 2D, but I noticed that there is no functions.unpooling_nd. What should I do?
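For what it's worth, the second approach might carry over to ND, since functions.deconvolution_nd and links.ConvolutionND both exist. A minimal sketch, assuming the same fixed zero-inserting kernel; the input size (20, 20, 20) and the ksize/stride/pad choices here are hypothetical, picked only to show the shapes:
x = numpy.random.rand(1, 1, 20, 20, 20).astype(numpy.float32)
# fixed kernel that inserts zeros along the last axis, as in the 2D case
W = numpy.array([0, 1, 0], numpy.float32).reshape(1, 1, 1, 1, 3)
x = chainer.functions.deconvolution_nd(x, W, stride=(1, 1, 2))
# x.shape: (1, 1, 20, 20, 41)
x = chainer.links.ConvolutionND(3, 1, 1, ksize=(4, 1, 4), stride=(2, 1, 1), pad=(1, 0, 1))(x)
# x.shape: (1, 1, 10, 20, 40)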