In a previous article, Text classification by Recurrent Convolutional NN, I wrote code that classifies text, rather than images, with an R-CNN.
There I claimed that "a convolution fixed along the x-axis is not possible with convolution_2d", but in fact the same convolution used for images (sliding a filter in both the x and y directions) also performs well on text. Someone has actually implemented that approach in Chainer.
You can build one large two-dimensional array by concatenating the word embedding vectors in word order and train a classifier on it, but by the nature of text its size depends on the length of the sentence. In the previous implementation I therefore measured the maximum length over all sentences once, fixed every input to that size (zero-padding where necessary), and processed them that way.
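The fixed-length preprocessing described above can be sketched as follows. This is a minimal NumPy illustration, not the article's actual code; the sentence lengths and embedding dimension are made up for the example.

```python
import numpy as np

# Hypothetical embeddings for three sentences of different lengths
# (each row is one word's embedding vector; dim = 4 for brevity).
sentences = [np.ones((3, 4)), np.ones((5, 4)), np.ones((2, 4))]

# Measure the maximum length once, then zero-pad every sentence to it,
# so all inputs share the shape (max_len, dim) and can be batched.
max_len = max(s.shape[0] for s in sentences)
padded = np.stack([
    np.vstack([s, np.zeros((max_len - s.shape[0], s.shape[1]))])
    for s in sentences
])
print(padded.shape)  # (3, 5, 4)
```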
That is one workable approach, but by using Spatial Pyramid Pooling for the pooling layer you can turn input of any length into a fixed-length vector, which lets you build a model that accepts variable-length input. Since Chainer provides this as the spatial_pyramid_pooling_2d function, you simply replace max_pooling_2d with it and match the number of units in the intermediate fully connected layer to the SPP output size.
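To see why SPP produces a fixed-length vector, here is a toy NumPy version of the max-pooling variant (a sketch for intuition, not Chainer's implementation): level l splits the feature map into a 2**l x 2**l grid and max-pools each cell, so the output length depends only on the channel count and pyramid height, never on the input's spatial size.

```python
import numpy as np

def spp_max_2d(fmap, pyramid_height):
    """Toy Spatial Pyramid Pooling (max variant) over one feature map.

    fmap: array of shape (channels, H, W) with arbitrary H and W.
    Output length = channels * sum(4**l for l in range(pyramid_height)).
    """
    c, h, w = fmap.shape
    out = []
    for level in range(pyramid_height):
        bins = 2 ** level
        for i in range(bins):
            for j in range(bins):
                # Integer cell boundaries that cover the whole map;
                # max() keeps each cell non-empty for small inputs.
                y0 = i * h // bins
                y1 = max((i + 1) * h // bins, y0 + 1)
                x0 = j * w // bins
                x1 = max((j + 1) * w // bins, x0 + 1)
                out.append(fmap[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(out)

# Two "sentences" of different lengths give the same output size:
a = spp_max_2d(np.random.rand(8, 10, 4), 2)   # 8 * (1 + 4) = 40
b = spp_max_2d(np.random.rand(8, 23, 4), 2)
print(a.shape, b.shape)  # (40,) (40,)
```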
def __call__(self, x):
    # spatial_pyramid_pooling_2d replaces max_pooling_2d: pyramid height 2
    # with max pooling yields a fixed-length vector whatever the input size
    h1 = F.spatial_pyramid_pooling_2d(F.relu(self.conv1(x)), 2, F.MaxPooling2D)
    h2 = F.dropout(F.relu(self.l1(h1)))
    y = self.l2(h2)
    return y
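Matching the fully connected layer to the SPP output is a small piece of arithmetic. With pyramid height 2, level 0 contributes 1 pooled value per channel and level 1 contributes 4, so each convolution filter yields 5 values. The filter count below is a hypothetical example, not a value from the article.

```python
# The in_size of self.l1 must equal SPP's fixed output length.
n_filters = 64          # hypothetical number of conv1 filters
pyramid_height = 2      # as passed to spatial_pyramid_pooling_2d above
spp_units = n_filters * sum(4 ** level for level in range(pyramid_height))
print(spp_units)  # 320
```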
The test accuracy is about the same as the original (around 75%).