I implemented $\Pi$-Net, a new neural network proposed in the following paper accepted at CVPR 2020, in PyTorch.
Chrysos, Grigorios G., et al. "Π-nets: Deep Polynomial Neural Networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020.
The entire code used for training can be found on GitHub.
In $\Pi$-Net, the network branches in the middle, and **multiplication is performed where the branches join again**. This makes the output a polynomial of the input.
In an ordinary neural network, non-linearity comes from applying an activation function such as ReLU or sigmoid to the output of each layer. Without activation functions, no matter how many layers you stack, the network can only compute a linear (affine) function of its input, so the extra layers are meaningless.
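For example, with weight matrices $W_1, W_2$ and biases $b_1, b_2$, two stacked layers without an activation collapse into a single affine map:

$$
W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2),
$$

which is still just affine in $x$, however many layers are stacked.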
In $\Pi$-Net, however, non-linearity is introduced by **multiplying the outputs of intermediate layers together**, so the network gains expressive power even though no activation function is used.
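To see where the non-linearity comes from, join two branches that are each affine in the input $x$ with an elementwise (Hadamard) product $\odot$:

$$
(W_1 x + b_1) \odot (W_2 x + b_2).
$$

Each component of this product contains terms like $x_i x_j$, i.e. degree-2 terms in the input, so a single multiplicative join is already a non-linear (quadratic) function of $x$ without any activation function.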
Several network structures are proposed in the paper; this time I implemented a model based on one of them, the structure shown below.
(Figure quoted from the paper)
There is a skip connection, so the structure resembles ResNet, but where the branches join they are multiplied (Hadamard product) instead of added. Since each block squares the degree of the previous block's output, stacking $N$ blocks gives a polynomial of degree $2^N$, and the expressive power of the network grows exponentially.
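Written as a recurrence: if each block multiplies two branches that are each affine in that block's input $h_{n-1}$, the polynomial degree doubles at every block,

$$
\deg(h_n) = 2\,\deg(h_{n-1}), \qquad \deg(h_0) = \deg(x) = 1 \;\Rightarrow\; \deg(h_N) = 2^N .
$$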
I built the following model by stacking five of the blocks shown in the figure above. The output of the network is therefore a polynomial of degree $2^5 = 32$ in the input. Note that it does not use an activation function at all.
model

```python
import torch.nn as nn


class PolyNet(nn.Module):
    """Π-Net-style CNN: each block multiplies the outputs of two parallel convolutions."""

    def __init__(self, in_channels=1, n_classes=10):
        super().__init__()
        N = 16
        kwds1 = {"kernel_size": 4, "stride": 2, "padding": 1}  # halves the spatial size
        kwds2 = {"kernel_size": 2, "stride": 1, "padding": 0}
        kwds3 = {"kernel_size": 3, "stride": 1, "padding": 1}  # keeps the spatial size
        self.conv11 = nn.Conv2d(in_channels, N, **kwds3)
        self.conv12 = nn.Conv2d(in_channels, N, **kwds3)
        self.conv21 = nn.Conv2d(N, N * 2, **kwds1)
        self.conv22 = nn.Conv2d(N, N * 2, **kwds1)
        self.conv31 = nn.Conv2d(N * 2, N * 4, **kwds1)
        self.conv32 = nn.Conv2d(N * 2, N * 4, **kwds1)
        self.conv41 = nn.Conv2d(N * 4, N * 8, **kwds2)
        self.conv42 = nn.Conv2d(N * 4, N * 8, **kwds2)
        self.conv51 = nn.Conv2d(N * 8, N * 16, **kwds1)
        self.conv52 = nn.Conv2d(N * 8, N * 16, **kwds1)
        self.fc = nn.Linear(N * 16 * 3 * 3, n_classes)

    def forward(self, x):
        # Each line is one multiplicative block: two parallel convolutions
        # joined by an elementwise (Hadamard) product -- no activation function.
        h = self.conv11(x) * self.conv12(x)
        h = self.conv21(h) * self.conv22(h)
        h = self.conv31(h) * self.conv32(h)
        h = self.conv41(h) * self.conv42(h)
        h = self.conv51(h) * self.conv52(h)
        h = self.fc(h.flatten(start_dim=1))
        return h
```
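As a quick sanity check (not part of the original code), you can run the model on a dummy MNIST-sized batch; the `N * 16 * 3 * 3` input size of the final linear layer corresponds to the 3×3 feature map left after the strided convolutions:

```python
import torch

model = PolyNet(in_channels=1, n_classes=10)
x = torch.randn(8, 1, 28, 28)  # dummy batch of 8 MNIST-sized images
logits = model(x)
print(logits.shape)  # torch.Size([8, 10])
```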
I trained it on MNIST and CIFAR-10 classification.
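The full training code is on GitHub; below is only a minimal sketch of the kind of loop I mean, with the optimizer, learning rate, batch size, and epoch count chosen for illustration rather than taken from the actual repository:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Illustrative settings; see the GitHub repository for the actual ones.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
transform = transforms.ToTensor()
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
test_set = datasets.MNIST("data", train=False, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = DataLoader(test_set, batch_size=256)

model = PolyNet(in_channels=1, n_classes=10).to(device)  # PolyNet defined above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    # Evaluate test accuracy after each epoch.
    model.eval()
    correct = 0
    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
    print(f"epoch {epoch}: test accuracy {correct / len(test_set):.4f}")
```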
MNIST: accuracy and loss curves (plots omitted).
Test accuracy of about 99%!
CIFAR-10: accuracy and loss curves (plots omitted).
Test accuracy is about 70%, but the model is clearly overfitting...
Since the output is a polynomial of the input, the network could be trained without using any activation function.
As mentioned above, stacking blocks improves the expressive power exponentially. However, it is known that ordinary neural networks also gain expressive power exponentially in the number of layers [^1], so to be honest, I didn't really understand the advantage of $\Pi$-Net...