Neural networks use nonlinear functions as activation functions. Here I explain why linear functions are not used instead.
A linear function is a function whose output is a constant multiple of its input, for example $h(x) = cx$; its graph is a straight line.
A nonlinear function is any function that is not linear; its graph is a curved or bent line, like the sigmoid or ReLU.
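To make the distinction concrete, here is a minimal sketch in Python (NumPy assumed; the function names `linear`, `sigmoid`, and `relu` are my own illustrative choices):

```python
import numpy as np

def linear(x, c=2.0):
    """Linear function: the output is a constant multiple of the input."""
    return c * x

def sigmoid(x):
    """Nonlinear function: its graph is a smooth S-shaped curve."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Nonlinear function: its graph bends at x = 0."""
    return np.maximum(0.0, x)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(linear(x))   # [-4. -2.  0.  2.  4.]  -> points on a straight line
print(sigmoid(x))  # values along an S-shaped curve
print(relu(x))     # [0. 0. 0. 1. 2.]       -> bent at the origin
```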
In a neural network, you need to use a nonlinear function as the activation function. If you use a linear function, the output is just a constant multiple of the input (a straight line), which makes deepening the layers meaningless.
Consider one example: a three-layer network that uses the linear function $h(x) = ax$ as its activation function.
The output is $y(x) = h(h(h(x))) = a^3 x$, which can be written as the single multiplication $y(x) = kx$ with $k = a^3$. In other words, the same computation can be expressed by a network with no hidden layers, so there is no point in making it multi-layered.
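This collapse can be checked numerically. Below is a minimal sketch (the value of $a$ and the function name `h` are my own choices for illustration) that composes $h(x) = ax$ three times and compares the result with a single multiplication by $k = a^3$:

```python
import numpy as np

a = 0.5

def h(x):
    """Linear activation: h(x) = a * x."""
    return a * x

x = np.linspace(-3.0, 3.0, 7)

# Three "layers" of the linear activation...
y_deep = h(h(h(x)))

# ...are exactly one multiplication by k = a**3.
k = a ** 3
y_shallow = k * x

print(np.allclose(y_deep, y_shallow))  # True: the hidden layers added nothing
```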
That is why neural networks use nonlinear activation functions rather than linear ones.
This article is also recommended: "Decompose 'complexity' into many 'simples': forward propagation is a repetition of 'linear function' and 'simple nonlinearity'".