I will explain the types of activation functions that appear in neural networks and what kind of functions they are.
An activation function converts the sum of input signals into an output signal. It determines how the sum of the input signals is activated, that is, whether and how the neuron fires.
Expressed as a formula (the figure shows the same computation), it looks like this.
a = x_1w_1 + x_2w_2 + b \\
y = h(a)
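As a minimal sketch of this computation in code (the input, weight, and bias values below are made-up, and h here is just a placeholder step activation for the example):

import numpy as np

def h(a):
    # placeholder activation function; a step function is used here as an example
    return 1 if a > 0 else 0

x = np.array([1.0, 0.5])   # input signals x_1, x_2 (made-up values)
w = np.array([0.4, 0.6])   # weights w_1, w_2 (made-up values)
b = -0.3                   # bias (made-up value)

a = np.sum(x * w) + b      # a = x_1*w_1 + x_2*w_2 + b
y = h(a)                   # a is roughly 0.4, so the step activation gives y = 1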
This function switches its output at a threshold value; it is called the "step function".
Since the perceptron takes a binary value of firing (1) or not firing (0), you could say that the perceptron uses a step function as its activation function. Neural networks, however, normally use an activation function other than the step function.
def step_function(x):
    if x > 0:
        return 1
    else:
        return 0
If the input is greater than 0, it returns 1; otherwise it returns 0. Since neural networks work with NumPy arrays, let's make the function support NumPy arrays.
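For reference, here is a quick hypothetical check showing why the version above cannot take a NumPy array directly: the if statement cannot decide the truth value of a multi-element array.

>>> import numpy as np
>>> step_function(np.array([1.0, 2.0]))
Traceback (most recent call last):
  ...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()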
def step_function(x):
    y = x > 0
    return y.astype(int)
A note about the code: applying a comparison operator to a NumPy array produces a boolean array.
>>> x = np.array([1.0, -1.0, 2.0])
>>> y = x > 0
>>> y
array([ True, False,  True])
I am converting it to an int type.
>>> y.astype(int)
array([1, 0, 1])
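Putting it together, a quick check of the NumPy-ready version (made-up input values):

>>> step_function(np.array([-1.0, 1.0, 2.0]))
array([0, 1, 1])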
The graph looks like this.
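Next is the sigmoid function. For reference, the standard sigmoid formula, which the code below implements, is:

h(x) = \frac{1}{1 + \exp(-x)}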
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
When you perform a numerical operation between a NumPy array and a scalar value, the operation is applied between each element of the array and the scalar (broadcasting), and the result is returned as a NumPy array.
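A quick check of this behavior with made-up values, including the sigmoid itself:

>>> t = np.array([1.0, 2.0, 3.0])
>>> 1.0 + t
array([2., 3., 4.])
>>> 1.0 / t
array([1.        , 0.5       , 0.33333333])
>>> sigmoid(np.array([-1.0, 1.0, 2.0]))
array([0.26894142, 0.73105858, 0.88079708])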
The graph looks like this.
I think the sigmoid is best understood as a smooth step function; being smooth, that is, outputting continuous values instead of jumping between 0 and 1, is what makes it more convenient.
(See also: "Meaning and simple properties of the sigmoid function".)
h(x) = \left\{
\begin{array}{ll}
x & (x \gt 0) \\
0 & (x \leq 0)
\end{array} \right.
This function outputs the input value as-is if the input is greater than 0, and outputs 0 if the input is 0 or less. It is read "ReLU"; the official name is "Rectified Linear Unit", and it is also known as the ramp function.
def relu(x):
    return np.maximum(0, x)
np.maximum() compares each element of x with 0 and returns the larger value element-wise.
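A quick usage check (made-up input values):

>>> relu(np.array([-2.0, 0.0, 3.0]))
array([0., 0., 3.])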
The graph looks like this.
The softmax function is often used as the activation function of the output layer in classification problems. Because each output is its own value divided by the sum over all outputs, the outputs can be regarded as probabilities, and you can see which class is the most plausible.
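For reference, the standard softmax formula that the code below implements is, for the k-th of n outputs:

y_k = \frac{\exp(a_k)}{\sum_{i=1}^{n} \exp(a_i)}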
def softmax(a):
    exp_a = np.exp(a)
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
Be careful here! The exponential function grows explosively, so a large input overflows, like this:
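For example, with large made-up input values the naive computation produces nan (NumPy also prints an overflow RuntimeWarning):

>>> a = np.array([1010, 1000, 990])
>>> np.exp(a) / np.sum(np.exp(a))
array([nan, nan, nan])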
The fix is to subtract the maximum value of the input signal! This works because the softmax function has the property that the result does not change even if the same constant is added to or subtracted from every input.
def softmax(a):
    c = np.max(a)  # maximum value in the input signal
    exp_a = np.exp(a - c)
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
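With the same made-up input, the improved version behaves sensibly, and the outputs sum to 1, as a probability distribution should:

>>> softmax(np.array([1010, 1000, 990]))
array([9.99954600e-01, 4.53978686e-05, 2.06106005e-09])
>>> np.sum(softmax(np.array([1010, 1000, 990])))
1.0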
Compare the y-axis.
The identity function is often used as the activation function of the output layer for regression problems. It outputs the input as it is.
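As a minimal sketch (the name identity_function is my own choice, not fixed by the text):

def identity_function(x):
    # outputs the input as it is
    return x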
Neural networks can be used for both regression and classification problems, but the output-layer activation function is chosen according to the problem, so the output layer and the intermediate (hidden) layers may use different activation functions.