I will explain the types of activation functions that appear in neural networks and what kind of functions they are.
An activation function converts the sum of input signals into an output signal. It determines how the sum of the input signals is activated, that is, whether and how the neuron fires.
Expressed as a formula (the figure shows the same computation), it looks like this.
a = x_1w_1 + x_2w_2 + b \\
y = h(a)
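As a minimal sketch of this computation in code (the input, weight, and bias values below are made-up, and h here is just a placeholder step activation for the example):

import numpy as np

def h(a):
    # placeholder activation function; a step function is used here as an example
    return 1 if a > 0 else 0

x = np.array([1.0, 0.5])   # input signals x_1, x_2 (made-up values)
w = np.array([0.4, 0.6])   # weights w_1, w_2 (made-up values)
b = -0.3                   # bias (made-up value)

a = np.sum(x * w) + b      # a = x_1*w_1 + x_2*w_2 + b
y = h(a)                   # a is roughly 0.4, so the step activation gives y = 1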
This function switches its output at a threshold value; it is called the "step function".
Since the perceptron takes a binary value of firing (1) or not firing (0), you could say that the perceptron uses a step function as its activation function. Neural networks, however, normally use an activation function other than the step function.
def step_function(x):
    if x > 0:
        return 1
    else:
        return 0
If the input is greater than 0, it returns 1; otherwise it returns 0. Since neural networks work with NumPy arrays, let's make the function support NumPy arrays.
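For reference, here is a quick hypothetical check showing why the version above cannot take a NumPy array directly: the if statement cannot decide the truth value of a multi-element array.

>>> import numpy as np
>>> step_function(np.array([1.0, 2.0]))
Traceback (most recent call last):
  ...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()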
def step_function(x):
    y = x > 0
    return y.astype(int)
A note about the code: applying a comparison operator to a NumPy array produces a boolean array.
>>> x = np.array([1.0, -1.0, 2.0])
>>> y = x > 0
>>> y
array([ True, False,  True])
I am converting it to an int type.
>>> y.astype(int)
array([1, 0, 1])
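Putting it together, a quick check of the NumPy-ready version (made-up input values):

>>> step_function(np.array([-1.0, 1.0, 2.0]))
array([0, 1, 1])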
The graph looks like this.
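Next is the sigmoid function. For reference, the standard sigmoid formula, which the code below implements, is:

h(x) = \frac{1}{1 + \exp(-x)}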
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
When you perform a numerical operation between a NumPy array and a scalar value, the operation is applied between each element of the array and the scalar (broadcasting), and the result is returned as a NumPy array.
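A quick check of this behavior with made-up values, including the sigmoid itself:

>>> t = np.array([1.0, 2.0, 3.0])
>>> 1.0 + t
array([2., 3., 4.])
>>> 1.0 / t
array([1.        , 0.5       , 0.33333333])
>>> sigmoid(np.array([-1.0, 1.0, 2.0]))
array([0.26894142, 0.73105858, 0.88079708])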
The graph looks like this.
I think the sigmoid is best understood as a smooth step function; being smooth, that is, outputting continuous values instead of jumping between 0 and 1, is what makes it more convenient.
(See also: "Meaning and simple properties of the sigmoid function".)
h(x) = \left\{
\begin{array}{ll}
x & (x \gt 0) \\
0 & (x \leq 0)
\end{array} \right.
This function outputs the input value as-is if the input is greater than 0, and outputs 0 if the input is 0 or less. It is read "ReLU"; the official name is "Rectified Linear Unit", and it is also known as the ramp function.
def relu(x):
    return np.maximum(0, x)
np.maximum() compares each element of x with 0 and returns the larger value element-wise.
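A quick usage check (made-up input values):

>>> relu(np.array([-2.0, 0.0, 3.0]))
array([0., 0., 3.])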
The graph looks like this.
The softmax function is often used as the activation function of the output layer in classification problems. Because each output is its own value divided by the sum over all outputs, the outputs can be regarded as probabilities, and you can see which class is the most plausible.
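For reference, the standard softmax formula that the code below implements is, for the k-th of n outputs:

y_k = \frac{\exp(a_k)}{\sum_{i=1}^{n} \exp(a_i)}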
def softmax(a):
    exp_a = np.exp(a)
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
Be careful here! The exponential function grows explosively, so a large input overflows, like this:
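For example, with large made-up input values the naive computation produces nan (NumPy also prints an overflow RuntimeWarning):

>>> a = np.array([1010, 1000, 990])
>>> np.exp(a) / np.sum(np.exp(a))
array([nan, nan, nan])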
The fix is to subtract the maximum value of the input signal! This works because the softmax function has the property that the result does not change even if the same constant is added to or subtracted from every input.
def softmax(a):
    c = np.max(a)  # maximum value in the input signal
    exp_a = np.exp(a - c)
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
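With the same made-up input, the improved version behaves sensibly, and the outputs sum to 1, as a probability distribution should:

>>> softmax(np.array([1010, 1000, 990]))
array([9.99954600e-01, 4.53978686e-05, 2.06106005e-09])
>>> np.sum(softmax(np.array([1010, 1000, 990])))
1.0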
Compare the y-axis.
The identity function is often used as the activation function of the output layer for regression problems. It outputs the input as it is.
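As a minimal sketch (the name identity_function is my own choice, not fixed by the text):

def identity_function(x):
    # outputs the input as it is
    return x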
Neural networks can be used for both regression and classification problems, but the output-layer activation function is chosen according to the problem, so the output layer and the intermediate (hidden) layers may use different activation functions.